LessWrong.com Sequences
Eliezer Yudkowsky
Generated by lesswrong_book.py on 2013-01-05.
Pseudo-random version string: ee78df8a-6178-4d00-8a06-16a9ed81a6be.
A collection of posts dealing with the fundamentals of rationality: the difference between the map and the territory, Bayes's Theorem and the nature of evidence, why anyone should care about truth, and minds as reflective cognitive engines.
Part I
Map and Territory
1. The Simple Truth
We mean:

1. Epistemic rationality: believing, and updating on evidence, so as to systematically improve the correspondence between your map and the territory. The art of obtaining beliefs that correspond to reality as closely as possible. This correspondence is commonly termed "truth" or "accuracy", and we're happy to call it that.
2. Instrumental rationality: achieving your values. Not necessarily "your values" in the sense of being selfish values or unshared values: "your values" means anything you care about. The art of choosing actions that steer the future toward outcomes ranked higher in your preferences. On LW we sometimes refer to this as "winning".
If that seems like a perfectly good definition, you can stop reading here; otherwise continue.
Sometimes experimental psychologists uncover human reasoning that seems very strange; for example,
to help transport thoughts from one mind to another. You cannot change reality, or prove the thought, by manipulating which meanings go with which words.
So if you understand what concept we are generally getting at with this word "rationality", and with the sub-terms "epistemic rationality" and "instrumental rationality", we have communicated: we have accomplished everything there is to accomplish by talking about how to define "rationality". What's left to discuss is not what meaning to attach to the syllables "rationality"; what's left to discuss is what is a good way to think.

9. http://yudkowsky.net/virtues/
10. Page 228, "Disputing Definitions".
11. "The Argument from Common Usage".
With that said, you should be aware that many of us will regard as controversial, at the very least, any construal of "rationality" that makes it non-normative:

For example, if you say, "The rational belief is X, but the true belief is Y," then you are probably using the word "rational" in a way that means something other than what most of us have in mind. (E.g., some of us expect "rationality" to be consistent under reflection: "rationally" looking at the evidence, and "rationally" considering how your mind processes the evidence, shouldn't lead to two different conclusions.) Similarly, if you find yourself saying, "The rational thing to do is X, but the right thing to do is Y," then you are almost certainly using one of the words "rational" or "right" in a way that a huge chunk of readers won't agree with.
In this case, or in any other case where controversy threatens, you should substitute more specific language: "The self-benefiting thing to do is to run away, but I hope I would at least try to drag the girl off the railroad tracks," or "Causal decision theory as usually formulated says you should two-box on Newcomb's Problem, but I'd rather have a million dollars."
"X is rational!" is usually just a more strident way of saying "I think X is true" or "I think X is good". So why have an additional word for "rational" as well as "true" and "good"? Because we want to talk about systematic methods for obtaining truth and winning. The word "rational" has potential pitfalls, but there are plenty of non-borderline cases where "rational" works fine to communicate what one is getting at; likewise "irrational". In these cases we're not afraid to use it.
Yet one should also be careful not to overuse that word. One receives no points merely for pronouncing it loudly. If you speak overmuch of the Way you will not attain it.
12. "Taboo Your Words".
13. http://lesswrong.com/lw/nc/newcombs_problem_and_regret_of_rationality/
WHAT DO WE MEAN BY "RATIONALITY"?
3. An Intuitive Explanation of Bayes' Theorem

Bayes' Theorem
for the curious and bewildered;
an excruciatingly gentle introduction.
Your friends and colleagues are talking about something called "Bayes' Theorem" or "Bayes' Rule", or something called Bayesian reasoning. They sound really enthusiastic about it, too, so you google and find a webpage about Bayes' Theorem and...

It's this equation. That's all. Just one equation. The page you found gives a definition of it, but it doesn't say what it is, or why it's useful, or why your friends would be interested in it. It looks like this random statistics thing.
So you came here. Maybe you don't understand what the equation says. Maybe you understand it in theory, but every time you try to apply it in practice you get mixed up trying to remember the difference between p(a|x) and p(x|a), and whether p(a)*p(x|a) belongs in the numerator or the denominator. Maybe you see the theorem, and you understand the theorem, and you can use the theorem, but you can't understand why your friends and/or research colleagues seem to think it's the secret of the universe. Maybe your friends are all wearing Bayes' Theorem T-shirts, and you're feeling left out. Maybe you're a girl looking for a boyfriend, but the boy you're interested in refuses to date anyone who isn't "Bayesian". What matters is that Bayes is cool, and if you don't know Bayes, you aren't cool.
Why does a mathematical concept generate this strange enthusiasm in its students? What is the so-called Bayesian Revolution now sweeping through the sciences, which claims to subsume even the experimental method itself as a special case? What is the secret that the adherents of Bayes know? What is the light that they have seen?

1. http://yudkowsky.net/rational/bayes
Soon you will know. Soon you will be one oI us.
While there are a few existing online explanations of Bayes' Theorem, my experience with trying to introduce people to Bayesian reasoning is that the existing online explanations are too abstract. Bayesian reasoning is very counterintuitive. People do not employ Bayesian reasoning intuitively, find it very difficult to learn Bayesian reasoning when tutored, and rapidly forget Bayesian methods once the tutoring is over. This holds equally true for novice students and highly trained professionals in a field. Bayesian reasoning is apparently one of those things which, like quantum mechanics or the Wason Selection Test, is inherently difficult for humans to grasp with our built-in mental faculties.
Or so they claim. Here you will find an attempt to offer an intuitive explanation of Bayesian reasoning: an excruciatingly gentle introduction that invokes all the human ways of grasping numbers, from natural frequencies to spatial visualization. The intent is to convey, not abstract rules for manipulating numbers, but what the numbers mean, and why the rules are what they are (and cannot possibly be anything else). When you are finished reading this page, you will see Bayesian problems in your dreams.

And let's begin.
Here's a story problem about a situation that doctors often encounter:

1% of women at age forty who participate in routine screening have breast cancer. 80% of women with breast cancer will get positive mammographies. 9.6% of women without breast cancer will also get positive mammographies. A woman in this age group had a positive mammography in a routine screening. What is the probability that she actually has breast cancer?
What do you think the answer is? If you haven't encountered this kind of problem before, please take a moment to come up with your own answer before continuing.
Next, suppose I told you that most doctors get the same wrong answer on this problem; usually, only around 15% of doctors get it right. ("Really? 15%? Is that a real number, or an urban legend based on an Internet poll?" It's a real number. See Casscells, Schoenberger, and Grayboys 1978; Eddy 1982; Gigerenzer and Hoffrage 1995; and many other studies. It's a surprising result which is easy to replicate, so it's been extensively replicated.)
Do you want to think about your answer again? Here's a JavaScript calculator if you need one. This calculator has the usual precedence rules; multiplication before addition and so on. If you're not sure, I suggest using parentheses.
On the story problem above, most doctors estimate the probability to be between 70% and 80%, which is wildly incorrect. Here's an alternate version of the problem on which doctors fare somewhat better:
10 out of 1000 women at age forty who participate in routine screening have breast cancer. 800 out of 1000 women with breast cancer will get positive mammographies. 96 out of 1000 women without breast cancer will also get positive mammographies. If 1000 women in this age group undergo a routine screening, about what fraction of women with positive mammographies will actually have breast cancer?
And finally, here's the problem on which doctors fare best of all, with 46%, nearly half, arriving at the correct answer:

100 out of 10,000 women at age forty who participate in routine screening have breast cancer. 80 of every 100 women with breast cancer will get a positive mammography. 950 out of 9,900 women without breast cancer will also get a positive mammography. If 10,000 women in this age group undergo a routine screening, about what fraction of women with positive mammographies will actually have breast cancer?
The correct answer is 7.8%, obtained as follows: Out of 10,000 women, 100 have breast cancer; 80 of those 100 have positive mammographies. From the same 10,000 women, 9,900 will not have breast cancer, and of those 9,900 women, 950 will also get positive mammographies. This makes the total number of women with positive mammographies 950 + 80 or 1,030. Of those 1,030 women with positive mammographies, 80 will have cancer. Expressed as a proportion, this is 80/1,030 or 0.07767 or 7.8%.
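The count arithmetic above is easy to check mechanically. Here is a minimal sketch (mine, not part of the original essay) that redoes it with exact fractions:

```python
from fractions import Fraction

# Natural frequencies from the problem, out of 10,000 women screened.
women_with_cancer = 100    # the 1% prior
true_positives = 80        # 80% of the 100 women with cancer test positive
false_positives = 950      # the 950 of 9,900 cancer-free women who test positive

total_positives = true_positives + false_positives
p_cancer_given_positive = Fraction(true_positives, total_positives)

print(total_positives)                 # 1030
print(float(p_cancer_given_positive))  # 0.0776..., i.e. about 7.8%
```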
To put it another way, before the mammography screening, the 10,000 women can be divided into two groups:

Group 1: 100 women with breast cancer.
Group 2: 9,900 women without breast cancer.

Summing these two groups gives a total of 10,000 patients, confirming that none have been lost in the math. After the mammography, the women can be divided into four groups:

Group A: 80 women with breast cancer, and a positive mammography.
Group B: 20 women with breast cancer, and a negative mammography.
Group C: 950 women without breast cancer, and a positive mammography.
Group D: 8,950 women without breast cancer, and a negative mammography.
As you can check, the sum of all four groups is still 10,000. The sum of groups A and B, the groups with breast cancer, corresponds to group 1; and the sum of groups C and D, the groups without breast cancer, corresponds to group 2; so administering a mammography does not actually change the number of women with breast cancer. The proportion of the cancer patients (A + B) within the complete set of patients (A + B + C + D) is the same as the 1% prior chance that a woman has cancer: (80 + 20) / (80 + 20 + 950 + 8,950) = 100 / 10,000 = 1%.
The proportion of cancer patients with positive results, within the group of all patients with positive results, is the proportion of (A) within (A + C): 80 / (80 + 950) = 80 / 1,030 = 7.8%. If you administer a mammography to 10,000 patients, then out of the 1,030 with positive mammographies, 80 of those positive-mammography patients will have cancer. This is the correct answer, the answer a doctor should give a positive-mammography patient if she asks about the chance she has breast cancer; if thirteen patients ask this question, roughly 1 out of those 13 will have cancer.
The most common mistake is to ignore the original fraction of women with breast cancer, and the fraction of women without breast cancer who receive false positives, and focus only on the fraction of women with breast cancer who get positive results. For example, the vast majority of doctors in these studies seem to have thought that if around 80% of women with breast cancer have positive mammographies, then the probability of a woman with a positive mammography having breast cancer must be around 80%. Figuring out the final answer always requires all three pieces of information: the percentage of women with breast cancer, the percentage of women without breast cancer who receive false positives, and the percentage of women with breast cancer who receive (correct) positives.
To see that the final answer always depends on the original fraction of women with breast cancer, consider an alternate universe in which only one woman out of a million has breast cancer. Even if mammography in this world detects breast cancer in 8 out of 10 cases, while returning a false positive on a woman without breast cancer in only 1 out of 10 cases, there will still be a hundred thousand false positives for every real case of cancer detected. The original probability that a woman has cancer is so extremely low that, although a positive result on the mammography does increase the estimated probability, the probability isn't increased to certainty or even "a noticeable chance"; the probability goes from 1:1,000,000 to 1:100,000.
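The sliding can be reproduced with one small helper function; the sketch below is my own illustration (the names are made up), directly encoding Bayes' Theorem:

```python
def posterior(prior, p_pos_given_cancer, p_pos_given_healthy):
    """P(cancer | positive test), from the prior and the two conditional probabilities."""
    p_positive = (prior * p_pos_given_cancer
                  + (1 - prior) * p_pos_given_healthy)
    return prior * p_pos_given_cancer / p_positive

# Original problem: 1% prior, 80% hit rate, 9.6% false-positive rate.
print(posterior(0.01, 0.8, 0.096))   # about 0.078

# Alternate universe: one woman in a million has cancer, 8/10 hit rate,
# 1/10 false-positive rate.
print(posterior(1e-6, 0.8, 0.1))     # about 8e-06; the result helps, but barely
```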
Similarly, in an alternate universe where only one out of a million women does not have breast cancer, a positive result on the patient's mammography obviously doesn't mean that she has an 80% chance of having breast cancer! If this were the case her estimated probability of having cancer would have been revised drastically downward after she got a positive result on her mammography; an 80% chance of having cancer is a lot less than 99.9999%! If you administer mammographies to ten million women in this world, around eight million women with breast cancer will get correct positive results, while one woman without breast cancer will get false positive results. Thus, if you got a positive mammography in this alternate universe, your chance of having cancer would go from 99.9999% up to 99.999987%. That is, your chance of being healthy would go from 1:1,000,000 down to 1:8,000,000.
These two extreme examples help demonstrate that the mammography result doesn't replace your old information about the patient's chance of having cancer; the mammography slides the estimated probability in the direction of the result. A positive result slides the original probability upward; a negative result slides the probability downward. For example, in the original problem where 1% of the women have cancer, 80% of women with cancer get positive mammographies, and 9.6% of women without cancer get positive mammographies, a positive result on the mammography slides the 1% chance upward to 7.8%.
Most people encountering problems of this type for the first time carry out the mental operation of replacing the original 1% probability with the 80% probability that a woman with cancer gets a positive mammography. It may seem like a good idea, but it just doesn't work. "The probability that a woman with a positive mammography has breast cancer" is not at all the same thing as "the probability that a woman with breast cancer has a positive mammography"; they are as unlike as apples and cheese. Finding the final answer, "the probability that a woman with a positive mammography has breast cancer", uses all three pieces of problem information: "the prior probability that a woman has breast cancer", "the probability that a woman with breast cancer gets a positive mammography", and "the probability that a woman without breast cancer gets a positive mammography".
Fun Fact!
Q. What is the Bayesian Conspiracy?
A. The Bayesian Conspiracy is a multinational, interdisciplinary, and shadowy group of scientists that controls publication, grants, tenure, and the illicit traffic in grad students. The best way to be accepted into the Bayesian Conspiracy is to join the Campus Crusade for Bayes in high school or college, and gradually work your way up to the inner circles. It is rumored that at the upper levels of the Bayesian Conspiracy exist nine silent figures known only as the Bayes Council.
To see that the final answer always depends on the chance that a woman without breast cancer gets a positive mammography, consider an alternate test, mammography+. Like the original test, mammography+ returns positive for 80% of women with breast cancer. However, mammography+ returns a positive result for only one out of a million women without breast cancer; mammography+ has the same rate of false negatives, but a vastly lower rate of false positives. Suppose a patient receives a positive mammography+. What is the chance that this patient has breast cancer? Under the new test, it is a virtual certainty: 99.988%, i.e., a 1 in 8082 chance of being healthy.
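That figure can be spot-checked from the stated rates; the computation below is my own sketch, not the essay's:

```python
prior = 0.01                 # 1% of screened women have breast cancer
hit_rate = 0.8               # mammography+ detects 80% of cancers
false_positive_rate = 1e-6   # one in a million healthy women test positive

p_positive = prior * hit_rate + (1 - prior) * false_positive_rate
p_cancer = prior * hit_rate / p_positive

print(round(p_cancer * 100, 3))   # 99.988 (percent)
print(round(1 / (1 - p_cancer)))  # 8082, i.e. a 1 in 8082 chance of being healthy
```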
Remember, at this point, that neither mammography nor mammography+ actually change the number of women who have breast cancer. It may seem like "There is a virtual certainty you have breast cancer" is a terrible thing to say, causing much distress and despair; that the more hopeful verdict of the previous mammography test, a 7.8% chance of having breast cancer, was much to be preferred. This comes under the heading of "Don't shoot the messenger". The number of women who really do have cancer stays exactly the same between the two cases. Only the accuracy with which we detect cancer changes. Under the previous mammography test, 80 women with cancer (who already had cancer, before the mammography) are first told that they have a 7.8% chance of having cancer, creating X amount of uncertainty and fear, after which more detailed tests will inform them that they definitely do have breast cancer. The old mammography test also involves informing 950 women without breast cancer that they have a 7.8% chance of having cancer, thus creating twelve times as much additional fear and uncertainty. The new test, mammography+, does not give 950 women false positives, and the 80 women with cancer are told the same facts they would have learned eventually, only earlier and without an intervening period of uncertainty. Mammography+ is thus a better test in terms of its total emotional impact on patients, as well as being more accurate. Regardless of its emotional impact, it remains a fact that a patient with positive mammography+ has a 99.988% chance of having breast cancer.
Of course, that mammography+ does not give 950 healthy women false positives means that almost all of the patients with positive mammography+ will be patients with breast cancer. Thus, if you have a positive mammography+, your chance of having cancer is a virtual certainty. It is because mammography+ does not generate as many false positives (and needless emotional stress) that the (much smaller) group of patients who do get positive results will be composed almost entirely of genuine cancer patients (who have bad news coming to them regardless of when it arrives).
Similarly, let's suppose that we have a less discriminating test, mammography*, that still has a 20% rate of false negatives, as in the original case. However, mammography* has an 80% rate of false positives. In other words, a patient without breast cancer has an 80% chance of getting a false positive result on her mammography* test. If we suppose the same 1% prior probability that a patient presenting herself for screening has breast cancer, what is the chance that a patient with positive mammography* has cancer?
Group 1: 100 patients with breast cancer.
Group 2: 9,900 patients without breast cancer.

After mammography* screening:

Group A: 80 patients with breast cancer and a "positive" mammography*.
Group B: 20 patients with breast cancer and a "negative" mammography*.
Group C: 7,920 patients without breast cancer and a "positive" mammography*.
Group D: 1,980 patients without breast cancer and a "negative" mammography*.
The result works out to 80 / 8,000, or 0.01. This is exactly the same as the 1% prior probability that a patient has breast cancer! A "positive" result on mammography* doesn't change the probability that a woman has breast cancer at all. You can similarly verify that a "negative" mammography* also counts for nothing. And in fact it must be this way, because if mammography* has an 80% hit rate for patients with breast cancer, and also an 80% rate of false positives for patients without breast cancer, then mammography* is completely uncorrelated with breast cancer. There's no reason to call one result "positive" and one result "negative"; in fact, there's no reason to call the test a "mammography". You can throw away your expensive mammography* equipment and replace it with a random number generator that outputs a red light 80% of the time and a green light 20% of the time; the results will be the same. Furthermore, there's no reason to call the red light a "positive" result or the green light a "negative" result. You could have a green light 80% of the time and a red light 20% of the time, or a blue light 80% of the time and a purple light 20% of the time, and it would all have the same bearing on whether the patient has breast cancer: i.e., no bearing whatsoever.
We can show algebraically that this must hold for any case where the chance of a true positive and the chance of a false positive are the same, i.e.:

Group 1: 100 patients with breast cancer.
Group 2: 9,900 patients without breast cancer.

Now consider a test where the probability of a true positive and the probability of a false positive are the same number M (in the example above, M = 80% or M = 0.8):
Group A: 100*M patients with breast cancer and a "positive" result.
Group B: 100*(1 - M) patients with breast cancer and a "negative" result.
Group C: 9,900*M patients without breast cancer and a "positive" result.
Group D: 9,900*(1 - M) patients without breast cancer and a "negative" result.
The proportion of patients with breast cancer, within the group of patients with a "positive" result, then equals 100*M / (100*M + 9,900*M) = 100 / (100 + 9,900) = 1%. This holds true regardless of whether M is 80%, 30%, 50%, or 100%. If we have a mammography* test that returns "positive" results for 90% of patients with breast cancer and returns "positive" results for 90% of patients without breast cancer, the proportion of "positive"-testing patients who have breast cancer will still equal the original proportion of patients with breast cancer, i.e., 1%.
You can run through the same algebra, replacing the prior proportion of patients with breast cancer with an arbitrary percentage P:

Group 1: Within some number of patients, a fraction P have breast cancer.
Group 2: Within some number of patients, a fraction (1 - P) do not have breast cancer.

After a "cancer test" that returns "positive" for a fraction M of patients with breast cancer, and also returns "positive" for the same fraction M of patients without cancer:

Group A: P*M patients have breast cancer and a "positive" result.
Group B: P*(1 - M) patients have breast cancer and a "negative" result.
Group C: (1 - P)*M patients have no breast cancer and a "positive" result.
Group D: (1 - P)*(1 - M) patients have no breast cancer and a "negative" result.
The chance that a patient with a "positive" result has breast cancer is then the proportion of group A within the combined group A + C, or P*M / [P*M + (1 - P)*M], which, cancelling the common factor M from the numerator and denominator, is P / [P + (1 - P)] or P / 1 or just P. If the rate of false positives is the same as the rate of true positives, you always have the same probability after the test as when you started.
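The cancellation can be spot-checked with exact rational arithmetic. This snippet is my own illustration of the algebra above, with names of my choosing:

```python
from fractions import Fraction

def posterior(P, true_pos_rate, false_pos_rate):
    # Chance of breast cancer given a "positive" result.
    return P * true_pos_rate / (P * true_pos_rate + (1 - P) * false_pos_rate)

# Whenever the true-positive rate and the false-positive rate are the same
# number M, the posterior equals the prior P exactly, for any P and nonzero M.
for P in (Fraction(1, 100), Fraction(2, 5), Fraction(999, 1000)):
    for M in (Fraction(3, 10), Fraction(1, 2), Fraction(4, 5), Fraction(1)):
        assert posterior(P, M, M) == P
print("posterior == prior in every case")
```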
Which is common sense. Take, for example, the "test" of flipping a coin; if the coin comes up heads, does it tell you anything about whether a patient has breast cancer? No; the coin has a 50% chance of coming up heads if the patient has breast cancer, and also a 50% chance of coming up heads if the patient does not have breast cancer. Therefore there is no reason to call either heads or tails a "positive" result. It's not the probability being "50/50" that makes the coin a bad test; it's that the two probabilities, for "cancer patient turns up heads" and "healthy patient turns up heads", are the same. If the coin was slightly biased, so that it had a 60% chance of coming up heads, it still wouldn't be a cancer test; what makes a coin a poor test is not that it has a 50/50 chance of coming up heads if the patient has cancer, but that it also has a 50/50 chance of coming up heads if the patient does not have cancer.
You can even use a test that comes up "positive" for cancer patients 100% of the time, and still not learn anything. An example of such a test is "Add 2 + 2 and see if the answer is 4." This test returns positive 100% of the time for patients with breast cancer. It also returns positive 100% of the time for patients without breast cancer. So you learn nothing.
The original proportion of patients with breast cancer is known as the prior probability. The chance that a patient with breast cancer gets a positive mammography, and the chance that a patient without breast cancer gets a positive mammography, are known as the two conditional probabilities. Collectively, this initial information is known as the priors. The final answer, the estimated probability that a patient has breast cancer given that we know she has a positive result on her mammography, is known as the revised probability or the posterior probability. What we've just shown is that if the two conditional probabilities are equal, the posterior probability equals the prior probability.
Fun Fact!
Q. How can I find the priors for a problem?
A. Many commonly used priors are listed in the Handbook of Chemistry and Physics.
Q. Where do priors originally come from?
A. Never ask that question.
Q. Uh huh. Then where do scientists get their priors?
A. Priors for scientific problems are established by annual vote of the AAAS. In recent years the vote has become fractious and controversial, with widespread acrimony, factional polarization, and several outright assassinations. This may be a front for infighting within the Bayes Council, or it may be that the disputants have too much spare time. No one is really sure.
Q. I see. And where does everyone else get their priors?
A. They download their priors from Kazaa.
Q. What if the priors I want aren't available on Kazaa?
A. There's a small, cluttered antique shop in a back alley of San Francisco's Chinatown. Don't ask about the bronze rat.
Actually, priors are true or false just like the final answer; they reflect reality and can be judged by comparing them against reality. For example, if you think that 920 out of 10,000 women in a sample have breast cancer, and the actual number is 100 out of 10,000, then your priors are wrong. For our particular problem, the priors might have been established by three studies: a study on the case histories of women with breast cancer to see how many of them tested positive on a mammography, a study on women without breast cancer to see how many of them test positive on a mammography, and an epidemiological study on the prevalence of breast cancer in some specific demographic.
Suppose that a barrel contains many small plastic eggs. Some eggs are painted red and some are painted blue. 40% of the eggs in the bin contain pearls, and 60% contain nothing. 30% of eggs containing pearls are painted blue, and 10% of eggs containing nothing are painted blue. What is the probability that a blue egg contains a pearl? For this example the arithmetic is simple enough that you may be able to do it in your head, and I would suggest trying to do so.
A more compact way oI speciIying the problem:
- p(pearl) = 40%
- p(blue|pearl) = 30%
- p(blue|~pearl) = 10%
p(pearl|blue) = ?
~ is shorthand for "not", so ~pearl reads "not-pearl". blue|pearl is shorthand for "blue given pearl" or "the probability that an egg is painted blue, given that the egg contains a pearl". One thing that's confusing about this notation is that the order of implication is read right-to-left, as in Hebrew or Arabic. blue|pearl means "blue<-pearl", the degree to which pearl-ness implies blue-ness, not the degree to which blue-ness implies pearl-ness. This is confusing, but it's unfortunately the standard notation in probability theory.
Readers familiar with quantum mechanics will have already encountered this peculiarity; in quantum mechanics, for example, <d|c><c|b><b|a> reads as "the probability that a particle at A goes to B, then to C, ending up at D". To follow the particle, you move your eyes from right to left. Reading from left to right, "|" means "given"; reading from right to left, "|" means "implies" or "leads to". Thus, moving your eyes from left to right, blue|pearl reads "blue given pearl" or "the probability that an egg is painted blue, given that the egg contains a pearl". Moving your eyes from right to left, blue|pearl reads "pearl implies blue" or "the probability that an egg containing a pearl is painted blue".
The item on the right side is what you already know or the premise, and the item on the left side is the implication or conclusion. If we have p(blue|pearl) = 30%, and we already know that some egg contains a pearl, then we can conclude there is a 30% chance that the egg is painted blue. Thus, the final fact we're looking for, "the chance that a blue egg contains a pearl" or "the probability that an egg contains a pearl, if we know the egg is painted blue", reads p(pearl|blue).
Let's return to the problem. We have that 40% of the eggs contain pearls, and 60% of the eggs contain nothing. 30% of the eggs containing pearls are painted blue, so 12% of the eggs altogether contain pearls and are painted blue. 10% of the eggs containing nothing are painted blue, so altogether 6% of the eggs contain nothing and are painted blue. A total of 18% of the eggs are painted blue, and a total of 12% of the eggs are painted blue and contain pearls, so the chance a blue egg contains a pearl is 12/18 or 2/3 or around 67%.
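The same arithmetic, as a short exact computation (a sketch of mine, not from the original page):

```python
from fractions import Fraction

p_pearl = Fraction(2, 5)              # 40% of eggs contain pearls
p_blue_given_pearl = Fraction(3, 10)  # 30% of pearl eggs are blue
p_blue_given_empty = Fraction(1, 10)  # 10% of empty eggs are blue

p_blue_and_pearl = p_pearl * p_blue_given_pearl         # 12% of all eggs
p_blue_and_empty = (1 - p_pearl) * p_blue_given_empty   # 6% of all eggs
p_pearl_given_blue = p_blue_and_pearl / (p_blue_and_pearl + p_blue_and_empty)

print(p_pearl_given_blue)  # 2/3, about 67%
```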
The applet below, courtesy of Christian Rovner, shows a graphic representation of this problem:

(Are you having trouble seeing this applet? Do you see an image of the applet rather than the applet itself? Try downloading an updated Java.)

2. http://www.java.com/en/index.jsp
Looking at this applet, it's easier to see why the final answer depends on all three probabilities; it's the differential pressure between the two conditional probabilities, p(blue|pearl) and p(blue|~pearl), that slides the prior probability p(pearl) to the posterior probability p(pearl|blue).

As before, we can see the necessity of all three pieces of information by considering extreme cases (feel free to type them into the applet). In a (large) barrel in which only one egg out of a thousand contains a pearl, knowing that an egg is painted blue slides the probability from 0.1% to 0.3% (instead of sliding the probability from 40% to 67%). Similarly, if 999 out of 1000 eggs contain pearls, knowing that an egg is blue slides the probability from 99.9% to 99.966%; the probability that the egg does not contain a pearl goes from 1/1000 to around 1/3000. Even when the prior probability changes, the differential pressure of the two conditional probabilities always slides the probability in the same direction. If you learn the egg is painted blue, the probability the egg contains a pearl always goes up, but it goes up from the prior probability, so you need to know the prior probability in order to calculate the final answer. 0.1% goes up to 0.3%, 10% goes up to 25%, 40% goes up to 67%, 80% goes up to 92%, and 99.9% goes up to 99.966%. If you're interested in knowing how any other probabilities slide, you can type your own prior probability into the Java applet. You can also click and drag the dividing line between pearl and ~pearl in the upper bar, and watch the posterior probability change in the bottom bar.
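To reproduce the slides listed above without the applet, a small helper function (my own sketch, not part of the original) applies the same two conditional probabilities to any prior:

```python
def posterior(prior, p_blue_pearl=0.30, p_blue_empty=0.10):
    """Slide a prior p(pearl) to p(pearl|blue) for the 30%/10% egg test."""
    num = prior * p_blue_pearl
    return num / (num + (1 - prior) * p_blue_empty)

# The same slides described in the text: 0.1% -> 0.3%, 10% -> 25%,
# 40% -> 67%, 80% -> 92%, 99.9% -> 99.966%.
for prior in (0.001, 0.10, 0.40, 0.80, 0.999):
    print(f"p(pearl) = {prior:.3f} slides to p(pearl|blue) = {posterior(prior):.5f}")
```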
Studies of clinical reasoning show that most doctors carry out the mental operation of replacing the original 1% probability with the 80% probability that a woman with cancer would get a positive mammography. Similarly, on the pearl-egg problem, most respondents unfamiliar with Bayesian reasoning would probably respond that the probability a blue egg contains a pearl is 30%, or perhaps 20% (the 30% chance of a true positive minus the 10% chance of a false positive). Even if this mental operation seems like a good idea at the time, it makes no sense in terms of the question asked. It's like the experiment in which you ask a second-grader: "If eighteen people get on a bus, and then seven more people get on the bus, how old is the bus driver?" Many second-graders will respond: "Twenty-five." They understand when they're being prompted to carry out a particular mental procedure, but they haven't quite connected the procedure to reality. Similarly, to find the probability that a woman with a positive mammography has breast cancer, it makes no sense whatsoever to replace the original probability that the woman has cancer with the probability that a woman with breast cancer gets a positive mammography. Neither can you subtract the probability of a false positive from the probability of the true positive. These operations are as wildly irrelevant as adding the number of people on the bus to find the age of the bus driver.
I keep emphasizing the idea that evidence slides probability because of research that shows people tend to use spatial intuitions to grasp numbers. In particular, there's interesting evidence that we have an innate sense of quantity that's localized to left inferior parietal cortex; patients with damage to this area can selectively lose their sense of whether 5 is less than 8, while retaining their ability to read, write, and so on. (Yes, really!) The parietal cortex processes our sense of where things are in space (roughly speaking), so an innate "number line", or rather "quantity line", may be responsible for the human sense of numbers. This is why I suggest visualizing Bayesian evidence as sliding the probability along the number line; my hope is that this will translate Bayesian reasoning into something that makes sense to innate human brainware. (That, really, is what an "intuitive explanation" is.) For more information, see Stanislas Dehaene's The Number Sense.
A study by Gigerenzer and Hoffrage in 1995 showed that some ways of phrasing story problems are much more evocative of correct Bayesian reasoning. The least evocative phrasing used probabilities. A slightly more evocative phrasing used frequencies instead of probabilities; the problem remained the same, but instead of saying that 1% of women had breast cancer, one would say that 1 out of 100 women had breast cancer, that 80 out of 100 women with breast cancer would get a positive mammography, and so on. Why did a higher proportion of subjects display Bayesian reasoning on this problem? Probably because saying "1 out of 100 women" encourages you to concretely visualize X women with cancer, leading you to visualize X women with cancer and a positive mammography, etc.
The most effective presentation found so far is what's known as natural frequencies: saying that 40 out of 100 eggs contain pearls, 12 out of 40 eggs containing pearls are painted blue, and 6 out of 60 eggs containing nothing are painted blue. A natural frequencies presentation is one in which the information about the prior probability is included in presenting the conditional probabilities. If you were just learning about the eggs' conditional probabilities through natural experimentation, you would, in the course of cracking open a hundred eggs, crack open around 40 eggs containing pearls, of which 12 eggs would be painted blue, while cracking open 60 eggs containing nothing, of which about 6 would be painted blue. In the course of learning the conditional probabilities, you'd see examples of blue eggs containing pearls about twice as often as you saw examples of blue eggs containing nothing.
It may seem like presenting the problem in this way is "cheating", and indeed if it were a story problem in a math book, it probably would be cheating. However, if you're talking about real doctors, you want to cheat; you want the doctors to draw the right conclusions as easily as possible. The obvious next move would be to present all medical statistics in terms of natural frequencies. Unfortunately, while natural frequencies are a step in the right direction, it probably won't be enough. When problems are presented in natural frequencies, the proportion of people using Bayesian reasoning rises to around half. A big improvement, but not big enough when you're talking about real doctors and real patients.
A presentation of the problem in natural frequencies might be visualized like this:

In the frequency visualization, the selective attrition of the two conditional probabilities changes the proportion of eggs that contain pearls. The bottom bar is shorter than the top bar, just as the number of eggs painted blue is less than the total number of eggs. The probability graph shown earlier is really just the frequency graph with the bottom bar "renormalized", stretched out to the same length as the top bar. In the frequency applet you can change the conditional probabilities by clicking and dragging the left and right edges of the graph. (For example, to change the conditional probability blue|pearl, click and drag the line on the left that stretches from the left edge of the top bar to the left edge of the bottom bar.)
In the probability applet, you can see that when the conditional probabilities are equal, there's no differential pressure (the arrows are the same size) so the prior probability doesn't slide between the top bar and the bottom bar. But the bottom bar in the probability applet is just a renormalized (stretched out) version of the bottom bar in the frequency applet, and the frequency applet shows why the probability doesn't slide if the two conditional probabilities are equal. Here's a case where the prior proportion of pearls remains 40%, and the proportion of pearl eggs painted blue remains 30%, but the number of empty eggs painted blue is also 30%:

If you diminish two shapes by the same factor, their relative proportion will be the same as before. If you diminish the left section of the top bar by the same factor as the right section, then the bottom bar will have the same proportions as the top bar; it'll just be smaller. If the two conditional probabilities are equal, learning that the egg is blue doesn't change the probability that the egg contains a pearl, for the same reason that similar triangles have identical angles; geometric figures don't change shape when you shrink them by a constant factor.
In this case, you might as well just say that 30% of eggs are painted blue, since the probability of an egg being painted blue is independent of whether the egg contains a pearl. Applying a "test" that is statistically independent of its condition just shrinks the sample size. In this case, requiring that the egg be painted blue doesn't shrink the group of eggs with pearls any more or less than it shrinks the group of eggs without pearls. It just shrinks the total number of eggs in the sample.
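A quick numerical check of this point, using the 40%/30%/30% numbers from the case above (a sketch of mine, not from the original):

```python
p_pearl = 0.40
p_blue = 0.30  # paint is independent of pearls: both conditionals are 30%

# Both sectors shrink by the same factor of 0.30...
blue_and_pearl = p_pearl * p_blue
blue_total = p_pearl * p_blue + (1 - p_pearl) * p_blue

# ...so the proportion of pearls among blue eggs equals the prior.
p_pearl_given_blue = blue_and_pearl / blue_total
```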
Fun Fact!
Q. Why did the Bayesian reasoner cross the road?
A. You need more information to answer this question.
Here's what the original medical problem looks like when graphed. 1% of women have breast cancer, 80% of those women test positive on a mammography, and 9.6% of women without breast cancer also receive positive mammographies.

As is now clearly visible, the mammography doesn't increase the probability a positive-testing woman has breast cancer by increasing the number of women with breast cancer; of course not, if mammography increased the number of women with breast cancer, no one would ever take the test! However, requiring a positive mammography is a membership test that eliminates many more women without breast cancer than women with cancer. The number of women without breast cancer diminishes by a factor of more than ten, from 9,900 to 950, while the number of women with breast cancer is diminished only from 100 to 80. Thus, the proportion of 80 within 1,030 is much larger than the proportion of 100 within 10,000. In the graph, the left sector (representing women with breast cancer) is small, but the mammography test projects almost all of this sector into the bottom bar. The right sector (representing women without breast cancer) is large, but the mammography test projects a much smaller fraction of this sector into the bottom bar. There are, indeed, fewer women with breast cancer and positive mammographies than there are women with breast cancer, obeying the law of probabilities which requires that p(A) >= p(A&B). But even though the left sector in the bottom bar is actually slightly smaller, the proportion of the left sector within the bottom bar is greater, though still not very great. If the bottom bar were renormalized to the same length as the top bar, it would look like the left sector had expanded. This is why the proportion of "women with breast cancer" in the group "women with positive mammographies" is higher than the proportion of "women with breast cancer" in the general population, although the proportion is still not very high. The evidence of the positive mammography slides the prior probability of 1% to the posterior probability of 7.8%.
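Worked through in Python with the numbers from this section (my own check; the counts in the comments correspond to the text's population of 10,000 women):

```python
p_cancer = 0.01
p_pos_given_cancer = 0.80
p_pos_given_healthy = 0.096

true_pos = p_cancer * p_pos_given_cancer          # 80 of 10,000 women
false_pos = (1 - p_cancer) * p_pos_given_healthy  # about 950 of 10,000 women
p_cancer_given_pos = true_pos / (true_pos + false_pos)
print(f"{p_cancer_given_pos:.1%}")  # prints 7.8%, matching the text
```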
Suppose there's yet another variant of the mammography test, mammographyC, which behaves as follows. 1% of women in a certain demographic have breast cancer. Like ordinary mammography, mammographyC returns positive 9.6% of the time for women without breast cancer. However, mammographyC returns positive 0% of the time (say, once in a billion) for women with breast cancer. The graph for this scenario looks like this:

What is it that this test actually does? If a patient comes to you with a positive result on her mammographyC, what do you say? "Congratulations, you're among the rare 9.5% of the population whose health is definitely established by this test." MammographyC isn't a cancer test; it's a health test! Few women without breast cancer get positive results on mammographyC, but only women without breast cancer ever get positive results at all. Not much of the right sector of the top bar projects into the bottom bar, but none of the left sector projects into the bottom bar. So a positive result on mammographyC means you definitely don't have breast cancer.
What makes ordinary mammography a positive indicator for breast cancer is not that someone named the result "positive", but rather that the test result stands in a specific Bayesian relation to the condition of breast cancer. You could call the same result "positive" or "negative" or "blue" or "red" or "James Rutherford", or give it no name at all, and the test result would still slide the probability in exactly the same way. To minimize confusion, a test result which slides the probability of breast cancer upward should be called "positive". A test result which slides the probability of breast cancer downward should be called "negative". If the test result is statistically unrelated to the presence or absence of breast cancer, if the two conditional probabilities are equal, then we shouldn't call the procedure a "cancer test"! The meaning of the test is determined by the two conditional probabilities; any names attached to the results are simply convenient labels.
The bottom bar for the graph of mammographyC is small; mammographyC is a test that's only rarely useful. Or rather, the test only rarely gives strong evidence, and most of the time gives weak evidence. A negative result on mammographyC does slide probability; it just doesn't slide it very far. Click the "Result" switch at the bottom left corner of the applet to see what a negative result on mammographyC would imply. You might intuit that since the test could have returned positive for health, but didn't, then the failure of the test to return positive must mean that the woman has a higher chance of having breast cancer: that her probability of having breast cancer must be slid upward by the negative result on her health test.
This intuition is correct! The sum of the groups with negative results and positive results must always equal the group of all women. If the positive-testing group has more than its "fair share" of women without breast cancer, there must be an at least slightly higher proportion of women with cancer in the negative-testing group. A positive result is rare but very strong evidence in one direction, while a negative result is common but very weak evidence in the opposite direction. You might call this the Law of Conservation of Probability; not a standard term, but the conservation rule is exact. If you take the revised probability of breast cancer after a positive result, times the probability of a positive result, and add that to the revised probability of breast cancer after a negative result, times the probability of a negative result, then you must always arrive at the prior probability. If you don't yet know what the test result is, the expected revised probability after the test result arrives, taking both possible results into account, should always equal the prior probability.
On ordinary mammography, the test is expected to return "positive" 10.3% of the time: 80 positive women with cancer plus 950 positive women without cancer equals 1,030 women with positive results. Conversely, the mammography should return negative 89.7% of the time: 100% - 10.3% = 89.7%. A positive result slides the revised probability from 1% to 7.8%, while a negative result slides the revised probability from 1% to 0.22%. So p(cancer|positive)*p(positive) + p(cancer|negative)*p(negative) = 7.8%*10.3% + 0.22%*89.7% = 1% = p(cancer), as expected.
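The conservation rule is easy to verify numerically; this sketch of mine recomputes everything from the problem's three given numbers:

```python
p_cancer = 0.01
hit_rate, false_pos_rate = 0.80, 0.096

p_positive = p_cancer * hit_rate + (1 - p_cancer) * false_pos_rate  # 0.10304
p_cancer_pos = p_cancer * hit_rate / p_positive                     # about 7.8%
p_cancer_neg = p_cancer * (1 - hit_rate) / (1 - p_positive)         # about 0.22%

# Expected posterior over both possible results equals the prior exactly.
recovered_prior = p_cancer_pos * p_positive + p_cancer_neg * (1 - p_positive)
```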
Why "as expected"? Let's take a look at the quantities involved:

p(cancer): 0.01 (Group 1: 100 women with breast cancer)
p(~cancer): 0.99 (Group 2: 9,900 women without breast cancer)
p(positive|cancer): 80.0% (80% of women with breast cancer have positive mammographies)
p(~positive|cancer): 20.0% (20% of women with breast cancer have negative mammographies)
p(positive|~cancer): 9.6% (9.6% of women without breast cancer have positive mammographies)
p(~positive|~cancer): 90.4% (90.4% of women without breast cancer have negative mammographies)
p(cancer&positive): 0.008 (Group A: 80 women with breast cancer and positive mammographies)
p(cancer&~positive): 0.002 (Group B: 20 women with breast cancer and negative mammographies)
p(~cancer&positive): 0.095 (Group C: 950 women without breast cancer and positive mammographies)
p(~cancer&~positive): 0.895 (Group D: 8,950 women without breast cancer and negative mammographies)
p(positive): 0.103 (1,030 women with positive results)
p(~positive): 0.897 (8,970 women with negative results)
p(cancer|positive): 7.80% (chance you have breast cancer if mammography is positive)
p(~cancer|positive): 92.20% (chance you are healthy if mammography is positive)
p(cancer|~positive): 0.22% (chance you have breast cancer if mammography is negative)
p(~cancer|~positive): 99.78% (chance you are healthy if mammography is negative)
One of the common confusions in using Bayesian reasoning is to mix up some or all of these quantities, which, as you can see, are all numerically different and have different meanings. p(A&B) is the same as p(B&A), but p(A|B) is not the same thing as p(B|A), and p(A&B) is completely different from p(A|B). (I don't know who chose the symmetrical | symbol to mean "implies", and then made the direction of implication right-to-left, but it was probably a bad idea.)

To get acquainted with all these quantities and the relationships between them, we'll play "follow the degrees of freedom". For example, the two quantities p(cancer) and p(~cancer) have 1 degree of freedom between them, because of the general law p(A) + p(~A) = 1. If you know that p(~cancer) = .99, you can obtain p(cancer) = 1 - p(~cancer) = .01. There's no room to say that p(~cancer) = .99 and then also specify p(cancer) = .25; it would violate the rule p(A) + p(~A) = 1.
p(positive|cancer) and p(~positive|cancer) also have only one degree of freedom between them; either a woman with breast cancer gets a positive mammography or she doesn't. On the other hand, p(positive|cancer) and p(positive|~cancer) have two degrees of freedom. You can have a mammography test that returns positive for 80% of cancerous patients and 9.6% of healthy patients, or that returns positive for 70% of cancerous patients and 2% of healthy patients, or even a health test that returns "positive" for 30% of cancerous patients and 92% of healthy patients. The two quantities, the output of the mammography test for cancerous patients and the output of the mammography test for healthy patients, are in mathematical terms independent; one cannot be obtained from the other in any way, and so they have two degrees of freedom between them.
What about p(positive&cancer), p(positive|cancer), and p(cancer)? Here we have three quantities; how many degrees of freedom are there? In this case the equation that must hold is p(positive&cancer) = p(positive|cancer) * p(cancer). This equality reduces the degrees of freedom by one. If we know the fraction of patients with cancer, and the chance that a cancerous patient has a positive mammography, we can deduce the fraction of patients who have breast cancer and a positive mammography by multiplying. You should recognize this operation from the graph; it's the projection of the top bar into the bottom bar. p(cancer) is the left sector of the top bar, and p(positive|cancer) determines how much of that sector projects into the bottom bar, and the left sector of the bottom bar is p(positive&cancer).
Similarly, if we know the number of patients with breast cancer and positive mammographies, and also the number of patients with breast cancer, we can estimate the chance that a woman with breast cancer gets a positive mammography by dividing: p(positive|cancer) = p(positive&cancer) / p(cancer). In fact, this is exactly how such medical diagnostic tests are calibrated; you do a study on 8,520 women with breast cancer and see that there are 6,816 (or thereabouts) women with breast cancer and positive mammographies, then divide 6,816 by 8,520 to find that 80% of women with breast cancer had positive mammographies. (Incidentally, if you accidentally divide 8,520 by 6,816 instead of the other way around, your calculations will start doing strange things, such as insisting that 125% of women with breast cancer and positive mammographies have breast cancer. This is a common mistake in carrying out Bayesian arithmetic, in my experience.) And finally, if you know p(positive&cancer) and p(positive|cancer), you can deduce how many cancer patients there must have been originally. There are two degrees of freedom shared out among the three quantities; if we know any two, we can deduce the third.
How about p(positive), p(positive&cancer), and p(positive&~cancer)? Again there are only two degrees of freedom among these three variables. The equation occupying the extra degree of freedom is p(positive) = p(positive&cancer) + p(positive&~cancer). This is how p(positive) is computed to begin with; we figure out the number of women with breast cancer who have positive mammographies, and the number of women without breast cancer who have positive mammographies, then add them together to get the total number of women with positive mammographies. It would be very strange to go out and conduct a study to determine the number of women with positive mammographies, just that one number and nothing else, but in theory you could do so. And if you then conducted another study and found the number of those women who had positive mammographies and breast cancer, you would also know the number of women with positive mammographies and no breast cancer; either a woman with a positive mammography has breast cancer or she doesn't. In general, p(A&B) + p(A&~B) = p(A). Symmetrically, p(A&B) + p(~A&B) = p(B).
What about p(positive&cancer), p(positive&~cancer), p(~positive&cancer), and p(~positive&~cancer)? You might at first be tempted to think that there are only two degrees of freedom for these four quantities; that you can, for example, get p(positive&~cancer) by multiplying p(positive) * p(~cancer), and thus that all four quantities can be found given only the two quantities p(positive) and p(cancer). This is not the case! p(positive&~cancer) = p(positive) * p(~cancer) only if the two probabilities are statistically independent, if the chance that a woman has breast cancer has no bearing on whether she has a positive mammography. As you'll recall, this amounts to requiring that the two conditional probabilities be equal to each other, a requirement which would eliminate one degree of freedom. If you remember that these four quantities are the groups A, B, C, and D, you can look over those four groups and realize that, in theory, you can put any number of people into the four groups. If you start with a group of 80 women with breast cancer and positive mammographies, there's no reason why you can't add another group of 500 women with breast cancer and negative mammographies, followed by a group of women without breast cancer and negative mammographies, and so on. So now it seems like the four quantities have four degrees of freedom. And they would, except that in expressing them as probabilities, we need to normalize them to fractions of the complete group, which adds the constraint that p(positive&cancer) + p(positive&~cancer) + p(~positive&cancer) + p(~positive&~cancer) = 1. This equation takes up one degree of freedom, leaving three degrees of freedom among the four quantities. If you specify the fractions of women in groups A, B, and D, you can deduce the fraction of women in group C.
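The normalization constraint can be seen directly by turning group counts into fractions; the counts below are the ones from the mammography example, but any four counts would do (the code itself is my sketch):

```python
# Counts for groups A, B, C, D from the mammography example.
A, B, C, D = 80, 20, 950, 8950
total = A + B + C + D

fracs = {name: count / total for name, count in zip("ABCD", (A, B, C, D))}

# Expressing the groups as fractions forces them to sum to 1...
assert abs(sum(fracs.values()) - 1.0) < 1e-12
# ...so specifying the fractions in A, B, and D determines C.
deduced_C = 1.0 - (fracs["A"] + fracs["B"] + fracs["D"])
```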
Given the four groups A, B, C, and D, it is very straightforward to compute everything else: p(cancer) = A + B, p(~positive|cancer) = B / (A + B), and so on. Since ABCD contains three degrees of freedom, it follows that the entire set of 16 probabilities contains only three degrees of freedom. Remember that in our problems we always needed three pieces of information: the prior probability and the two conditional probabilities, which, indeed, have three degrees of freedom among them. Actually, for Bayesian problems, any three quantities with three degrees of freedom between them should logically specify the entire problem. For example, let's take a barrel of eggs with p(blue) = 0.40, p(blue|pearl) = 5/13, and p(~blue&~pearl) = 0.20. Given this information, you can compute p(pearl|blue).
As a story problem:

Suppose you have a large barrel containing a number of plastic eggs. Some eggs contain pearls, the rest contain nothing. Some eggs are painted blue, the rest are painted red. Suppose that 40% of the eggs are painted blue, 5/13 of the eggs containing pearls are painted blue, and 20% of the eggs are both empty and painted red. What is the probability that an egg painted blue contains a pearl?

Try it. I assure you it is possible. You probably shouldn't try to solve this with just a Javascript calculator, though. I used a Python console. (In theory, pencil and paper should also work, but I don't know anyone who owns a pencil so I couldn't try it personally.)
As a check on your calculations, does the (meaningless) quantity p(~pearl|~blue)/p(pearl) roughly equal 0.51? (In story problem terms: the likelihood that a red egg is empty, divided by the likelihood that an egg contains a pearl, equals approximately 0.51.) Of course, using this information in the problem would be cheating.
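Following the author's lead, here is one way to grind through the story problem in a Python console; be warned that it reveals the answer, so try it yourself first. The intermediate steps and names are mine.

```python
p_blue = 0.40
p_blue_given_pearl = 5 / 13
p_empty_and_red = 0.20  # p(~pearl & ~blue)

# 60% of eggs are red; 20% are red and empty, so 40% are red with pearls.
p_pearl_and_red = (1 - p_blue) - p_empty_and_red
# p(pearl & ~blue) = p(pearl) * (1 - 5/13), so divide to recover p(pearl).
p_pearl = p_pearl_and_red / (1 - p_blue_given_pearl)
p_pearl_given_blue = p_pearl * p_blue_given_pearl / p_blue  # the answer

# The sanity check suggested in the text: should be roughly 0.51.
check = (p_empty_and_red / (1 - p_blue)) / p_pearl
```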
If you can solve that problem, then when we revisit Conservation of Probability, it seems perfectly straightforward. Of course the mean revised probability, after administering the test, must be the same as the prior probability. Of course strong but rare evidence in one direction must be counterbalanced by common but weak evidence in the other direction.

Because:

p(cancer|positive)*p(positive)
+ p(cancer|~positive)*p(~positive)
= p(cancer)

In terms of the four groups:

p(cancer|positive) = A / (A + C)
p(positive) = A + C
p(cancer&positive) = A

p(cancer|~positive) = B / (B + D)
p(~positive) = B + D
p(cancer&~positive) = B

p(cancer) = A + B
Let's return to the original barrel of eggs: 40% of the eggs containing pearls, 30% of the pearl eggs painted blue, 10% of the empty eggs painted blue. The graph for this problem is:

What happens to the revised probability, p(pearl|blue), if the proportion of eggs containing pearls is kept constant, but 60% of the eggs with pearls are painted blue (instead of 30%), and 20% of the empty eggs are painted blue (instead of 10%)? You could type 60% and 20% into the inputs for the two conditional probabilities, and see how the graph changes, but can you figure out in advance what the change will look like?
If you guessed that the revised probability remains the same, because the bottom bar grows by a factor of 2 but retains the same proportions, congratulations! Take a moment to think about how far you've come. Looking at a problem like

1% of women have breast cancer. 80% of women with breast cancer get positive mammographies. 9.6% of women without breast cancer get positive mammographies. If a woman has a positive mammography, what is the probability she has breast cancer?

the vast majority of respondents intuit that around 70-80% of women with positive mammographies have breast cancer. Now, looking at a problem like

Suppose there are two barrels containing many small plastic eggs. In both barrels, some eggs are painted blue and the rest are painted red. In both barrels, 40% of the eggs contain pearls and the rest are empty. In the first barrel, 30% of the pearl eggs are painted blue, and 10% of the empty eggs are painted blue. In the second barrel, 60% of the pearl eggs are painted blue, and 20% of the empty eggs are painted blue. Would you rather have a blue egg from the first or second barrel?

you can see it's intuitively obvious that the probability of a blue egg containing a pearl is the same for either barrel. Imagine how hard it would be to see that using the old way of thinking!
It's intuitively obvious, but how to prove it? Suppose that we call P the prior probability that an egg contains a pearl, that we call M the first conditional probability (that a pearl egg is painted blue), and N the second conditional probability (that an empty egg is painted blue). Suppose that M and N are both increased or diminished by an arbitrary factor X; for example, in the problem above, they are both increased by a factor of 2. Does the revised probability that an egg contains a pearl, given that we know the egg is blue, stay the same?

- p(pearl) = P
- p(blue|pearl) = M*X
- p(blue|~pearl) = N*X
- p(pearl|blue) = ?
From these quantities, we get the four groups:

Group A: p(pearl&blue) = P*M*X
Group B: p(pearl&~blue) = P*(1 - (M*X))
Group C: p(~pearl&blue) = (1 - P)*N*X
Group D: p(~pearl&~blue) = (1 - P)*(1 - (N*X))

The proportion of eggs that contain pearls and are blue, within the group of all blue eggs, is then the proportion of group (A) within the group (A + C), equalling P*M*X / (P*M*X + (1 - P)*N*X). The factor X in the numerator and denominator cancels out, so increasing or diminishing both conditional probabilities by a constant factor doesn't change the revised probability.
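The cancellation can also be confirmed numerically; this sketch of mine scales both conditional probabilities by several factors X and checks that the posterior never moves:

```python
def p_pearl_given_blue(P, M, N):
    """Posterior from prior P and conditionals M = p(blue|pearl), N = p(blue|~pearl)."""
    return P * M / (P * M + (1 - P) * N)

P, M, N = 0.40, 0.30, 0.10
baseline = p_pearl_given_blue(P, M, N)
for X in (0.5, 1.0, 2.0):
    scaled = p_pearl_given_blue(P, M * X, N * X)
    assert abs(scaled - baseline) < 1e-12  # the factor X cancels out
```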
Fun Fact!
Q. Suppose that there are two barrels, each containing a number of plastic eggs. In both barrels, some eggs are painted blue and the rest are painted red. In the first barrel, 90% of the eggs contain pearls and 20% of the pearl eggs are painted blue. In the second barrel, 45% of the eggs contain pearls and 60% of the empty eggs are painted red. Would you rather have a blue pearl egg from the first or second barrel?
A. Actually, it doesn't matter which barrel you choose! Can you see why?
The probability that a test gives a true positive divided by the probability that a test gives a false positive is known as the likelihood ratio of that test. Does the likelihood ratio of a medical test sum up everything there is to know about the usefulness of the test?

No, it does not! The likelihood ratio sums up everything there is to know about the meaning of a positive result on the medical test, but the meaning of a negative result on the test is not specified, nor is the frequency with which the test is useful. If we examine the algebra above, while p(pearl|blue) remains constant, p(pearl|~blue) may change; the X does not cancel out. As a story problem, this strange fact would look something like this:
Suppose that there are two barrels, each containing a number of plastic eggs. In both barrels, 40% of the eggs contain pearls and the rest contain nothing. In both barrels, some eggs are painted blue and the rest are painted red. In the first barrel, 30% of the eggs with pearls are painted blue, and 10% of the empty eggs are painted blue. In the second barrel, 90% of the eggs with pearls are painted blue, and 30% of the empty eggs are painted blue. Would you rather have a blue egg from the first or second barrel? Would you rather have a red egg from the first or second barrel?

For the first question, the answer is that we don't care whether we get the blue egg from the first or second barrel. For the second question, however, the probabilities do change: in the first barrel, 34% of the red eggs contain pearls, while in the second barrel 8.7% of the red eggs contain pearls! Thus, we should prefer to get a red egg from the first barrel. In the first barrel, 70% of the pearl eggs are painted red, and 90% of the empty eggs are painted red. In the second barrel, 10% of the pearl eggs are painted red, and 70% of the empty eggs are painted red.
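These red-egg figures follow from the same machinery, just conditioning on ~blue instead of blue; a quick sketch (function and variable names mine):

```python
def p_pearl_given_red(p_pearl, p_blue_pearl, p_blue_empty):
    """Posterior for a red egg: condition on ~blue instead of blue."""
    pearl_and_red = p_pearl * (1 - p_blue_pearl)
    empty_and_red = (1 - p_pearl) * (1 - p_blue_empty)
    return pearl_and_red / (pearl_and_red + empty_and_red)

first = p_pearl_given_red(0.40, 0.30, 0.10)   # first barrel: about 34%
second = p_pearl_given_red(0.40, 0.90, 0.30)  # second barrel: about 8.7%
```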
What goes on here? We start out by noting that, counter to intuition, p(pearl|blue) and p(pearl|~blue) have two degrees of freedom among them even when p(pearl) is fixed, so there's no reason why one quantity shouldn't change while the other remains constant. But didn't we just get through establishing a law for "Conservation of Probability", which says that p(pearl|blue)*p(blue) + p(pearl|~blue)*p(~blue) = p(pearl)? Doesn't this equation take up one degree of freedom? No, because p(blue) isn't fixed between the two problems. In the second barrel, the proportion of blue eggs containing pearls is the same as in the first barrel, but a much larger fraction of eggs are painted blue! This alters the set of red eggs in such a way that the proportions do change. Here's a graph for the red eggs in the second barrel:
Let's return to the example of a medical test. The likelihood ratio of a medical test, the number of true positives divided by the number of false positives, tells us everything there is to know about the meaning of a positive result. But it doesn't tell us the meaning of a negative result, and it doesn't tell us how often the test is useful. For example, a mammography with a hit rate of 80% for patients with breast cancer and a false positive rate of 9.6% for healthy patients has the same likelihood ratio as a test with an 8% hit rate and a false positive rate of 0.96%. Although these two tests have the same likelihood ratio, the first test is more useful in every way: it detects disease more often, and a negative result is stronger evidence of health.
The likelihood ratio for a positive result summarizes the differential pressure of the two conditional probabilities for a positive result, and thus summarizes how much a positive result will slide the prior probability. Take a probability graph, like this one:
The likelihood ratio of the mammography is what determines the slant of the line. If the prior probability is 1%, then knowing only the likelihood ratio is enough to determine the posterior probability after a positive result.
But, as you can see from the frequency graph, the likelihood ratio doesn't tell the whole story - in the frequency graph, the proportions of the bottom bar can stay fixed while the size of the bottom bar changes. p(blue) increases but p(pearl|blue) doesn't change, because p(pearl&blue) and p(~pearl&blue) increase by the same factor. But when you flip the graph to look at p(~blue), the proportions of p(pearl&~blue) and p(~pearl&~blue) do not remain constant.
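To see the "same factor" point concretely, here is a sketch. The blue-paint fractions - 30%/10% in the first barrel, 90%/30% in the second, i.e., both conditional probabilities scaled by 3 - are assumed for illustration, consistent with the claim above:

```python
# Assumed illustration values: 40% of eggs contain pearls in both barrels.
p_pearl = 0.40

def p_pearl_given_blue(p_blue_given_pearl, p_blue_given_empty):
    joint_pearl = p_blue_given_pearl * p_pearl
    joint_empty = p_blue_given_empty * (1 - p_pearl)
    return joint_pearl / (joint_pearl + joint_empty)

# First barrel: 30% of pearl eggs and 10% of empty eggs are painted blue.
# Second barrel: 90% and 30% - both fractions tripled, so p(blue) triples
# while the proportions *within* the blue eggs stay fixed.
first = p_pearl_given_blue(0.30, 0.10)
second = p_pearl_given_blue(0.90, 0.30)
print(round(first, 3), round(second, 3))  # both 0.667
```

Both barrels give the same p(pearl|blue), even though far more eggs are blue in the second barrel.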
Of course the likelihood ratio can't tell the whole story; the likelihood ratio and the prior probability together are only two numbers, while the problem has three degrees of freedom.
Suppose that you apply two tests for breast cancer in succession - say, a standard mammography and also some other test which is independent of mammography. Since I don't know of any such test which is independent of mammography, I'll invent one for the purpose of this problem, and call it the Tams-Braylor Division Test, which checks to see if any cells are dividing more rapidly than other cells. We'll suppose that the Tams-Braylor gives a true positive for 90% of patients with breast cancer, and gives a false positive for 5% of patients without cancer. Let's say the prior prevalence of breast cancer is 1%. If a patient gets a positive result on her mammography and her Tams-Braylor, what is the revised probability she has breast cancer?
One way to solve this problem would be to take the revised probability for a positive mammography, which we already calculated as 7.8%, and plug that into the Tams-Braylor test as the new prior probability. If we do this, we find that the result comes out to 60%.
But this assumes that first we see the positive mammography result, and then the positive result on the Tams-Braylor. What if first the woman gets a positive result on the Tams-Braylor, followed by a positive result on her mammography? Intuitively, it seems like it shouldn't matter. Does the math check out?
First we'll administer the Tams-Braylor to a woman with a 1% prior probability of breast cancer. Then we administer a mammography, which gives 80% true positives and 9.6% false positives, and it also comes out positive. Lo and behold, the answer is again 60%. (If it's not exactly the same, it's due to rounding error - you can get a more precise calculator, or work out the fractions by hand, and the numbers will be exactly equal.)
An algebraic proof that both strategies are equivalent is left to the reader. To visualize, imagine that the lower bar of the frequency applet for mammography projects an even lower bar using the probabilities of the Tams-Braylor Test, and that the final lowest bar is the same regardless of the order in which the conditional probabilities are projected.
We might also reason that since the two tests are independent, the probability that a woman with breast cancer gets a positive mammography and a positive Tams-Braylor is 90% * 80% = 72%. And the probability that a woman without breast cancer gets false positives on mammography and Tams-Braylor is 5% * 9.6% = 0.48%. So if we wrap it all up as a single test with a likelihood ratio of 72%/0.48%, and apply it to a woman with a 1% prior probability of breast cancer...
...we find once again that the answer is 60%.
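The order-independence claim is easy to verify numerically. A minimal sketch, using the test parameters given above (80%/9.6% for mammography, 90%/5% for the invented Tams-Braylor):

```python
def update(prior, p_pos_given_cancer, p_pos_given_healthy):
    """One Bayesian update on a positive test result."""
    num = p_pos_given_cancer * prior
    return num / (num + p_pos_given_healthy * (1 - prior))

prior = 0.01

# Mammography first, then Tams-Braylor:
a = update(update(prior, 0.80, 0.096), 0.90, 0.05)
# Tams-Braylor first, then mammography:
b = update(update(prior, 0.90, 0.05), 0.80, 0.096)
# Both results wrapped up as a single test:
c = update(prior, 0.90 * 0.80, 0.05 * 0.096)

print(round(a, 3), round(b, 3), round(c, 3))  # all 0.602
```

All three routes give the same posterior, about 60%, as the text says.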
Suppose that the prior prevalence of breast cancer in a demographic is 1%. Suppose that we, as doctors, have a repertoire of three independent tests for breast cancer. Our first test, test A, a mammography, has a likelihood ratio of 80%/9.6% = 8.33. The second test, test B, has a likelihood ratio of 18.0 (for example, from 90% versus 5%), and the third test, test C, has a likelihood ratio of 3.5 (which could be from 70% versus 20%, or from 35% versus 10%; it makes no difference). Suppose a patient gets a positive result on all three tests. What is the probability the patient has breast cancer?
Here's a fun trick for simplifying the bookkeeping. If the prior prevalence of breast cancer in a demographic is 1%, then 1 out of 100 women have breast cancer, and 99 out of 100 women do not have breast cancer. So if we rewrite the probability of 1% as an odds ratio, the odds are:
1:99
And the likelihood ratios oI the three tests A, B, and C are:
8.33:1 = 25:3
18.0:1 = 18:1
3.5:1 = 7:2
The odds for women with breast cancer who score positive on all three tests, versus women without breast cancer who score positive on all three tests, will equal:
1*25*18*7:99*3*1*2 =
3,150:594
To recover the probability from the odds, we just write:
3,150 / (3,150 + 594) = 84%
This always works regardless of how the odds ratios are written, i.e., 8.33:1 is just the same as 25:3 or 75:9. It doesn't matter in what order the tests are administered, or in what order the results are computed. The proof is left as an exercise for the reader.
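The odds bookkeeping above can be reproduced exactly with rational arithmetic, for instance:

```python
from fractions import Fraction

# Prior odds 1:99, then multiply by each test's likelihood ratio.
odds = Fraction(1, 99) * Fraction(25, 3) * Fraction(18, 1) * Fraction(7, 2)
print(odds)  # 175/33, i.e. 3,150:594 reduced

probability = odds / (1 + odds)  # convert odds back to a probability
print(round(float(probability), 2))  # 0.84
```

Exact fractions make it obvious that the multiplication order can't matter.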
E. T. Jaynes, in "Probability Theory With Applications in Science and Engineering", suggests that credibility and evidence should be measured in decibels.
Decibels?
Decibels are used for measuring exponential differences of intensity. For example, if the sound from an automobile horn carries 10,000 times as much energy (per square meter per second) as the sound from an alarm clock, the automobile horn would be 40 decibels louder. The sound of a bird singing might carry 1,000 times less energy than an alarm clock, and hence would be 30 decibels softer. To get the number of decibels, you take the logarithm base 10 and multiply by 10.
decibels = 10 log10 (intensity)
or
intensity = 10^(decibels/10)
Suppose we start with a prior probability of 1% that a woman has breast cancer, corresponding to an odds ratio of 1:99. And then we administer three tests of likelihood ratios 25:3, 18:1, and 7:2. You could multiply those numbers... or you could just add their logarithms:
10 log10 (1/99) = -20
10 log10 (25/3) = 9
10 log10 (18/1) = 13
10 log10 (7/2) = 5
It starts out as fairly unlikely that a woman has breast cancer - our credibility level is at -20 decibels. Then three test results come in, corresponding to 9, 13, and 5 decibels of evidence. This raises the credibility level by a total of 27 decibels, meaning that the prior credibility of -20 decibels goes to a posterior credibility of 7 decibels. So the odds go from 1:99 to 5:1, and the probability goes from 1% to around 83%.
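The decibel arithmetic can be sketched in a few lines (rounding to whole decibels, as above):

```python
import math

def decibels(ratio):
    # 10 times the base-10 logarithm of an odds or likelihood ratio
    return 10 * math.log10(ratio)

prior_db = decibels(1 / 99)                                # about -20
evidence_db = sum(decibels(r) for r in (25/3, 18/1, 7/2))  # about 9 + 13 + 5
posterior_db = prior_db + evidence_db                      # about 7
posterior_odds = 10 ** (posterior_db / 10)                 # about 5:1

print(round(prior_db), round(evidence_db), round(posterior_db))
print(round(posterior_odds, 1))
```

Adding log-evidence is exactly the same operation as multiplying likelihood ratios, which is the whole appeal of the decibel bookkeeping.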
In front of you is a bookbag containing 1,000 poker chips. I started out with two such bookbags, one containing 700 red and 300 blue chips, the other containing 300 red and 700 blue. I flipped a fair coin to determine which bookbag to use, so your prior probability that the bookbag in front of you is the red bookbag is 50%. Now, you sample randomly, with replacement after each chip. In 12 samples, you get 8 reds and 4 blues. What is the probability that this is the predominantly red bag?
Just for fun, try and work this one out in your head. You don't need to be exact - a rough estimate is good enough. When you're ready, continue onward.
According to a study performed by Lawrence Phillips and Ward Edwards in 1966, most people, faced with this problem, give an answer in the range 70% to 80%. Did you give a substantially higher probability than that? If you did, congratulations - Ward Edwards wrote that very seldom does a person answer this question properly, even if the person is relatively familiar with Bayesian reasoning. The correct answer is 97%.
The likelihood ratio for the test result "red chip" is 7/3, while the likelihood ratio for the test result "blue chip" is 3/7. Therefore a blue chip is exactly the same amount of evidence as a red chip, just in the other direction - a red chip is 3.6 decibels of evidence for the red bag, and a blue chip is -3.6 decibels of evidence. If you draw one blue chip and one red chip, they cancel out. So the ratio of red chips to blue chips does not matter; only the excess of red chips over blue chips matters. There were eight red chips and four blue chips in twelve samples, therefore four more red chips than blue chips. Thus the posterior odds will be:
7^4:3^4 = 2401:81
which is around 30:1, i.e., around 97%.
The prior credibility starts at 0 decibels and there's a total of around 14 decibels of evidence, and indeed this corresponds to odds of around 25:1 or around 96%. Again, there's some rounding error, but if you performed the operations using exact arithmetic, the results would be identical.
We can now see intuitively that the bookbag problem would have exactly the same answer, obtained in just the same way, if sixteen chips were sampled and we found ten red chips and six blue chips.
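A two-line check of the chip arithmetic (the bag compositions are the 700/300 split from the problem):

```python
# Posterior odds for the red bag after 8 red and 4 blue chips,
# sampled with replacement from a 700/300-red bag vs. a 300/700-red bag.
reds, blues = 8, 4
odds = (7 / 3) ** (reds - blues)  # only the excess of red over blue matters
p_red_bag = odds / (1 + odds)
print(round(p_red_bag, 2))  # 0.97

# Same answer for 10 red and 6 blue chips in 16 samples:
odds16 = (7 / 3) ** (10 - 6)
print(odds16 == odds)  # True
```

Because each red/blue pair cancels, only the difference in counts enters the exponent.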
You are a mechanic for gizmos. When a gizmo stops working, it is due to a blocked hose 30% of the time. If a gizmo's hose is blocked, there is a 45% probability that prodding the gizmo will produce sparks. If a gizmo's hose is unblocked, there is only a 5% chance that prodding the gizmo will produce sparks. A customer brings you a malfunctioning gizmo. You prod the gizmo and find that it produces sparks. What is the probability that a spark-producing gizmo has a blocked hose?
What is the sequence of arithmetical operations that you performed to solve this problem?
(45%*30%) / (45%*30% + 5%*70%)
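The same sequence of operations, as a sketch in code (the resulting figure, roughly 79%, is not stated in the text but follows directly from the formula):

```python
p_blocked = 0.30           # prior: blocked hose causes 30% of failures
p_sparks_blocked = 0.45    # p(sparks | blocked hose)
p_sparks_unblocked = 0.05  # p(sparks | unblocked hose)

num = p_sparks_blocked * p_blocked
p_blocked_given_sparks = num / (num + p_sparks_unblocked * (1 - p_blocked))
print(round(p_blocked_given_sparks, 3))  # 0.794
```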
Similarly, to find the chance that a woman with positive mammography has breast cancer, we computed:
p(positive|cancer)*p(cancer)
_______________________________________________
p(positive|cancer)*p(cancer) + p(positive|~cancer)*p(~cancer)
which is
p(positive&cancer) / [p(positive&cancer) + p(positive&~cancer)]
which is
p(positive&cancer) / p(positive)
which is
p(cancer|positive)
The fully general form of this calculation is known as Bayes' Theorem or Bayes' Rule:
Bayes' Theorem:
p(A|X) = p(X|A)*p(A) / [p(X|A)*p(A) + p(X|~A)*p(~A)]
Given some phenomenon A that we want to investigate, and an observation X that is evidence about A - for example, in the previous example, A is breast cancer and X is a positive mammography - Bayes' Theorem tells us how we should update our probability of A, given the new evidence X.
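The Theorem translates directly into a small function. Here is a sketch, checked against the mammography example (1% prior, 80% hit rate, 9.6% false positive rate):

```python
def bayes(p_a, p_x_given_a, p_x_given_not_a):
    """p(A|X) via Bayes' Theorem."""
    num = p_x_given_a * p_a
    return num / (num + p_x_given_not_a * (1 - p_a))

# A = breast cancer, X = positive mammography.
posterior = bayes(0.01, 0.80, 0.096)
print(round(posterior, 3))  # 0.078
```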
By this point, Bayes' Theorem may seem blatantly obvious or even tautological, rather than exciting and new. If so, this introduction has entirely succeeded in its purpose.
Fun Fact!
Q. Who originally discovered Bayes' Theorem?
A. The Reverend Thomas Bayes, by far the most enigmatic figure in mathematical history. Almost nothing is known of Bayes's life, and very few of his manuscripts survived. Thomas Bayes was born in 1701 or 1702 to Joshua Bayes and Ann Carpenter, and his date of death is listed as 1761. The exact date of Thomas Bayes's birth is not known for certain because Joshua Bayes, though a surprisingly wealthy man, was a member of an unusual, esoteric, and even heretical religious sect, the "Nonconformists". The Nonconformists kept their birth registers secret, supposedly from fear of religious discrimination; whatever the reason, no true record exists of Thomas Bayes's birth. Thomas Bayes was raised a Nonconformist and was soon promoted into the higher ranks of the Nonconformist theosophers, whence comes the "Reverend" in his name.
In 1742 Bayes was elected a Fellow of the Royal Society of London, the most prestigious scientific body of its day, despite Bayes having published no scientific or mathematical works at that time. Bayes's nomination certificate was signed by sponsors including the President and the Secretary of the Society, making his election almost certain. Even today, however, it remains a mystery why such weighty names sponsored an unknown into the Royal Society.
Bayes's sole publication during his known lifetime was allegedly a mystical book entitled Divine Benevolence, laying forth the original causation and ultimate purpose of the universe. The book is commonly attributed to Bayes, though it is said that no author appeared on the title page, and the entire work is sometimes considered to be of dubious provenance.
Most mysterious of all, Bayes' Theorem itself appears in a Bayes manuscript presented to the Royal Society of London in 1764 - three years after Bayes's supposed death in 1761!
Despite the shocking circumstances of its presentation, Bayes' Theorem was soon forgotten, and was popularized within the scientific community only by the later efforts of the great mathematician Pierre Simon Laplace. Laplace himself is almost as enigmatic as Bayes - we don't even know whether it was "Pierre" or "Simon" that was his actual first name. Laplace's papers are said to have contained a design for an AI capable of predicting all future events, the so-called "Laplacian superintelligence". While it is generally believed that Laplace never tried to implement his design, there remains the fact that Laplace presciently fled the guillotine that claimed many of his colleagues during the Reign of Terror. Even today, physicists sometimes attribute unusual effects to a "Laplacian Operator" intervening in their experiments.
In summary, we do not know the real circumstances of Bayes's birth, the ultimate origins of Bayes' Theorem, Bayes's actual year of death, or even whether Bayes ever really died. Nonetheless "Reverend Thomas Bayes", whatever his true identity, has the greatest fondness and gratitude of Earth's scientific community.
So why is it that some people are so excited about Bayes' Theorem?
"Do you believe that a nuclear war will occur in the next 20 years? If no, why not?" Since I wanted to use some common answers to this question to make a point about rationality, I went ahead and asked the above question in an IRC channel, #philosophy on EFNet.
One EFNetter who answered replied "No" to the above question, but added that he believed biological warfare would wipe out "99.4%" of humanity within the next ten years. I then asked whether he believed 100% was a possibility. "No," he said. "Why not?", I asked. "Because I'm an optimist," he said. (Roanoke of #philosophy on EFNet wishes to be credited with this statement, even having been warned that it will not be cast in a complimentary light. Good for him!) Another person who answered the above question said that he didn't expect a nuclear war for 100 years, because "All of the players involved in decisions regarding nuclear war are not interested right now." "But why extend that out for 100 years?", I asked. "Pure hope," was his reply.
What is it exactly that makes these thoughts "irrational" - a poor way of arriving at truth? There are a number of intuitive replies that can be given to this; for example: "It is not rational to believe things only because they are comforting." Of course it is equally irrational to believe things only because they are discomforting; the second error is less common, but equally irrational. Other intuitive arguments include the idea that "Whether or not you happen to be an optimist has nothing to do with whether biological warfare wipes out the human species", or "Pure hope is not evidence about nuclear war because it is not an observation about nuclear war."
There is also a mathematical reply that is precise, exact, and contains all the intuitions as special cases. This mathematical reply is known as Bayes' Theorem.
For example, the reply "Whether or not you happen to be an optimist has nothing to do with whether biological warfare wipes out the human species" can be translated into the statement:
p(you are currently an optimist | biological war occurs within ten years and wipes out humanity)
=
p(you are currently an optimist | biological war occurs within ten years and does not wipe out humanity)
Since the two probabilities for p(X|A) and p(X|~A) are equal, Bayes' Theorem says that p(A|X) = p(A); as we have earlier seen, when the two conditional probabilities are equal, the revised probability equals the prior probability. If X and A are unconnected - statistically independent - then finding that X is true cannot be evidence that A is true; observing X does not update our probability for A; saying "X" is not an argument for A.
But suppose you are arguing with someone who is verbally clever and who says something like, "Ah, but since I'm an optimist, I'll have renewed hope for tomorrow, work a little harder at my dead-end job, pump up the global economy a little, eventually, through the trickle-down effect, sending a few dollars into the pocket of the researcher who ultimately finds a way to stop biological warfare - so you see, the two events are related after all, and I can use one as valid evidence about the other." In one sense, this is correct - any correlation, no matter how weak, is fair prey for Bayes' Theorem; but Bayes' Theorem distinguishes between weak and strong evidence. That is, Bayes' Theorem not only tells us what is and isn't evidence, it also describes the strength of evidence. Bayes' Theorem not only tells us when to revise our probabilities, but how much to revise our probabilities. A correlation between hope and biological warfare may exist, but it's a lot weaker than the speaker wants it to be; he is revising his probabilities much too far.
Let's say you're a woman who's just undergone a mammography. Previously, you figured that you had a very small chance of having breast cancer; we'll suppose that you read the statistics somewhere and so you know the chance is 1%. When the positive mammography comes in, your estimated chance should now shift to 7.8%. There is no room to say something like, "Oh, well, a positive mammography isn't definite evidence, some healthy women get positive mammographies too. I don't want to despair too early, and I'm not going to revise my probability until more evidence comes in. Why? Because I'm an optimist." And there is similarly no room for saying, "Well, a positive mammography may not be definite evidence, but I'm going to assume the worst until I find otherwise. Why? Because I'm a pessimist." Your revised probability should go to 7.8%, no more, no less.
Bayes' Theorem describes what makes something "evidence" and how much evidence it is. Statistical models are judged by comparison to the Bayesian method because, in statistics, the Bayesian method is as good as it gets - the Bayesian method defines the maximum amount of mileage you can get out of a given piece of evidence, in the same way that thermodynamics defines the maximum amount of work you can get out of a temperature differential. This is why you hear cognitive scientists talking about Bayesian reasoners. In cognitive science, "Bayesian reasoner" is the technically precise codeword that we use to mean "rational mind".
There are also a number of general heuristics about human reasoning that you can learn from looking at Bayes' Theorem.
For example, in many discussions of Bayes' Theorem, you may hear cognitive psychologists saying that people do not take prior frequencies sufficiently into account, meaning that when people approach a problem where there's some evidence X indicating that condition A might hold true, they tend to judge A's likelihood solely by how well the evidence X seems to match A, without taking into account the prior frequency of A. If you think, for example, that under the mammography example, the woman's chance of having breast cancer is in the range of 70% to 80%, then this kind of reasoning is insensitive to the prior frequency given in the problem; it doesn't notice whether 1% of women or 10% of women start out having breast cancer. "Pay more attention to the prior frequency!" is one of the many things that humans need to bear in mind to partially compensate for our built-in inadequacies.
A related error is to pay too much attention to p(X|A) and not enough to p(X|~A) when determining how much evidence X is for A. The degree to which a result X is evidence for A depends, not only on the strength of the statement we'd expect to see result X if A were true, but also on the strength of the statement we wouldn't expect to see result X if A weren't true. For example, if it is raining, this very strongly implies the grass is wet - p(wetgrass|rain) ~ 1 - but seeing that the grass is wet doesn't necessarily mean that it has just rained; perhaps the sprinkler was turned on, or you're looking at the early morning dew. Since p(wetgrass|~rain) is substantially greater than zero, p(rain|wetgrass) is substantially less than one. On the other hand, if the grass was never wet when it wasn't raining, then knowing that the grass was wet would always show that it was raining, p(rain|wetgrass) ~ 1, even if p(wetgrass|rain) = 50%, that is, even if the grass only got wet 50% of the times it rained. Evidence is always the result of the differential between the two conditional probabilities. Strong evidence is not the product of a very high probability that A leads to X, but the product of a very low probability that not-A could have led to X.
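A sketch of the wet-grass example - the 20% prior probability of rain here is an invented illustration value, not from the text:

```python
def p_rain_given_wet(p_rain, p_wet_given_rain, p_wet_given_dry):
    # Bayes' Theorem for the wet-grass example.
    num = p_wet_given_rain * p_rain
    return num / (num + p_wet_given_dry * (1 - p_rain))

# Rain almost always wets the grass, but sprinklers and dew wet it too:
print(round(p_rain_given_wet(0.20, 0.99, 0.30), 2))  # 0.45 - weak evidence

# Grass is *never* wet without rain, even though rain only wets it half the time:
print(p_rain_given_wet(0.20, 0.50, 0.00))  # 1.0 - conclusive evidence
```

Note that the conclusive case has the *lower* p(X|A); what makes it conclusive is p(X|~A) = 0, the differential the text describes.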
The Bayesian revolution in the sciences is fueled, not only by more and more cognitive scientists suddenly noticing that mental phenomena have Bayesian structure in them; not only by scientists in every field learning to judge their statistical methods by comparison with the Bayesian method; but also by the idea that science itself is a special case of Bayes' Theorem; experimental evidence is Bayesian evidence. The Bayesian revolutionaries hold that when you perform an experiment and get evidence that "confirms" or "disconfirms" your theory, this confirmation and disconfirmation is governed by the Bayesian rules. For example, you have to take into account, not only whether your theory predicts the phenomenon, but whether other possible explanations also predict the phenomenon. Previously, the most popular philosophy of science was probably Karl Popper's falsificationism - this is the old philosophy that the Bayesian revolution is currently dethroning. Karl Popper's idea that theories can be definitely falsified, but never definitely confirmed, is yet another special case of the Bayesian rules; if p(X|A) ~ 1 - if the theory makes a definite prediction - then observing ~X very strongly falsifies A. On the other hand, if p(X|A) ~ 1, and we observe X, this doesn't definitely confirm the theory; there might be some other condition B such that p(X|B) ~ 1, in which case observing X doesn't favor A over B. For observing X to definitely confirm A, we would have to know, not that p(X|A) ~ 1, but that p(X|~A) ~ 0, which is something that we can't know because we can't range over all possible alternative explanations. For example, when Einstein's theory of General Relativity toppled Newton's incredibly well-confirmed theory of gravity, it turned out that all of Newton's predictions were just a special case of Einstein's predictions.
You can even formalize Popper's philosophy mathematically. The likelihood ratio for X, p(X|A)/p(X|~A), determines how much observing X slides the probability for A; the likelihood ratio is what says how strong X is as evidence. Well, in your theory A, you can predict X with probability 1, if you like; but you can't control the denominator of the likelihood ratio, p(X|~A) - there will always be some alternative theories that also predict X, and while we go with the simplest theory that fits the current evidence, you may someday encounter some evidence that an alternative theory predicts but your theory does not. That's the hidden gotcha that toppled Newton's theory of gravity. So there's a limit on how much mileage you can get from successful predictions; there's a limit on how high the likelihood ratio goes for confirmatory evidence.
On the other hand, if you encounter some piece of evidence Y that is definitely not predicted by your theory, this is enormously strong evidence against your theory. If p(Y|A) is infinitesimal, then the likelihood ratio will also be infinitesimal. For example, if p(Y|A) is 0.0001%, and p(Y|~A) is 1%, then the likelihood ratio p(Y|A)/p(Y|~A) will be 1:10,000. -40 decibels of evidence! Or flipping the likelihood ratio, if p(Y|A) is very small, then p(Y|~A)/p(Y|A) will be very large, meaning that observing Y greatly favors ~A over A. Falsification is much stronger than confirmation. This is a consequence of the earlier point that very strong evidence is not the product of a very high probability that A leads to X, but the product of a very low probability that not-A could have led to X. This is the precise Bayesian rule that underlies the heuristic value of Popper's falsificationism.
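The asymmetry can be made concrete in a few lines. The 50% rival-prediction figure in the confirmation case is an assumed illustration; the falsification numbers are the 0.0001% and 1% from above:

```python
import math

# Confirmation: theory A predicts X with certainty, but a rival theory
# might predict X half the time - so X is only mild evidence for A.
p_x_given_a, p_x_given_not_a = 1.0, 0.5
confirm_db = 10 * math.log10(p_x_given_a / p_x_given_not_a)

# Falsification: A assigns Y an infinitesimal probability.
p_y_given_a, p_y_given_not_a = 0.000001, 0.01
falsify_db = 10 * math.log10(p_y_given_a / p_y_given_not_a)

print(round(confirm_db, 1))  # 3.0 decibels for A
print(round(falsify_db))     # -40 decibels for A
```

A successful prediction buys a few decibels; a failed definite prediction costs dozens.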
Similarly, Popper's dictum that an idea must be falsifiable can be interpreted as a manifestation of the Bayesian conservation-of-probability rule; if a result X is positive evidence for the theory, then the result ~X would have disconfirmed the theory to some extent. If you try to interpret both X and ~X as "confirming" the theory, the Bayesian rules say this is impossible! To increase the probability of a theory you must expose it to tests that can potentially decrease its probability; this is not just a rule for detecting would-be cheaters in the social process of science, but a consequence of Bayesian probability theory. On the other hand, Popper's idea that there is only falsification and no such thing as confirmation turns out to be incorrect. Bayes' Theorem shows that falsification is very strong evidence compared to confirmation, but falsification is still probabilistic in nature; it is not governed by fundamentally different rules from confirmation, as Popper argued.
So we find that many phenomena in the cognitive sciences, plus the statistical methods used by scientists, plus the scientific method itself, are all turning out to be special cases of Bayes' Theorem. Hence the Bayesian revolution.
Fun Fact!
Q. Are there any limits to the power of Bayes' Theorem?
A. According to legend, one who fully grasped Bayes' Theorem would gain the ability to create and physically enter an alternate universe using only off-the-shelf equipment and a short computer program. One who fully grasps Bayes' Theorem, yet remains in our universe to aid others, is known as a Bayesattva.
Bayes' Theorem:
p(A|X) = p(X|A)*p(A) / [p(X|A)*p(A) + p(X|~A)*p(~A)]
Why wait so long to introduce Bayes' Theorem, instead of just showing it at the beginning? Well... because I've tried that before, and what happens, in my experience, is that people get all tangled up in trying to apply Bayes' Theorem as a set of poorly grounded mental rules; instead of the Theorem helping, it becomes one more thing to juggle mentally, so that in addition to trying to remember how many women with breast cancer have positive mammographies, the reader is also trying to remember whether it's p(X|A) in the numerator or p(A|X), and whether a positive mammography result corresponds to A or X, and which side of p(X|A) is the implication, and what the terms are in the denominator, and so on. In this excruciatingly gentle introduction, I tried to show all the workings of Bayesian reasoning without ever introducing the explicit Theorem as something extra to memorize, hopefully reducing the number of factors the reader needed to mentally juggle.
Even if you happen to be one of the fortunate people who can easily grasp and apply abstract theorems, the mental-juggling problem is still something to bear in mind if you ever need to explain Bayesian reasoning to someone else.
If you do find yourself losing track, my advice is to forget Bayes' Theorem as an equation and think about the graph. p(A) and p(~A) are at the top. p(X|A) and p(X|~A) are the projection factors. p(X&A) and p(X&~A) are at the bottom. And p(A|X) equals the proportion of p(X&A) within p(X&A)+p(X&~A). The graph isn't shown here - but can you see it in your mind?
And if thinking about the graph doesn't work, I suggest forgetting about Bayes' Theorem entirely - just try to work out the specific problem in gizmos, hoses, and sparks, or whatever it is.
Having introduced Bayes' Theorem explicitly, we can explicitly discuss its components.
p(A|X) = p(X|A)*p(A) / [p(X|A)*p(A) + p(X|~A)*p(~A)]
We'll start with p(A|X). If you ever find yourself getting confused about what's A and what's X in Bayes' Theorem, start with p(A|X) on the left side of the equation; that's the simplest part to interpret. A is the thing we want to know about. X is how we're observing it; X is the evidence we're using to make inferences about A. Remember that for every expression p(Q|P), we want to know about the probability for Q given P, the degree to which P implies Q - a more sensible notation, which it is now too late to adopt, would be p(Q<-P).
p(Q|P) is closely related to p(Q&P), but they are not identical. Expressed as a probability or a fraction, p(Q&P) is the proportion of things that have property Q and property P within all things; i.e., the proportion of "women with breast cancer and a positive mammography" within the group of all women. If the total number of women is 10,000, and 80 women have breast cancer and a positive mammography, then p(Q&P) is 80/10,000 = 0.8%. You might say that the absolute quantity, 80, is being normalized to a probability relative to the group of all women. Or to make it clearer, suppose that there's a group of 641 women with breast cancer and a positive mammography within a total sample group of 89,031 women. 641 is the absolute quantity. If you pick out a random woman from the entire sample, then the probability you'll pick a woman with breast cancer and a positive mammography is p(Q&P), or 0.72% (in this example).
On the other hand, p(Q|P) is the proportion of things that have property Q and property P within all things that have P; i.e., the proportion of "women with breast cancer and a positive mammography" within the group of all women with positive mammographies. If there are 641 women with breast cancer and positive mammographies, 7,915 women with positive mammographies, and 89,031 women, then p(Q&P) is the probability of getting one of those 641 women if you're picking at random from the entire group of 89,031, while p(Q|P) is the probability of getting one of those 641 women if you're picking at random from the smaller group of 7,915.
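A quick sketch with these illustration numbers (the positive-mammography count of 7,915 is reconstructed from the damaged text, so treat it as approximate) shows the difference between picking from the whole sample and picking from the positives:

```python
cancer_and_positive = 641  # women with breast cancer AND a positive mammography
positive = 7915            # all women with positive mammographies (reconstructed)
total = 89031              # total sample group

p_q_and_p = cancer_and_positive / total      # picked from the whole sample
p_q_given_p = cancer_and_positive / positive # picked from the positives only

print(round(p_q_and_p * 100, 2))    # 0.72 (percent)
print(round(p_q_given_p * 100, 1))  # 8.1 (percent)
```

Same 641 women in the numerator; only the reference group in the denominator changes.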
In a sense, p(Q|P) really means p(Q&P|P), but specifying the extra P all the time would be redundant. You already know it has property P, so the property you're investigating is Q - even though you're looking at the size of group Q&P within group P, not the size of group Q within group P (which would be nonsense). This is what it means to take the property on the right-hand side as given; it means you know you're working only within the group of things that have property P. When you constrict your focus of attention to see only this smaller group, many other probabilities change. If you're taking P as given, then p(Q&P) equals just p(Q) - at least, relative to the group P. The old p(Q), the frequency of "things that have property Q within the entire sample", is revised to the new frequency of "things that have property Q within the subsample of things that have property P". If P is given, if P is our entire world, then looking for Q&P is the same as looking for just Q.
If you constrict your focus of attention to only the population of eggs that are painted blue, then suddenly "the probability that an egg contains a pearl" becomes a different number; this proportion is different for the population of blue eggs than the population of all eggs. The given, the property that constricts our focus of attention, is always on the right side of p(Q|P); the P becomes our world, the entire thing we see, and on the other side of the "given", P always has probability 1 - that is what it means to take P as given. So p(Q|P) means "If P has probability 1, what is the probability of Q?" or "If we constrict our attention to only things or events where P is true, what is the probability of Q?" Q, on the other side of the given, is not certain - its probability may be 10% or 90% or any other number. So when you use Bayes' Theorem, and you write the part on the left side as p(A|X) - how to update the probability of A after seeing X, the new probability of A given that we know X, the degree to which X implies A - you can tell that X is always the observation or the evidence, and A is the property being investigated, the thing you want to know about.
The right side of Bayes' Theorem is derived from the left side
through these steps:

p(A|X) = p(A|X)

p(A|X) = p(X&A) / p(X)

p(A|X) = p(X&A) / [p(X&A) + p(X&~A)]

p(A|X) = p(X|A)*p(A) / [p(X|A)*p(A) + p(X|~A)*p(~A)]
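Each step of the derivation can be checked numerically; a minimal sketch using the essay's mammography figures (1% prior, 80% detection rate, 9.6% false positive rate):

```python
# Verify the derivation numerically with the essay's mammography numbers:
# p(A) = 1% prior, p(X|A) = 80%, p(X|~A) = 9.6%.
p_a = 0.01               # prior probability of breast cancer
p_x_given_a = 0.80       # positive mammography given cancer
p_x_given_not_a = 0.096  # positive mammography given no cancer

p_x_and_a = p_x_given_a * p_a                # p(X&A) = p(X|A)*p(A)
p_x_and_not_a = p_x_given_not_a * (1 - p_a)  # p(X&~A) = p(X|~A)*p(~A)
p_x = p_x_and_a + p_x_and_not_a              # p(X) = p(X&A) + p(X&~A)

p_a_given_x = p_x_and_a / p_x                # p(A|X) = p(X&A)/p(X)
print(f"p(A|X) = {p_a_given_x:.3f}")         # 0.078, i.e. about 7.8%
```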
The first step, from p(A|X) to p(X&A)/p(X), may look like a
tautology. The actual math performed is different, though.
p(A|X) is a single number, the normalized probability or
frequency of A within the subgroup X. p(X&A)/p(X) are usually
the percentage frequencies of X&A and X within the entire
sample, but the calculation also works if X&A and X are absolute
numbers of people, events, or things. p(cancer|positive) is
a single percentage/frequency/probability, always between 0 and 1.
(positive&cancer)/(positive) can be measured either in
probabilities, such as 0.008/0.103, or it might be expressed in
groups of women, for example 194/2,494. As long as both the
numerator and denominator are measured in the same units, it
should make no difference.
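That unit-invariance is easy to confirm; a small sketch using the two measurements quoted above:

```python
# The ratio p(X&A)/p(X) is unit-free: probabilities and raw counts
# give the same answer, as long as numerator and denominator match.
as_probabilities = 0.008 / 0.103  # frequencies within the whole sample
as_counts = 194 / 2_494           # the same ratio in groups of women

print(round(as_probabilities, 3))  # 0.078
print(round(as_counts, 3))         # 0.078
```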
Going from p(X) in the denominator to p(X&A)+p(X&~A) is a
very straightforward step whose main purpose is as a stepping
stone to the last equation. However, one common arithmetical
mistake in Bayesian calculations is to divide p(X&A) by p(X&~A),
instead of dividing p(X&A) by [p(X&A) + p(X&~A)]. For
example, someone doing the breast cancer calculation tries to get
the posterior probability by performing the math operation 80 /
950, instead of 80 / (80 + 950). I like to think of this as a "rose
flowers" error. Sometimes, if you show young children a picture
with eight roses and two tulips, they'll say that the picture contains
more roses than flowers. (Technically, this would be called a class
inclusion error.) You have to add the roses and the tulips to get
the number of flowers, which you need to find the proportion of
roses within the flowers. You can't find the proportion of roses in
the tulips, or the proportion of tulips in the roses. When you look
at the graph, the bottom bar consists of all the patients with
positive results. That's what the doctor sees: a patient with a
positive result. The question then becomes whether this is a
healthy patient with a positive result, or a cancerous patient with a
positive result. To figure the odds of that, you have to look at the
proportion of cancerous patients with positive results within all
patients who have positive results, because, again, "a patient with a
positive result" is what you actually see. You can't divide 80 by
950 because that would mean you were trying to find the
proportion of cancerous patients with positive results within the
group of healthy patients with positive results; it's like asking how
many of the tulips are roses, instead of asking how many of the
flowers are roses. Imagine using the same method to find the
proportion of healthy patients. You would divide 950 by 80 and
find that 1,187% of the patients were healthy. Or to be exact, you
would find that 1,187% of cancerous patients with positive results
were healthy patients with positive results.
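A small sketch of the wrong division and the right one, using the 80 cancerous and 950 healthy patients with positive results:

```python
# The "rose flowers" (class inclusion) error, numerically.
cancer_pos = 80    # cancerous patients with positive results
healthy_pos = 950  # healthy patients with positive results

wrong = cancer_pos / healthy_pos                 # roses / tulips: meaningless
right = cancer_pos / (cancer_pos + healthy_pos)  # roses / flowers

print(f"wrong: {wrong:.3f}")  # 0.084, not a posterior probability
print(f"right: {right:.3f}")  # 0.078, i.e. about 7.8%

# Run the same mistake in reverse and you get an absurdity:
print(f"{healthy_pos / cancer_pos:.1%}")  # 1187.5% of patients "healthy"
```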
The last step in deriving Bayes' Theorem is going from p(X&A) to
p(X|A)*p(A), in both the numerator and the denominator, and
from p(X&~A) to p(X|~A)*p(~A), in the denominator.
Why? Well, one answer is that p(X|A), p(X|~A), and p(A)
correspond to the initial information given in all the story
problems. But why were the story problems written that way?
Because in many cases, p(X|A), p(X|~A), and p(A) are what we
actually know; and this in turn happens because p(X|A) and p(X|~A)
are often the quantities that directly describe causal relations, with
the other quantities derived from them and p(A) as statistical
relations. For example, p(X|A), the implication from A to X, where
A is what we want to know and X is our way of observing it,
corresponds to the implication from a woman having breast cancer
to a positive mammography. This is not just a statistical implication
but a direct causal relation; a woman gets a positive mammography
because she has breast cancer. The mammography is designed to
detect breast cancer, and it is a fact about the physical process of
the mammography exam that it has an 80% probability of
detecting breast cancer. As long as the design of the
mammography machine stays constant, p(X|A) will stay at 80%,
even if p(A) changes; for example, if we screen a group of women
with other risk factors, so that the prior frequency of women with
breast cancer is 10% instead of 1%. In this case, p(X&A) will
change along with p(A), and so will p(X), p(A|X), and so on, but
p(X|A) stays at 80%, because that's a fact about the mammography
exam itself. (Though you do need to test this statement before
relying on it; it's possible that the mammography exam might work
better on some forms of breast cancer than others.) p(X|A) is one
of the simple facts from which complex facts like p(X&A) are
constructed; p(X|A) is an elementary causal relation within a
complex system, and it has a direct physical interpretation. This is
why Bayes' Theorem has the form it does: it's not for solving math
brainteasers, but for reasoning about the physical universe.
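A brief sketch of this invariance: the exam's likelihoods stay fixed while the screened population's prior changes (the 10% figure is the higher-risk group mentioned above):

```python
# p(X|A) is a fact about the mammography exam; p(A) is a fact about
# the screened population. Changing the prior changes the posterior,
# but the likelihoods stay fixed.
def posterior(prior, p_pos_given_cancer=0.80, p_pos_given_healthy=0.096):
    joint = p_pos_given_cancer * prior
    return joint / (joint + p_pos_given_healthy * (1 - prior))

print(f"{posterior(0.01):.3f}")  # 0.078: ordinary screening population
print(f"{posterior(0.10):.3f}")  # 0.481: higher-risk group, same exam
```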
Once the derivation is finished, all the implications on the right
side of the equation are of the form p(X|A) or p(X|~A), while
the implication on the left side is p(A|X). As long as you
remember this and you get the rest of the equation right, it
shouldn't matter whether you happened to start out with p(A|X) or
p(X|A) on the left side of the equation, as long as the rules are
applied consistently; if you started out with the direction of
implication p(X|A) on the left side of the equation, you would need
to end up with the direction p(A|X) on the right side of the
equation. This, of course, is just changing the variable labels; the
point is to remember the symmetry, in order to remember the
structure of Bayes' Theorem.
The symmetry arises because the elementary causal relations are
generally implications from facts to observations, i.e., from breast
cancer to positive mammography. The elementary steps in reasoning
are generally implications from observations to facts, i.e., from a
positive mammography to breast cancer. The left side of Bayes'
Theorem is an elementary inferential step from the observation of
positive mammography to the conclusion of an increased
probability of breast cancer. Implication is written right-to-left, so
we write p(cancer|positive) on the left side of the
equation. The right side of Bayes' Theorem describes the
elementary causal steps, for example from breast cancer to a
positive mammography, and so the implications on the right side
of Bayes' Theorem take the form p(positive|cancer) or
p(positive|~cancer).
And that's Bayes' Theorem. Rational inference on the left end,
physical causality on the right end; an equation with mind on one
side and reality on the other. Remember how the scientific
method turned out to be a special case of Bayes' Theorem: if you
wanted to put it poetically, you could say that Bayes' Theorem
binds reasoning into the physical universe.
Okay, we're done.
Reverend Bayes says:
You are now an initiate
of the Bayesian Conspiracy.
5. What Is Evidence?
Walking along the street, your shoelaces come untied. Shortly
thereafter, for some odd reason, you start believing your shoelaces
are untied. Light leaves the Sun and strikes your shoelaces and
bounces off; some photons enter the pupils of your eyes and strike
your retina; the energy of the photons triggers neural impulses; the
neural impulses are transmitted to the visual-processing areas of the
brain; and there the optical information is processed and recon-
structed into a 3D model that is recognized as an untied shoelace.
There is a sequence of events, a chain of cause and effect, within
the world and your brain, by which you end up believing what you
believe. The final outcome of the process is a state of mind which
mirrors the state of your actual shoelaces.
What is evidence? It is an event entangled, by links of cause
and effect, with whatever you want to know about. If the target of
your inquiry is your shoelaces, for example, then the light entering
your pupils is evidence entangled with your shoelaces. This should
not be confused with the technical sense of "entanglement" used in
physics; here I'm just talking about "entanglement" in the sense of
two things that end up in correlated states because of the links of
cause and effect between them.
Not every influence creates the kind of "entanglement" required
for evidence. It's no help to have a machine that beeps when you
1. http://lesswrong.com/lw/jl/what_is_evidence/
2. http://sl4.org/wiki/TheSimpleTruth
3. Page 625, 'Feeling Rational'.
enter losing lottery numbers. The light reflected from your shoes
would not be useful evidence about your shoelaces, if the photons
ended up in the same physical state whether your shoelaces were
tied or untied.
To say it abstractly: For an event to be evidence about a target
of inquiry, it has to happen differently in a way that's entangled with
the different possible states of the target. (To say it technically:
There has to be Shannon mutual information between the eviden-
tial event and the target of inquiry, relative to your current state of
uncertainty about both of them.)
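This can be illustrated with a toy computation; the numbers below are invented for illustration, not taken from the text. A machine that beeps for every ticket carries zero mutual information about the lottery outcome; one that beeps only on winners does not:

```python
import math

def mutual_information(joint):
    """Shannon mutual information I(X;Y) of a joint distribution
    given as {(x, y): probability}."""
    px, py = {}, {}
    for (x, y), p in joint.items():
        px[x] = px.get(x, 0) + p
        py[y] = py.get(y, 0) + p
    return sum(p * math.log2(p / (px[x] * py[y]))
               for (x, y), p in joint.items() if p > 0)

# Hypothetical numbers: 1% of tickets win. A useless machine beeps
# for every ticket; a useful one beeps only for winners.
always_beeps = {("win", "beep"): 0.01, ("lose", "beep"): 0.99}
discriminates = {("win", "beep"): 0.01, ("lose", "silent"): 0.99}

print(mutual_information(always_beeps))             # 0.0 bits: no entanglement
print(round(mutual_information(discriminates), 3))  # 0.081 bits
```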
Entanglement can be contagious when processed correctly, which is
why you need eyes and a brain. If photons reflect off your shoelaces
and hit a rock, the rock won't change much. The rock won't reflect
the shoelaces in any helpful way; it won't be detectably different
depending on whether your shoelaces were tied or untied. This is
why rocks are not useful witnesses in court. A photographic film
will contract shoelace-entanglement from the incoming photons, so
that the photo can itself act as evidence. If your eyes and brain
work correctly, you will become tangled up with your own shoelaces.
This is why rationalists put such a heavy premium on the
paradoxical-seeming claim that a belief is only really worthwhile if
you could, in principle, be persuaded to believe otherwise. If your
retina ended up in the same state regardless of what light entered
it, you would be blind. Some belief systems, in a rather obvious
trick to reinforce themselves, say that certain beliefs are only really
worthwhile if you believe them unconditionally, no matter what
you see, no matter what you think. Your brain is supposed to end
up in the same state regardless. Hence the phrase, "blind faith".
If what you believe doesn't depend on what you see, you've been
blinded as effectively as by poking out your eyeballs.
If your eyes and brain work correctly, your beliefs will end up
entangled with the facts. Rational thought produces beliefs which are
themselves evidence.
If your tongue speaks truly, your rational beliefs, which are
themselves evidence, can act as evidence for someone else. En-
tanglement can be transmitted through chains of cause and effect;
and if you speak, and another hears, that too is cause and
WHAT IS EVIDENCE? 87
effect. When you say "My shoelaces are untied" over a cellphone,
you're sharing your entanglement with your shoelaces with a friend.
Therefore rational beliefs are contagious, among honest folk
who believe each other to be honest. And it's why a claim that
your beliefs are not contagious (that you believe for private reasons
which are not transmissible) is so suspicious. If your beliefs are en-
tangled with reality, they should be contagious among honest folk.
If your model of reality suggests that the outputs of your
thought processes should not be contagious to others, then your
model says that your beliefs are not themselves evidence, meaning
they are not entangled with reality. You should apply a reflective
correction, and stop believing.
Indeed, if you feel, on a gut level, what this all means, you will
automatically stop believing. Because "my belief is not entangled
with reality" means "my belief is not accurate". As soon as you stop
believing "'snow is white' is true", you should (automatically!) stop
believing "snow is white", or something is very wrong.
So go ahead and explain why the kind of thought processes you
use systematically produce beliefs that mirror reality. Explain why
you think you're rational. Why you think that, using thought pro-
cesses like the ones you use, minds will end up believing "snow is
white" if and only if snow is white. If you don't believe that the
outputs of your thought processes are entangled with reality, why
do you believe the outputs of your thought processes? It's the same
thing, or it should be.
7. How to Convince Me That 2 + 2 = 3
I cannot conceive of a situation that would make 2 + 2
= 4 false. Perhaps for that reason, my belief in 2 + 2 = 4 is
unconditional.
I admit, I cannot conceive of a "situation" that would make 2
+ 2 = 4 false. (There are redefinitions, but those are not "situa-
tions", and then you're no longer talking about 2, 4, =, or +.) But
that doesn't make my belief unconditional. I find it quite easy to
imagine a situation which would convince me that 2 + 2 = 3.
Suppose I got up one morning, and took out two earplugs, and
set them down next to two other earplugs on my nighttable, and
noticed that there were now three earplugs, without any earplugs
having appeared or disappeared, in contrast to my stored memory
that 2 + 2 was supposed to equal 4. Moreover, when I visualized
the process in my own mind, it seemed that making XX and XX
come out to XXXX required an extra X to appear from nowhere,
and was, moreover, inconsistent with other arithmetic I visualized,
since subtracting XX from XXX left XX, but subtracting XX from
XXXX left XXX. This would conflict with my stored memory
that 3 - 2 = 1, but memory would be absurd in the face of physical
and mental confirmation that XXX - XX = XX.
1. http://lesswrong.com/lw/jr/how_to_convince_me_that_2_2_3/
2. Page 86, 'What is Evidence?'.
3. http://lesswrong.com/lw/jl/what_is_evidence/f7h
I would also check a pocket calculator, Google, and perhaps my
copy of 1984 where Winston writes that "Freedom is the freedom
to say two plus two equals three." All of these would naturally show
that the rest of the world agreed with my current visualization, and
disagreed with my memory, that 2 + 2 = 3.
How could I possibly have ever been so deluded as to believe
that 2 + 2 = 4? Two explanations would come to mind: First, a
neurological fault (possibly caused by a sneeze) had made all the
additive sums in my stored memory go up by one. Second, some-
one was messing with me, by hypnosis or by my being a computer
simulation. In the second case, I would think it more likely that
they had messed with my arithmetic recall than that 2 + 2 actually
equalled 4. Neither of these plausible-sounding explanations would
prevent me from noticing that I was very, very, very confused.
What would convince me that 2 + 2 = 3, in other words, is exactly
the same kind of evidence that currently convinces me that 2 + 2 =
4: The evidential crossfire of physical observation, mental visual-
ization, and social agreement.
There was a time when I had no idea that 2 + 2 = 4. I did not ar-
rive at this new belief by random processes; then there would have
been no particular reason for my brain to end up storing "2 + 2 =
4" instead of "2 + 2 = 7". The fact that my brain stores an answer
surprisingly similar to what happens when I lay down two earplugs
alongside two earplugs, calls forth an explanation of what entangle-
ment produces this strange mirroring of mind and reality.
There's really only two possibilities, for a belief of fact: either
the belief got there via a mind-reality entangling process, or not.
If not, the belief can't be correct except by coincidence. For beliefs
with the slightest shred of internal complexity (requiring a com-
puter program of more than 10 bits to simulate), the space of
possibilities is large enough that coincidence vanishes.
Unconditional facts are not the same as unconditional beliefs.
If entangled evidence convinces me that a fact is unconditional, this
doesn't mean I always believed in the fact without need of entan-
gled evidence.
4. Page 126, 'Your Strength as a Rationalist'.
5. Page 625, 'Feeling Rational'.
6. Page 86, 'What is Evidence?'.
7. Page 95, 'Occam's Razor'.
HOW TO CONVINCE ME THAT 2 + 2 = 3 93
I believe that 2 + 2 = 4, and I find it quite easy to conceive of a
situation which would convince me that 2 + 2 = 3. Namely, the same
sort of situation that currently convinces me that 2 + 2 = 4. Thus I
do not fear that I am a victim of blind faith.
If there are any Christians in the audience who know Bayes's
Theorem (no numerophobes, please), might I inquire of you what sit-
uation would convince you of the truth of Islam? Presumably it
would be the same sort of situation causally responsible for pro-
ducing your current belief in Christianity: We would push you
screaming out of the uterus of a Muslim woman, and have you
raised by Muslim parents who continually told you that it is good to
believe unconditionally in Islam. Or is there more to it than that?
If so, what situation would convince you of Islam, or at least, non-
Christianity?
3. Bayesian Judo

You can have some fun with people whose anticipations get out of
sync with what they believe they believe.
I was once at a dinner party, trying to explain to a man what I
did for a living, when he said: "I don't believe Artificial Intelligence
is possible because only God can make a soul."
At this point I must have been divinely inspired, because I
instantly responded: "You mean if I can make an Artificial Intelli-
gence, it proves your religion is false?"
He said, "What?"
I said, "Well, if your religion predicts that I can't possibly make
an Artificial Intelligence, then, if I make an Artificial Intelligence,
it means your religion is false. Either your religion allows that it
might be possible for me to build an AI, or, if I build an AI, that
disproves your religion."
There was a pause, as the one realized he had just made his hy-
pothesis vulnerable to falsification, and then he said, "Well, I didn't
mean that you couldn't make an intelligence, just that it couldn't be
emotional in the same way we are."
I said, "So if I make an Artificial Intelligence that, without
being deliberately preprogrammed with any sort of script, starts
talking about an emotional life that sounds like ours, that means
your religion is wrong."
He said, "Well, um, I guess we may have to agree to disagree on
this."
I said: "No, we can't, actually. There's a theorem of rationality
called Aumann's Agreement Theorem which shows that no two ra-
tionalists can agree to disagree. If two people disagree with each
other, at least one of them must be doing something wrong."
We went back and forth on this briefly. Finally, he said, "Well,
I guess I was really trying to say that I don't think you can make
something eternal."
I said, "Well, I don't think so either! I'm glad we were able
to reach agreement on this, as Aumann's Agreement Theorem re-
quires." I stretched out my hand, and he shook it, and then he
wandered away.
A woman who had stood nearby, listening to the conversation,
said to me gravely, "That was beautiful."
"Thank you very much," I said.
1. http://lesswrong.com/lw/i5/bayesian_judo/
2. Page 108, 'Belief in Belief'.
BAYESIAN JUDO 113
4. Professing and Cheering
. OI these, we
might call anticipationcontrolling belieIs proper belieIs" and the
other Iorms improper belieI". A proper belieI can be wrong or ir
rational, e.g., someone who genuinely anticipates that prayer will
cure her sick baby, but the other Iorms are arguably not belieI at
all".
Yet another Iorm oI improper belieI is belieI as groupidenti
Iicationas a way oI belonging. Robin Hanson uses the excellent
metaphor
,
oI wearing unusual clothing, a group uniIorm like a
priest's vestments or a ]ewish skullcap, and so I will call this belieI
as attire".
In terms oI humanly realistic psychology
6
, the Muslims who
Ilew planes into the \orld Trade Center undoubtedly saw them
selves as heroes deIending truth, justice, and the Islamic \ay Irom
hideous alien monsters a la the movie Independence Lay
;
. Only a
very inexperienced nerd, the sort oI nerd who has no idea how non
nerds see the world, would say this out loud in an Alabama bar.
It is not an American thing to say. The American thing to say is
that the terrorists hate our Ireedom" and that Ilying a plane into
a building is a cowardly act". You cannot say the phrases heroic
selIsacriIice" and suicide bomber" in the same sentence, even Ior
the sake oI accurately describing how the Lnemy sees the world.
The very concept oI the courage and altruism oI a suicide bomber is
Lnemy attireyou can tell, because the Lnemy talks about it. The
cowardice and sociopathy oI a suicide bomber is American attire.
There are no quote marks you can use to talk about how the Lnemy
sees the world, it would be like dressing up as a azi Ior Halloween.
BelieIasattire may help explain how people can be passionate
about improper belieIs. Mere belieI in belieI
8
, or religious pro
. http://lesswrong.com/lw/i7/belief_as_attire/
z. Page o,, 'Making BelieIs Pay Rent (in Anticipated Lxperiences)'.
. Page o8, 'BelieI in BelieI'.
. Page , 'ProIessing and Cheering'.
,. http://lesswrong.com/lw/i6/professing_and_cheering/egb
6. Page zj, 'Are Your Lnemies Innately Lvil:'.
;. http://www.imdb.com/title/tt0116629/
8. Page o8, 'BelieI in BelieI'.
Iessing
j
, would have some trouble creating genuine, deep, powerIul
emotional eIIects. Or so I suspect, I conIess I'm not an expert
here. But my impression is this: People who've stopped
anticipatingasiI their religion is true, will go to great lengths to
convince themselves they are passionate, and this desperation can be
mistaken Ior passion. But it's not the same Iire they had as a child.
On the other hand, it is very easy Ior a human being to gen
uinely, passionately, gutlevel belong to a group, to cheer Ior their
Iavorite sports team
o
. (This is the Ioundation on which rests the
swindle oI Republicans vs. Lemocrats" and analogous Ialse dilem
mas
7. The Virtue of Narrowness

What is true of one apple may not be true of another apple; thus more can be
said about a single apple than about all the apples in the world.
(Twelve Virtues of Rationality)
Within their own professions, people grasp the importance of
narrowness; a car mechanic knows the difference between a car-
buretor and a radiator, and would not think of them both as "car
parts". A hunter-gatherer knows the difference between a lion and
a panther. A janitor does not wipe the floor with window cleaner,
even if the bottles look similar to one who has not mastered the art.
Outside their own professions, people often commit the mis-
step of trying to broaden a word as widely as possible, to cover
as much territory as possible. Is it not more glorious, more wise,
more impressive, to talk about all the apples in the world? How
much loftier it must be to explain human thought in general, without
being distracted by smaller questions, such as how humans invent
techniques for solving a Rubik's Cube. Indeed, it scarcely seems
necessary to consider specific questions at all; isn't a general theory a
worthy enough accomplishment on its own?
It is the way of the curious to lift up one pebble from among
a million pebbles on the shore, and see something new about it,
something interesting, something different. You call these pebbles
"diamonds", and ask what might be special about them, what inner
qualities they might have in common, beyond the glitter you first
noticed. And then someone else comes along and says: "Why not
call this pebble a diamond too? And this one, and this one?" They
are enthusiastic, and they mean well. For it seems undemocratic
and exclusionary and elitist and unholistic to call some pebbles "dia-
monds", and others not. It seems... narrow-minded... if you'll pardon
the phrase. Hardly open, hardly embracing, hardly communal.
You might think it poetic, to give one word many meanings, and
thereby spread shades of connotation all around. But even poets, if
they are good poets, must learn to see the world precisely. It is not
enough to compare love to a flower. Hot jealous unconsummated
love is not the same as the love of a couple married for decades. If
you need a flower to symbolize jealous love, you must go into the
garden, and look, and make subtle distinctions: find a flower with
a heady scent, and a bright color, and thorns. Even if your intent
is to shade meanings and cast connotations, you must keep precise
track of exactly which meanings you shade and connote.
It is a necessary part of the rationalist's art (or even the poet's
art!) to focus narrowly on unusual pebbles which possess some spe-
cial quality. And look at the details which those pebbles (and those
pebbles alone!) share among each other. This is not a sin.
1. http://lesswrong.com/lw/ic/the_virtue_of_narrowness/
2. http://yudkowsky.net/virtues/
3. http://yudkowsky.net/virtues/
It is perfectly all right for modern evolutionary biologists to ex-
plain just the patterns of living creatures, and not the "evolution"
of stars or the "evolution" of technology. Alas, some unfortunate
souls use the same word "evolution" to cover the naturally selected
patterns of replicating life, and the strictly accidental structure of
stars, and the intelligently configured structure of technology. And
as we all know, if people use the same word, it must all be the same
thing. You should automatically generalize anything you think you
know about biological evolution to technology. Anyone who tells
you otherwise must be a mere pointless pedant. It couldn't possibly
be that your abysmal ignorance of modern evolutionary theory is
so total that you can't tell the difference between a carburetor and
a radiator. That's unthinkable. No, the other guy (you know, the
one who's studied the math) is just too dumb to see the connec-
tions.
And what could be more virtuous than seeing connections?
Surely the wisest of all human beings are the New Age gurus who
say "Everything is connected to everything else." If you ever say
this aloud, you should pause, so that everyone can absorb the sheer
shock of this Deep Wisdom.
There is a trivial mapping between a graph and its complement.
A fully connected graph, with an edge between every two vertices,
conveys the same amount of information as a graph with no edges
at all. The important graphs are the ones where some things are not
connected to some other things.
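The trivial mapping can be exhibited directly; a minimal sketch in plain Python (the four-vertex graph is an arbitrary choice):

```python
from itertools import combinations

def complement(vertices, edges):
    """Complement of an undirected graph: flip every possible edge."""
    all_edges = {frozenset(e) for e in combinations(vertices, 2)}
    return all_edges - {frozenset(e) for e in edges}

v = range(4)
empty = set()
full = complement(v, empty)  # the fully connected graph on 4 vertices

# The mapping is a bijection: complementing twice recovers the graph,
# so a fully connected graph and an empty graph carry the same information.
assert complement(v, full) == empty
print(len(full))  # 6 edges: C(4, 2)
```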
When the unenlightened ones try to be profound, they draw
endless verbal comparisons between this topic, and that topic,
which is like this, which is like that, until their graph is fully con-
nected and also totally useless. The remedy is specific knowledge
THE VIRTUE OF NARROWNESS 123
and in-depth study. When you understand things in detail, you can
see how they are not alike, and start enthusiastically subtracting
edges off your graph.
Likewise, the important categories are the ones that do not con-
tain everything in the universe. Good hypotheses can only explain
some possible outcomes, and not others.
It was perfectly all right for Isaac Newton to explain just gravity,
just the way things fall down (and how planets orbit the Sun, and
how the Moon generates the tides) but not the role of money in hu-
man society or how the heart pumps blood. Sneering at narrowness
is rather reminiscent of ancient Greeks who thought that going out
and actually looking at things was manual labor, and manual labor
was for slaves.
As Plato put it (in The Republic, Book VII):
"If anyone should throw back his head and learn
something by staring at the varied patterns on a ceiling,
apparently you would think that he was contemplating
with his reason, when he was only staring with his eyes.
I cannot but believe that no study makes the soul look
on high except that which is concerned with real being
and the unseen. Whether he gape and stare upwards, or
shut his mouth and stare downwards, if it be things of
the senses that he tries to learn something about, I
declare he never could learn, for none of these things
admit of knowledge: I say his soul is looking down, not
up, even if he is floating on his back on land or on sea!"
Many today make a similar mistake, and think that narrow con-
cepts are as lowly and unlofty and unphilosophical as, say, going out
and looking at things, an endeavor only suited to the underclass.
But rationalists (and also poets) need narrow words to express
precise thoughts; they need categories which include only some
things, and exclude others. There's nothing wrong with focusing
your mind, narrowing your categories, excluding possibilities, and
sharpening your propositions. Really, there isn't! If you make your
words too broad, you end up with something that isn't true and
doesn't even make good poetry.
124 MYSTERIOUS ANSWERS TO MYSTERIOUS QUESTIONS
And DON'T EVEN GET ME STARTED on people who think
Wikipedia is an Artificial Intelligence, the invention of LSD was a Sin-
gularity, or that corporations are superintelligent!
8. Your Strength as a Rationalist
So instead, by dint of mighty straining, I forced my model of re-
ality to explain an anomaly that never actually happened. And I knew
how embarrassing this was. I knew that the usefulness of a model is
not what it can explain, but what it can't. A hypothesis that forbids
nothing, permits everything, and thereby fails to constrain anticipa-
tion.
Your strength as a rationalist is your ability to be more confused
by fiction than by reality. If you are equally good at explaining any
outcome, you have zero knowledge.
We are all weak, from time to time; the sad part is that I could
have been stronger. I had all the information I needed to arrive at
the correct answer, I even noticed the problem, and then I ignored
it. My feeling of confusion was a Clue, and I threw my Clue away.
I should have paid more attention to that sensation of still feels
a little forced. It's one of the most important feelings a truthseeker
can have, a part of your strength as a rationalist. It is a design flaw
in human cognition that this sensation manifests as a quiet strain in
the back of your mind, instead of a wailing alarm siren and a glow-
ing neon sign reading "EITHER YOUR MODEL IS FALSE OR
THIS STORY IS WRONG."
2. http://www.overcomingbias.com/2007/08/truth-bias.html
3. http://www.wjh.harvard.edu/%7Edtg/Gilbert%20et%20al%20%28EVERYTHING%20YOU%20READ%29.pdf
4. Page 105, 'Making Beliefs Pay Rent (in Anticipated Experiences)'.
YOUR STRENGTH AS A RATIONALIST 127
9. Absence of Evidence Is Evidence of
Absence
Under the vast majority of real-life circumstances, a cause may not reliably produce signs of itself, but the absence of the cause is even less likely to produce the signs. The absence of an observation may be strong evidence of absence or very weak evidence of absence, depending on how likely the cause is to produce the observation. The absence of an observation that is only weakly permitted (even if the alternative hypothesis does not allow it at all) is very weak evidence of absence (though it is evidence nonetheless). This is the fallacy of "gaps in the fossil record": fossils form only rarely; it is futile to trumpet the absence of a weakly permitted observation when many strong positive observations have already been recorded. But if there are no positive observations at all, it is time to worry; hence the Fermi Paradox.
Your strength as a rationalist is your ability to be more confused by fiction than by reality.
So if you claim that "no sabotage" is evidence for the existence of a Japanese-American Fifth Column, you must conversely hold that seeing sabotage would argue against a Fifth Column. If you claim that "a good and proper life" is evidence that a woman is a witch, then an evil and improper life must be evidence that she is not a witch. If you argue that God, to test humanity's faith, refuses to reveal His existence, then the miracles described in the Bible must argue against the existence of God.
Doesn't quite sound right, does it? Pay attention to that feeling of this seems a little forced, that quiet strain in the back of your mind. It's important.
For a true Bayesian, it is impossible to seek evidence that confirms a theory. There is no possible plan you can devise, no clever strategy, no cunning device, by which you can legitimately expect your confidence in a fixed proposition to be higher (on average) than before. You can only ever seek evidence to test a theory, not to confirm it.

This realization can take quite a load off your mind. You need not worry about how to interpret every possible experimental result to confirm your theory. You needn't bother planning how to make any given iota of evidence confirm your theory, because you know that for every expectation of evidence, there is an equal and opposite expectation of counterevidence. If you try to weaken the
4. 'An Intuitive Explanation of Bayes' Theorem'.
5. Page 128, 'Absence of Evidence Is Evidence of Absence'.
6. http://lesswrong.com/lw/i8/religions_claim_to_be_nondisprovable/
7. Page 126, 'Your Strength as a Rationalist'.
CONSERVATION OF EXPECTED EVIDENCE 131
counterevidence of a possible "abnormal" observation, you can only do it by weakening the support of a "normal" observation, to a precisely equal and opposite degree. It is a zero-sum game. No matter how you connive, no matter how you argue, no matter how you strategize, you can't possibly expect the resulting game plan to shift your beliefs (on average) in a particular direction.

You might as well sit back and relax while you wait for the evidence to come in.
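The zero-sum balance can be checked directly. A minimal sketch, with purely illustrative numbers of my own: average the posterior over both possible observations, weighted by how likely each observation is, and the result always equals the prior.

```python
def expected_posterior(prior, p_e_given_h, p_e_given_not_h):
    """Average of P(H|E) and P(H|not-E), weighted by how likely each
    outcome is. Conservation of expected evidence says this always
    equals the prior P(H)."""
    p_e = p_e_given_h * prior + p_e_given_not_h * (1 - prior)
    post_e = p_e_given_h * prior / p_e                  # P(H | E)
    post_not_e = (1 - p_e_given_h) * prior / (1 - p_e)  # P(H | not-E)
    return p_e * post_e + (1 - p_e) * post_not_e

# Holds for any prior and any likelihoods (as long as both outcomes
# have nonzero probability):
for prior, a, b in [(0.5, 0.9, 0.2), (0.01, 0.7, 0.7), (0.3, 0.05, 0.95)]:
    assert abs(expected_posterior(prior, a, b) - prior) < 1e-12
```

The identity falls straight out of the algebra: the E term contributes P(E|H)·P(H) and the not-E term contributes P(not-E|H)·P(H), which sum to P(H) exactly.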
…human psychology is so screwed up.
132 MYSTERIOUS ANSWERS TO MYSTERIOUS QUESTIONS
11. Hindsight bias
Hindsight bias is when people who know the answer vastly overestimate its predictability or obviousness, compared to the estimates of subjects who must guess without advance knowledge. Hindsight bias is sometimes called the I-knew-it-all-along effect.
Fischhoff and Beyth (1975) presented students with historical accounts of unfamiliar incidents, such as a conflict between the Gurkhas and the British in 1814. Given the account as background knowledge, five groups of students were asked what they would have predicted as the probability for each of four outcomes: British victory, Gurkha victory, stalemate with a peace settlement, or stalemate with no peace settlement. Four experimental groups were respectively told that these four outcomes were the historical outcome. The fifth, control group was not told any historical outcome. In every case, a group told an outcome assigned substantially higher probability to that outcome than did any other group or the control group.
Hindsight bias matters in legal cases, where a judge or jury must determine whether a defendant was legally negligent in failing to foresee a hazard (Sanchirico 2003). In an experiment based on an actual legal case, Kamin and Rachlinski (1995) asked two groups to estimate the probability of flood damage caused by blockage of a city-owned drawbridge. The control group was told only the background information known to the city when it decided not to hire a bridge watcher. The experimental group was given this information, plus the fact that a flood had actually occurred. Instructions stated the city was negligent if the foreseeable probability of flooding was greater than 10%. 76% of the control group concluded the flood was so unlikely that no precautions were necessary; 57% of the experimental group concluded the flood was so likely that failure to take precautions was legally negligent. A third experimental group was told the outcome and also explicitly instructed to avoid hindsight bias, which made no difference: 56% concluded the city was legally negligent.
Viewing history through the lens of hindsight, we vastly underestimate the cost of effective safety precautions. In 1986, the Challenger exploded for reasons traced to an O-ring losing flexibility at low temperature. There were warning signs of a problem with the O-rings. But preventing the Challenger disaster would have required, not attending to the problem with the O-rings, but attending to every warning sign which seemed as severe as the O-ring problem, without benefit of hindsight. It could have been done, but it would have required a general policy much more expensive than just fixing the O-rings.
Shortly after September 11th, 2001, I thought to myself, and now someone will turn up minor intelligence warnings of something-or-other, and then the hindsight will begin. Yes, I'm sure they had some minor warnings of an al Qaeda plot, but they probably also had minor warnings of mafia activity, nuclear material for sale, and an invasion from Mars.
Because we don't see the cost of a general policy, we learn overly specific lessons. After September 11th, the FAA prohibited box cutters on airplanes, as if the problem had been the failure to take this particular "obvious" precaution. We don't learn the general lesson: the cost of effective caution is very high because you must attend to problems that are not as obvious now as past problems seem in hindsight.
The test of a model is how much probability it assigns to the observed outcome. Hindsight bias systematically distorts this test; we think our model assigned much more probability than it actually did. Instructing the jury doesn't help. You have to write down your predictions in advance. Or as Fischhoff (1982) put it:

When we attempt to understand past events, we implicitly test the hypotheses or rules we use both to interpret and to anticipate the world around us. If, in hindsight, we systematically underestimate the surprises that the past held and holds for us, we are subjecting those hypotheses to inordinately weak tests and, presumably, finding little reason to change them.
2. Page 130, 'Conservation of Expected Evidence'.
Fischhoff, B. 1982. For those condemned to study the past: Heuristics and biases in hindsight. In Kahneman et al. 1982: 332–351.

Fischhoff, B., and Beyth, R. 1975. I knew it would happen: Remembered probabilities of once-future things. Organizational Behavior and Human Performance, 13: 1–16.

Kamin, K. and Rachlinski, J. 1995. Ex Post ≠ Ex Ante: Determining Liability in Hindsight
12. Hindsight Devalues Science

This excerpt from Myers's Exploring Social Psychology is worth reading in its entirety. Cullen Murphy, editor of The Atlantic, said that the social sciences turn up "no ideas or conclusions that can't be found in [any] encyclopedia of quotations. Day after day social scientists go out into the world. Day after day they discover that people's behavior is pretty much what you'd expect."
Of course, the "expectation" is all hindsight. (Hindsight bias: Subjects who know the actual answer to a question assign much higher probabilities they "would have" guessed for that answer, compared to subjects who must guess without knowing the answer.)
The historian Arthur Schlesinger, Jr. dismissed scientific studies of WWII soldiers' experiences as "ponderous demonstrations" of common sense. For example:

1. Better educated soldiers suffered more adjustment problems than less educated soldiers. (Intellectuals were less prepared for battle stresses than street-smart people.)

2. Southern soldiers coped better with the hot South Sea Island climate than Northern soldiers. (Southerners are more accustomed to hot weather.)

3. White privates were more eager to be promoted to noncommissioned officers than Black privates. (Years of oppression take a toll on achievement motivation.)

4. Southern Blacks preferred Southern to Northern White officers (because Southern officers were more experienced and skilled in interacting with Blacks).

5. As long as the fighting continued, soldiers were more eager to return home than after the war ended. (During the fighting, soldiers knew they were in mortal danger.)
How many of these findings do you think you could have predicted in advance? 3 out of 5? 4 out of 5? Are there any cases where you would have predicted the opposite, where your model takes a hit?
13. Fake Explanations

Once upon a time, there was an instructor who taught physics students. One day she called them into her class, and showed them a wide, square plate of metal, next to a hot radiator. The students each put their hand on the plate, and found the side next to the radiator cool, and the distant side warm. And the instructor said, Why do you think this happens? Some students guessed convection of air currents, and others guessed strange metals in the plate. They devised many creative explanations, none stooping so low as to say "I don't know" or "This seems impossible."

And the answer was that before the students entered the room, the instructor turned the plate around.
Consider the student who frantically stammers, "Eh, maybe because of the heat conduction and so?" I ask: Is this answer a proper belief? The words are easily enough professed, said in a loud, emphatic voice. But do the words actually control anticipation?

Ponder that innocent little phrase, "because of", which comes before "heat conduction". Ponder some of the other things we could put after it. We could say, for example, "Because of phlogiston", or "Because of magic."
"Magic!" you cry. "That's not a scientific explanation!" Indeed, the phrases "because of heat conduction" and "because of magic" are readily recognized as belonging to different literary genres. "Heat conduction" is something that Spock might say on Star Trek, whereas "magic" would be said by Giles in Buffy the Vampire Slayer.

However, as Bayesians, we take no notice of literary genres. For us, the substance of a model is the control it exerts on anticipation. If you say "heat conduction", what experience does that lead you to anticipate? Under normal circumstances, it leads you to anticipate that, if you put your hand on the side of the plate near the radiator, that side will feel warmer than the opposite side. If "because of heat conduction" can also explain the radiator-adjacent side feeling cooler, then it can explain pretty much anything.
1. http://lesswrong.com/lw/ip/fake_explanations/
2. Page 126, 'Your Strength as a Rationalist'.
3. Page 117, 'Belief as Attire'.
4. 'Professing and Cheering'.
5. Page 108, 'Belief in Belief'.
And as we all know by this point (I do hope), if you are equally good at explaining any outcome, you have zero knowledge. "Because of heat conduction", used in such fashion, is a disguised hypothesis of maximum entropy. It is anticipation-isomorphic to saying "magic". It feels like an explanation, but it's not.
Suppose that instead of guessing, we measured the heat of the metal plate at various points and various times. Seeing a metal plate next to the radiator, we would ordinarily expect the point temperatures to satisfy an equilibrium of the diffusion equation with respect to the boundary conditions imposed by the environment. You might not know the exact temperature of the first point measured, but after measuring the first points (I'm not physicist enough to know how many would be required) you could take an excellent guess at the rest.
A true master of the art of using numbers to constrain the anticipation of material phenomena (a "physicist") would take some measurements and say, "This plate was in equilibrium with the environment two and a half minutes ago, turned around, and is now approaching equilibrium again."
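The kind of constraint the diffusion equation imposes can be sketched numerically. This is a toy illustration of my own (a 1-D rod with made-up temperatures, not anything from the text): hold the two ends at fixed temperatures and relax the interior, and the profile converges to the straight-line equilibrium, so a few point measurements pin down all the rest.

```python
def relax(n_points, hot, cold, alpha=0.25, steps=5000):
    """Explicit finite-difference relaxation of the 1-D diffusion (heat)
    equation: T_i += alpha * (T_{i-1} - 2*T_i + T_{i+1}), with the two
    boundary temperatures held fixed (alpha <= 0.5 keeps it stable)."""
    t = [cold] * n_points
    t[0] = hot
    for _ in range(steps):
        t = ([hot]
             + [t[i] + alpha * (t[i - 1] - 2 * t[i] + t[i + 1])
                for i in range(1, n_points - 1)]
             + [cold])
    return t

profile = relax(11, hot=80.0, cold=20.0)
# At equilibrium the interior points lie on a straight line between the
# boundary temperatures: 80, 74, 68, ..., 26, 20.
```

Once you know the system is at (or relaxing toward) this equilibrium, measuring two or three points determines the whole profile, which is exactly the sense in which the model constrains anticipation.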
The deeper error of the students is not simply that they failed to constrain anticipation. Their deeper error is that they thought they were doing physics. They said the phrase "because of", followed by the sort of words Spock might say on Star Trek, and thought they thereby entered the magisterium of science.

Not so. They simply moved their magic from one literary genre to another.
6. Page 105, 'Making Beliefs Pay Rent (in Anticipated Experiences)'.
7. Page 108, 'Belief in Belief'.
8. Page 112, 'Bayesian Judo'.
9. 'Professing and Cheering'.
10. Page 117, 'Belief as Attire'.
11. http://lesswrong.com/lw/i8/religions_claim_to_be_nondisprovable/
12. Page 119, 'Focus Your Uncertainty'.
13. Page 126, 'Your Strength as a Rationalist'.
14. Page 128, 'Absence of Evidence Is Evidence of Absence'.
15. Page 136, 'Hindsight Devalues Science'.
16. Page 130, 'Conservation of Expected Evidence'.
FAKE EXPLANATIONS 139
14. Guessing the Teacher's Password
, and
we all know what that would imply.
The X-Men comics use terms like "evolution", "mutation", and "genetic code", purely to place themselves in what they conceive to be the literary genre of science. The part that scares me is wondering how many people, especially in the media, understand science only as a literary genre.
I encounter people who very definitely believe in evolution, who sneer at the folly of creationists. And yet they have no idea of what the theory of evolutionary biology permits and prohibits. They'll talk about "the next step in the evolution of humanity", as if natural selection got here by following a plan. Or even worse, they'll talk about something completely outside the domain of evolutionary biology, like an improved design for computer chips, or corporations splitting, or humans uploading themselves into computers, and they'll call that "evolution". If evolutionary biology could cover that, it could cover anything.

1. http://lesswrong.com/lw/ir/science_as_attire/
2. Page 138, 'Fake Explanations'.
3. Page 117, 'Belief as Attire'.
4. Page 126, 'Your Strength as a Rationalist'.
5. 'Professing and Cheering'.
Probably an actual majority of the people who believe in evolution use the phrase "because of evolution" because they want to be part of the scientific in-crowd: belief as scientific attire, like wearing a lab coat. If the scientific in-crowd instead used the phrase "because of intelligent design", they would just as cheerfully use that instead; it would make no difference to their anticipation-controllers. Saying "because of evolution" instead of "because of intelligent design" does not, for them, prohibit Storm. Its only purpose, for them, is to identify with a tribe.
I encounter people who are quite willing to entertain the notion of dumber-than-human Artificial Intelligence, or even mildly smarter-than-human Artificial Intelligence. Introduce the notion of strongly superhuman Artificial Intelligence, and they'll suddenly decide it's "pseudoscience". It's not that they think they have a theory of intelligence which lets them calculate a theoretical upper bound on the power of an optimization process. Rather, they associate strongly superhuman AI to the literary genre of apocalyptic literature, whereas an AI running a small corporation associates to the literary genre of Wired magazine. They aren't speaking from within a model of cognition. They don't realize they need a model. They don't realize that science is about models. Their devastating critiques consist purely of comparisons to apocalyptic literature, rather than, say, known laws which prohibit such an outcome. They understand science only as a literary genre, or in-group to belong to. The attire doesn't look to them like a lab coat; this isn't the football team they're cheering for.
Is there anything in science that you are proud of believing, and yet you do not use the belief professionally? You had best ask yourself which future experiences your belief prohibits from happening to you. That is the sum of what you have assimilated and made a true part of yourself. Anything else is probably passwords or attire.

6. Page 138, 'Fake Explanations'.
7. Page 117, 'Belief as Attire'.
8. http://lesswrong.com/lw/io/is_molecular_nanotechnology_scientific/
9. Page 117, 'Belief as Attire'.
10. 'Professing and Cheering'.

SCIENCE AS ATTIRE 145

1. Page 140, 'Guessing the Teacher's Password'.
2. Page 117, 'Belief as Attire'.
16. Fake Causality
about something doesn't qualify it as a stop-sign. I'm not exactly fond of terrorists or fearful of private property; that doesn't mean "Terrorists!" or "Capitalism!" are cognitive traffic signals unto me. (The word "intelligence" did once have that effect on me, though no longer.) What distinguishes a semantic stop-sign is failure to consider the obvious next question.
1. Page 625, 'Feeling Rational'.
SEMANTIC STOPSIGNS 153
18. Mysterious Answers to Mysterious
Questions
, Fake Causality:
"Emergence" has become very popular, just as saying "magic" used to be very popular. "Emergence" has the same deep appeal to human psychology, for the same reason. "Emergence" is such a wonderfully easy explanation, and it feels good to say it; it gives you a sacred mystery to worship. Emergence is popular because it is the junk food of curiosity. You can explain anything using emergence, and so people do just that, for it feels so wonderful to explain things. Humans are still humans, even if they've taken a few science classes in college. Once they find a way to escape the shackles of settled science, they get up to the same shenanigans as their ancestors, dressed up in the literary genre of "science" but still the same species psychology.
2. Page 140, 'Guessing the Teacher's Password'.
3. Page 138, 'Fake Explanations'.
4. Page 126, 'Your Strength as a Rationalist'.
5. Page 155, 'Mysterious Answers to Mysterious Questions'.
6. http://yudkowsky.net/virtues/
7. Page 117, 'Belief as Attire'.
20. Say Not "Complexity"
It is said that parents do all the things they tell their children not to
do, which is how they know not to do them.
Long ago, in the unthinkably distant past, I was a devoted Traditional Rationalist, conceiving myself skilled according to that kind, yet I knew not the Way of Bayes. When the young Eliezer was confronted with a mysterious-seeming question, the precepts of Traditional Rationality did not stop him from devising a Mysterious Answer. It is, by far, the most embarrassing mistake I made in my life, and I still wince to think of it.

What was my mysterious answer to a mysterious question? This I will not describe, for it would be a long tale and complicated. I was young, and a mere Traditional Rationalist who knew not the teachings of Tversky and Kahneman. I knew about Occam's Razor, but not the conjunction fallacy.
23. Failing to Learn from History

Once upon a time, in my wild and reckless youth, when I knew not the Way of Bayes, I gave a Mysterious Answer to a mysterious-seeming question. Many failures occurred in sequence, but one mistake stands out as most critical: my younger self did not realize that solving a mystery should make it feel less confusing. I was trying to explain a Mysterious Phenomenon, which to me meant providing a cause for it, fitting it into an integrated model of reality. Why should this make the phenomenon less Mysterious, when that is its nature? I was trying to explain the Mysterious Phenomenon, not render it (by some impossible alchemy) into a mundane phenomenon, a phenomenon that wouldn't even call out for an unusual explanation in the first place.
As a Traditional Rationalist, I knew the historical tales of astrologers and astronomy, of alchemists and chemistry, of vitalists and biology. But the Mysterious Phenomenon was not like this. It was something new, something stranger, something more difficult, something that ordinary science had failed to explain for centuries, as if stars and matter and life had not been mysteries for hundreds of years and thousands of years, from the dawn of human thought right up until science finally solved them.
We learn about astronomy and chemistry and biology in school, and it seems to us that these matters have always been the proper realm of science, that they have never been mysterious. When science dares to challenge a new Great Puzzle, the children of that generation are skeptical, for they have never seen science explain something that feels mysterious to them. Science is only good for explaining scientific subjects, like stars and matter and life.
I thought the lesson of history was that astrologers and alchemists and vitalists had an innate character flaw, a tendency toward mysterianism, which led them to come up with mysterious explanations for non-mysterious subjects. But surely, if a phenomenon really was very weird, a weird explanation might be in order?

1. http://lesswrong.com/lw/iz/failing_to_learn_from_history/
2. Page 167, 'My Wild and Reckless Youth'.
3. Page 167, 'My Wild and Reckless Youth'.
4. Page 155, 'Mysterious Answers to Mysterious Questions'.
5. 'Correspondence Bias'.
It was only afterward, when I began to see the mundane structure inside the mystery, that I realized whose shoes I was standing in. Only then did I realize how reasonable vitalism had seemed at the time, how surprising and embarrassing had been the universe's reply of, "Life is mundane, and does not need a weird explanation."

We read history but we don't live it, we don't experience it. If only I had personally postulated astrological mysteries and then discovered Newtonian mechanics, postulated alchemical mysteries and then discovered chemistry, postulated vitalistic mysteries and then discovered biology. I would have thought of my Mysterious Answer and said to myself: No way am I falling for that again.
FAILING TO LEARN FROM HISTORY 171
24. Making History Available
There is an inverse error to generalizing from fictional evidence: failing to be sufficiently moved by historical evidence. The trouble with generalizing from fictional evidence is that it is fiction; it never actually happened. It's not drawn from the same distribution as this, our real universe; fiction differs from reality in systematic ways. But history has happened, and should be available.
In our ancestral environment, there were no movies; what you saw with your own eyes was true. Is it any wonder that fictions we see in lifelike moving pictures have too great an impact on us? Conversely, things that really happened, we encounter as ink on paper; they happened, but we never saw them happen. We don't remember them happening to us.

The inverse error is to treat history as mere story, process it with the same part of your mind that handles the novels you read. You may say with your lips that it is "truth", rather than "fiction", but that doesn't mean you are being moved as much as you should be. Many biases involve being insufficiently moved by dry, abstract information.
Once upon a time, I gave a Mysterious Answer to a mysterious question, not realizing that I was making exactly the same mistake as astrologers devising mystical explanations for the stars, or alchemists devising magical properties of matter, or vitalists postulating an opaque "elan vital" to explain all of biology.

1. http://lesswrong.com/lw/j0/making_history_available/
2. Page 170, 'Failing to Learn from History'.
3. http://en.wikipedia.org/wiki/Availability_heuristic
4. http://singinst.org/Biases.pdf
5. http://www.overcomingbias.com/2007/07/tell-your-anti-.html
6. http://lesswrong.com/lw/hw/scope_insensitivity/
7. Page 155, 'Mysterious Answers to Mysterious Questions'.
When I finally realized whose shoes I was standing in, there was a sudden shock of unexpected connection with the past. I realized that the invention and destruction of vitalism (which I had only read about in books) had actually happened to real people, who experienced it much the same way I experienced the invention and destruction of my own mysterious answer. And I also realized that if I had actually experienced the past, if I had lived through past scientific revolutions myself rather than reading about them in history books, I probably would not have made the same mistake again. I would not have come up with another mysterious answer; the first thousand lessons would have hammered home the moral.
So (I thought), to feel sufficiently the force of history, I should try to approximate the thoughts of an Eliezer who had lived through history; I should try to think as if everything I read about in history books had actually happened to me. (With appropriate reweighting for the availability bias of history books: I should remember being a thousand peasants for every ruler.) I should immerse myself in history, imagine living through eras I only saw as ink on paper.
Why should I remember the Wright Brothers' first flight? I was not there. But as a rationalist, could I dare to not remember, when the event actually happened? Is there so much difference between seeing an event through your eyes (which is actually a causal chain involving reflected photons, not a direct connection) and seeing an event through a history book? Photons and history books both descend by causal chains from the event itself.
I had to overcome the false amnesia of being born at a particular time. I had to recall (make available) all the memories, not just the memories which, by mere coincidence, belonged to myself and my own era.

The Earth became older, of a sudden.
To my former memory, the United States had always existed; there was never a time when there was no United States. I had not remembered, until that time, how the Roman Empire rose, and brought peace and order, and lasted through so many centuries, until I forgot that things had ever been otherwise; and yet the Empire fell, and barbarians overran my city, and the learning that I had possessed was lost. The modern world became more fragile to my eyes; it was not the first modern world.

So many mistakes, made over and over and over again, because I did not remember making them, in every era I never lived.

8. Page 170, 'Failing to Learn from History'.
9. http://en.wikipedia.org/wiki/Availability_heuristic

MAKING HISTORY AVAILABLE 173
And to think, people sometimes wonder if overcoming bias is important.

Don't you remember how many times your biases have killed you? You don't? I've noticed that sudden amnesia often follows a fatal mistake. But take it from me, it happened. I remember; I wasn't there.
So the next time you doubt the strangeness of the future, remember how you were born in a hunter-gatherer tribe ten thousand years ago, when no one knew of Science at all. Remember how you were shocked, to the depths of your being, when Science explained the great and terrible sacred mysteries that you once revered so highly. Remember how you once believed that you could fly by eating the right mushrooms, and then you accepted with disappointment that you would never fly, and then you flew. Remember how you had always thought that slavery was right and proper, and then you changed your mind. Don't imagine how you could have predicted the change, for that is amnesia. Remember that, in fact, you did not guess. Remember how, century after century, the world changed in ways you did not guess.

Maybe then you will be less shocked by what happens next.
10. 'Hindsight bias'.
25. Explain/Worship/Ignore?
, Say Not "Complexity" (which I failed to Google; found by Jeff Grey and Miguel) from some dictator or other, who was asked if he had any intentions to move his pet state toward democracy:
We believe we are already within a democratic system.
Some Iactors are still missing, like the expression oI the
people's will.
The substance of a democracy is the specific mechanism that resolves policy conflicts. If all groups had the same preferred policies, there would be no need for democracy; we would automatically cooperate. The resolution process can be a direct majority vote, or an elected legislature, or even a voter-sensitive behavior of an AI, but it has to be something. What does it mean to call for a "democratic" solution if you don't have a conflict-resolution mechanism in mind?

I think it means that you have said the word "democracy", so the audience is supposed to cheer. It's not so much a propositional statement, as the equivalent of the "Applause" light that tells a studio audience when to clap.
This case is remarkable only in that I mistook the applause light for a policy suggestion, with subsequent embarrassment for all. Most applause lights are much more blatant, and can be detected by a simple reversal test. For example, suppose someone says:

We need to balance the risks and opportunities of AI.
1. http://www.time.com/time/magazine/article/0,9171,954853,00.html
APPLAUSE LIGHTS 181
If you reverse this statement, you get:

We shouldn't balance the risks and opportunities of AI.

Since the reversal sounds abnormal, the unreversed statement is probably normal, implying it does not convey new information. There are plenty of legitimate reasons for uttering a sentence that would be uninformative in isolation. "We need to balance the risks and opportunities of AI" can introduce a discussion topic; it can emphasize the importance of a specific proposal for balancing; it can criticize an unbalanced proposal. Linking to a normal assertion can convey new information to a bounded rationalist; the link itself may not be obvious. But if no specifics follow, the sentence is probably an applause light.
I am tempted to give a talk sometime that consists of nothing but applause lights, and see how long it takes for the audience to start laughing:
I am here to propose to you today that we need to balance the risks and opportunities of advanced Artificial Intelligence. We should avoid the risks and, insofar as it is possible, realize the opportunities. We should not needlessly confront entirely unnecessary dangers. To achieve these goals, we must plan wisely and rationally. We should not act in fear and panic, or give in to technophobia, but neither should we act in blind enthusiasm. We should respect the interests of all parties with a stake in the Singularity. We must try to ensure that the benefits of advanced technologies accrue to as many individuals as possible, rather than being restricted to a few. We must try to avoid, as much as possible, violent conflicts using these technologies, and we must prevent massive destructive capability from falling into the hands of individuals. We should think through these issues before, not after, it is too late to do anything about them.
28. Truly Part Of You
So now it's occurred to me that my productivity problem may not be chaos, but my own stupidity.

And that may or may not help anything. It certainly doesn't fix the problem right away. Saying "I'm ignorant" doesn't make you knowledgeable.

But it is, at least, a different path than saying "it's too chaotic".

1. Page 685, 'Mind Projection Fallacy'.
A series on the use and abuse of words; why you often cant
define a word any way you like; how human brains seem
to process definitions. First introduces the Mind
projection Iallacy and the concept of how an algorithm
feels from inside, which makes it a basic intro to key
elements of the LW zeitgeist.
Part III
A Human's Guide to Words
1. The Parable of the Dagger
Once upon a time, there was a court jester who dabbled in logic.
The jester presented the king with two boxes. Upon the first box was inscribed:
"Either this box contains an angry frog, or the box with a false inscription contains an angry frog, but not both."
On the second box was inscribed:
"Either this box contains gold and the box with a false inscription contains an angry frog, or this box contains an angry frog and the box with a true inscription contains gold."
And the jester said to the king: "One box contains an angry frog, the other box gold; and one, and only one, of the inscriptions is true."
The king opened the wrong box, and was savaged by an angry frog.
"You see," the jester said, "let us hypothesize that the first inscription is the true one. Then suppose the first box contains gold. Then the other box would have an angry frog, while the box with a true inscription would contain gold, which would make the second statement true as well. Now hypothesize that the first inscription is false, and that the first box contains gold. Then the second inscription would be..."
The king ordered the jester thrown in the dungeons.
A day later, the jester was brought before the king in chains, and shown two boxes.
"One box contains a key," said the king, "to unlock your chains; and if you find the key you are free. But the other box contains a dagger for your heart, if you fail."
And the first box was inscribed:
"Either both inscriptions are true, or both inscriptions are false."
1. http://lesswrong.com/lw/ne/the_parable_of_the_dagger/
And the second box was inscribed:
"This box contains the key."
The jester reasoned thusly: "Suppose the first inscription is true. Then the second inscription must also be true. Now suppose the first inscription is false. Then again the second inscription must be true. So the second box must contain the key, if the first inscription is true, and also if the first inscription is false. Therefore, the second box must logically contain the key."
The jester opened the second box, and found a dagger.
"How?!" cried the jester in horror, as he was dragged away. "It's logically impossible!"
"It is entirely possible," replied the king. "I merely wrote those inscriptions on two boxes, and then I put the dagger in the second one."
(Adapted from Raymond Smullyan.)
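The king's point can be checked mechanically. Here is a short Python sketch (my illustration, not code from the original text or from Smullyan) that enumerates truth values for the two inscriptions on the king's boxes in each possible world. In the world the jester tacitly assumed, consistent truth assignments exist; in the actual world, with the dagger in the second box, the self-referential first inscription has no consistent truth value at all.

```python
from itertools import product

# The king's boxes.  Inscription 1: "Either both inscriptions are true,
# or both inscriptions are false."  Inscription 2: "This box contains
# the key."  The jester assumed the inscriptions have consistent truth
# values; enumerate the candidate assignments and see.
def consistent_assignments(key_in_box2):
    assignments = []
    for t1, t2 in product([True, False], repeat=2):
        says1 = (t1 and t2) or (not t1 and not t2)  # inscription 1's claim
        says2 = key_in_box2                         # inscription 2's claim
        if t1 == says1 and t2 == says2:             # self-consistency
            assignments.append((t1, t2))
    return assignments

print(consistent_assignments(True))   # the jester's assumed world
print(consistent_assignments(False))  # the king's actual world
```

The second call returns an empty list: once the dagger is in the second box, the first inscription is neither consistently true nor consistently false. It was only ink on a box, exactly as the king says.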
192 A HUMAN'S GUIDE TO WORDS
2. The Parable of Hemlock
You could reason, perhaps, as follows: "All things I have observed which wear clothing, speak language, and use tools, have also shared certain other properties as well, such as breathing air and pumping red blood. The last thirty 'humans' belonging to this cluster, whom I observed to drink hemlock, soon fell over and stopped moving. Socrates wears a toga, speaks fluent ancient Greek, and drank hemlock from a cup. So I predict that Socrates will keel over in the next five minutes."
1. http://lesswrong.com/lw/nf/the_parable_of_hemlock/
2. Page j, 'The Parable of the Dagger'.
3. http://machall.com/view.php?date=2003-04-21
But that would be mere guessing. It wouldn't be, y'know, absolutely and eternally certain.
Yet the brain goes on about its work of categorization, whether or not we consciously approve. "All humans are mortal, Socrates is a human, therefore Socrates is mortal"; thus spake the ancient Greek philosophers. Well, if mortality is part of your logical definition of "human", you can't logically classify Socrates as human until you observe him to be mortal.
Is "transhumanism" a cult?
People who argue that atheism is a religion because it "states beliefs about God" are really trying to argue (I think) that the reasoning methods used in atheism are on a par with the reasoning methods used in religion, or that atheism is no safer than religion in terms of the probability of causally engendering violence, etc. What's really at stake is an atheist's claim of substantial difference and superiority relative to religion, which the religious person is trying to reject by denying the difference rather than the superiority(!)
But that's not the a priori irrational part: The a priori irrational part is where, in the course of the argument, someone pulls out a dictionary and looks up the definition of "atheism" or "religion". (And yes, it's just as silly whether an atheist or religionist does it.) How could a dictionary possibly decide whether an empirical cluster of atheists is really substantially different from an empirical cluster of theologians? How can reality vary with the meaning of a word? The points in thingspace don't move around when we redraw a boundary.
But people often don't realize that their argument about where to draw a definitional boundary, is really a dispute over whether to infer a characteristic shared by most things inside an empirical cluster.
Hence the phrase, "disguised query".
3. Page jz, 'Uncritical Supercriticality'.
4. Page , 'Cultish Countercultishness'.
9. Neural Categories
Color, shape, texture, luminance, and interior.
Suppose I want to create an Artificial Neural Network (ANN) to predict unobserved blegg characteristics from observed blegg characteristics. And suppose I'm fairly naive about ANNs: I've read excited popular science books about how neural networks are distributed, emergent, and parallel just like the human brain! But I can't derive the differential equations for gradient descent in a non-recurrent multilayer network with sigmoid units (which is actually a lot easier than it sounds).
Then I might design a neural network that looks something like this:
(Figure: Network 1, with direct connections between every pair of the five observable nodes.)
1. http://lesswrong.com/lw/nn/neural_categories/
2. Page z, 'Disguised Queries'.
3. Page z, 'Disguised Queries'.
4. Page zoj, 'The Cluster Structure of Thingspace'.
Network 1 is for classifying bleggs and rubes. But since "blegg" is an unfamiliar and synthetic concept, I've also included a similar Network 1b for distinguishing humans from Space Monsters, with input from Aristotle ("All men are mortal") and Plato's Academy ("A featherless biped with broad nails").
A neural network needs a learning rule. The obvious idea is that when two nodes are often active at the same time, we should strengthen the connection between them; this is one of the first rules ever proposed for training a neural network, known as Hebb's Rule.
Thus, if you often saw things that were both blue and furred, thus simultaneously activating the "color" node in the + state and the "texture" node in the + state, the connection would strengthen between color and texture, so that + colors activated + textures, and vice versa. If you saw things that were blue and egg-shaped and vanadium-containing, that would strengthen positive mutual connections between color and shape and interior.
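As a toy sketch (my own illustration, not code from the original text; the learning rate and observation counts are invented), Hebb's Rule for this five-node network is only a few lines of Python:

```python
FEATURES = ["color", "shape", "texture", "luminance", "interior"]
N = len(FEATURES)

def hebbian_train(observations, rate=0.1):
    """Each observation is a list of N activations, +1 or -1.
    Hebb's Rule: strengthen the connection between co-active nodes."""
    weights = [[0.0] * N for _ in range(N)]  # weights[i][j]: node i <-> node j
    for x in observations:
        for i in range(N):
            for j in range(N):
                if i != j:
                    weights[i][j] += rate * x[i] * x[j]
    return weights

# Plenty of bleggs (all features +1) and rubes (all features -1):
observations = [[+1] * N] * 50 + [[-1] * N] * 50
w = hebbian_train(observations)
print(w[0][2] > 0)  # color <-> texture connection: strongly positive
```

After training on bleggs and rubes, every pair of features has always been co-active, so every mutual connection is strongly positive: + colors activate + textures, exactly as described above.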
Let's say you've already seen plenty of bleggs and rubes come off the conveyor belt. But now you see something that's furred, egg-shaped, and (gasp!) reddish-purple (which we'll model as a "color" activation level of -2/3). You haven't yet tested the luminance, or the interior. What to predict, what to predict?
What happens then is that the activation levels in Network 1 bounce around a bit. Positive activation flows to luminance from shape; negative activation flows to interior from color; negative activation flows from interior to luminance. Of course all these messages are passed in parallel! and asynchronously! just like the human brain.
Finally Network 1 settles into a stable state, which has high positive activation for "luminance" and "interior". The network may be said to "expect" (though it has not yet seen) that the object will glow in the dark, and that it contains vanadium.
And lo, Network 1 exhibits this behavior even though there's no explicit node that says whether the object is a blegg or not. The judgment is implicit in the whole network! Bleggness is an attractor! which arises as the result of emergent behavior! from the distributed! learning rule.
Now in real life, this kind of network design, however faddish it may sound, runs into all sorts of problems. Recurrent networks don't always settle right away: They can oscillate, or exhibit chaotic behavior, or just take a very long time to settle down. This is a Bad Thing when you see something big and yellow and striped, and you have to wait five minutes for your distributed neural network to settle into the "tiger" attractor. Asynchronous and parallel it may be, but it's not real-time.
And there are other problems, like double-counting the evidence when messages bounce back and forth: If you suspect that an object glows in the dark, your suspicion will activate belief that the object contains vanadium, which in turn will activate belief that the object glows in the dark.
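The echo effect is easy to see numerically. In this toy sketch (my numbers, invented for illustration; a real inference procedure would have to track where each message originally came from), two mutually connected nodes keep amplifying each other's suspicion even though no new observation ever arrives:

```python
# "glow" and "vanadium" each start as a weak suspicion of 0.1; on each
# round, each node adds 0.9 times the other's current belief to its own.
glow, vanadium = 0.1, 0.1
weight = 0.9
for _ in range(5):
    glow, vanadium = glow + weight * vanadium, vanadium + weight * glow
print(round(glow, 3))  # the same 0.1 of evidence, counted over and over:
                       # prints 2.476
```

Each node is mostly re-hearing evidence it originally sent out itself, so both "beliefs" grow without bound.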
Plus if you try to scale up the Network 1 design, it requires O(N²) connections, where N is the total number of observables.
So what might be a more realistic neural network design?
5. http://thedailywtf.com/Articles/No,_We_Need_a_Neural_Network.aspx
6. Page ;, 'Fake Causality'.
NEURAL CATEGORIES 219
(Figure: Network 2, in which every observable node connects only to a single central node.)
In this network, a wave of activation converges on the central node from any clamped (observed) nodes, and then surges back out again to any unclamped (unobserved) nodes. Which means we can compute the answer in one step, rather than waiting for the network to settle; an important requirement in biology when the neurons only run at 20Hz. And the network architecture scales as O(N), rather than O(N²).
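A minimal sketch of the Network 2 idea in Python (my illustration, not the author's code; the simple averaging rule is an invented stand-in for whatever the central node really computes): observables send activation inward to one central node, which then fills in every unobserved node in a single pass.

```python
FEATURES = ["color", "shape", "texture", "luminance", "interior"]

def predict(observed):
    """observed maps feature name -> activation (+1 blegg-like,
    -1 rube-like).  Returns predictions for the unobserved features."""
    # Wave of activation converges on the central node...
    central = sum(observed.values()) / len(observed)
    # ...then surges back out to the unclamped nodes, in one step.
    return {f: central for f in FEATURES if f not in observed}

# Blue, egg-shaped, and furred -- so expect glowing and vanadium:
print(predict({"color": +1, "shape": +1, "texture": +1}))
```

Only one connection per observable, so N connections in all versus roughly N² for Network 1, and the answer comes back in one inward-outward pass instead of waiting for a recurrent network to settle.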
Admittedly, there are some things you can notice more easily with the first network architecture than the second. Network 1 has a direct connection between every two nodes. So if red objects never glow in the dark, but red furred objects usually have the other blegg characteristics like egg-shape and vanadium, Network 1 can easily represent this: it just takes a very strong direct negative connection from color to luminance, but more powerful positive connections from texture to all other nodes except luminance.
Nor is this a special "exception" to the general rule that bleggs glow; remember, in Network 1, there is no unit that represents blegg-ness; blegg-ness emerges as an attractor in the distributed network.
So yes, those N² connections were buying us something. But not very much. Network 1 is not more useful on most real-world problems, where you rarely find an animal stuck halfway between being a cat and a dog.
(There are also facts that you can't easily represent in Network 1 or Network 2. Let's say sea-blue color and spheroid shape, when found together, always indicate the presence of palladium; but when found individually, without the other, they are each very strong evidence for vanadium. This is hard to represent, in either architecture, without extra nodes. Both Network 1 and Network 2 embody implicit assumptions about what kind of environmental structure is likely to exist; the ability to read this off is what separates the adults from the babes, in machine learning.)
Make no mistake: Neither Network 1 nor Network 2 is biologically realistic. But it still seems like a fair guess that however the brain really works, it is in some sense closer to Network 2 than Network 1. Fast, cheap, scalable, works well to distinguish dogs and cats: natural selection goes for that sort of thing like water running down a fitness landscape.
It seems like an ordinary enough task to classify objects as either bleggs or rubes, tossing them into the appropriate bin. But would you notice if sea-blue objects never glowed in the dark?
Maybe, if someone presented you with twenty objects that were alike only in being sea-blue, and then switched off the light, and none of the objects glowed. If you got hit over the head with it, in other words. Perhaps by presenting you with all these sea-blue objects in a group, your brain forms a new subcategory, and can detect the "doesn't glow" characteristic within that subcategory. But you probably wouldn't notice if the sea-blue objects were scattered among a hundred other bleggs and rubes. It wouldn't be easy or intuitive to notice, the way that distinguishing cats and dogs is easy and intuitive.
Or: "Socrates is human, all humans are mortal, therefore Socrates is mortal." How did Aristotle know that Socrates was human? Well, Socrates had no feathers, and broad nails, and walked upright, and spoke Greek, and, well, was generally shaped like a human and acted like one. So the brain decides, once and for all, that Socrates is human; and from there, infers that Socrates is mortal like all other humans thus yet observed. It doesn't seem easy or intuitive to ask how much wearing clothes, as opposed to using language, is associated with mortality. Just, "things that wear clothes and use language are human" and "humans are mortal".
Are there biases associated with trying to classify things into categories once and for all? Of course there are. See, e.g., Cultish Countercultishness.
To be continued.
7. Page , 'Cultish Countercultishness'.
10. How An Algorithm Feels From Inside
it without applying a second layer of search for a counter-counterargument, thereby arguing himself into the opposite position. This doesn't require that Barry's prior intuition (the intuition Barry would have had, if we'd asked him before Albert spoke) have differed from Albert's.
Well, if Barry didn't have a differing intuition before, he sure has one now.
1. http://lesswrong.com/lw/np/disputing_definitions/
2. Page zz, 'How An Algorithm Feels From Inside'.
3. Page ,o,, 'Knowing About Biases Can Hurt People'.
4. Page ,zj, 'Motivated Stopping and Motivated Continuation'.
Albert: "What do you mean, there's no sound? The tree's roots snap, the trunk comes crashing down and hits the ground. This generates vibrations that travel through the ground and the air. That's where the energy of the fall goes, into heat and sound. Are you saying that if people leave the forest, the tree violates conservation of energy?"
Barry: "But no one hears anything. If there are no humans in the forest, or, for the sake of argument, anything else with a complex nervous system capable of 'hearing', then no one hears a sound."
Albert and Barry recruit arguments that feel like support for their respective positions, describing in more detail the thoughts that caused their "sound"-detectors to fire or stay silent. But so far the conversation has still focused on the forest, rather than definitions. And note that they don't actually disagree on anything that happens in the forest.
Albert: "This is the dumbest argument I've ever been in. You're a niddlewicking fallumphing pickleplumber."
Barry: "Yeah? Well, you look like your face caught on fire and someone put it out with a shovel."
Insult has been proffered and accepted; now neither party can back down without losing face. Technically, this isn't part of the argument, as rationalists account such things; but it's such an important part of the Standard Dispute that I'm including it anyway.
Albert: "The tree produces acoustic vibrations. By definition, that is a sound."
Barry: "No one hears anything. By definition, that is not a sound."
The argument starts shifting to focus on definitions. Whenever you feel tempted to say the words "by definition" in an argument that is not literally about pure mathematics, remember that anything which is true "by definition" is true in all possible worlds, and so observing its truth can never constrain which world you live in.
5. Page zz, 'How An Algorithm Feels From Inside'.
DISPUTING DEFINITIONS 229
Albert: "My computer's microphone can record a sound without anyone being around to hear it, store it as a file, and it's called a 'sound file'. And what's stored in the file is the pattern of vibrations in air, not the pattern of neural firings in anyone's brain. 'Sound' means a pattern of vibrations."
Albert deploys an argument that feels like support for the word "sound" having a particular meaning. This is a different kind of question from whether acoustic vibrations take place in a forest, but the shift usually passes unnoticed.
Barry: "Oh, yeah? Let's just see if the dictionary agrees with you."
There's a lot of things I could be curious about in the falling-tree scenario. I could go into the forest and look at trees, or learn how to derive the wave equation for changes of air pressure, or examine the anatomy of an ear, or study the neuroanatomy of the auditory cortex. Instead of doing any of these things, I am to consult a dictionary, apparently. Why? Are the editors of the dictionary expert botanists, expert physicists, expert neuroscientists? Looking in an encyclopedia might make sense, but why a dictionary?
Albert: "Hah! Definition 2c in Merriam-Webster: 'Sound: Mechanical radiant energy that is transmitted by longitudinal pressure waves in a material medium (as air).'"
Barry: "Hah! Definition 2b in Merriam-Webster: 'Sound: The sensation perceived by the sense of hearing.'"
6. Page j, 'The Parable of Hemlock'.
7. Page 8, 'Fake Explanations'.
Albert and Barry, chorus: "Consarned dictionary! This doesn't help at all!"
Dictionary editors are historians of usage, not legislators of language. Dictionary editors find words in current usage, then write down the words next to (a small part of) what people seem to mean by them. If there's more than one usage, the editors write down more than one definition.
Albert: "Look, suppose that I left a microphone in the forest and recorded the pattern of the acoustic vibrations of the tree falling. If I played that back to someone, they'd call it a 'sound'! That's the common usage! Don't go around making up your own wacky definitions!"
Barry: "One, I can define a word any way I like so long as I use it consistently. Two, the meaning I gave was in the dictionary. Three, who gave you the right to decide what is or isn't common usage?"
There's quite a lot of rationality errors in the Standard Dispute. Some of them I've already covered, and some of them I've yet to cover; likewise the remedies.
But for now, I would just like to point out, in a mournful sort of way, that Albert and Barry seem to agree on virtually every question of what is actually going on inside the forest, and yet it doesn't seem to generate any feeling of agreement.
Arguing about definitions is a garden path; people wouldn't go down the path if they saw at the outset where it led. If you asked Albert (Barry) why he's still arguing, he'd probably say something like: "Barry (Albert) is trying to sneak in his own definition of 'sound', the scurvy scoundrel, to support his ridiculous point; and I'm here to defend the standard definition."
But suppose I went back in time to before the start of the argument:
8. Page 200, 'Extensions and Intensions'.
(Eliezer appears from nowhere in a peculiar conveyance that
looks just like the time machine from the original The Time
Machine movie.)
Barry: "Gosh! A time traveler!"
Eliezer: "I am a traveler from the future! Hear my words! I have traveled far into the past... around fifteen minutes..."
Albert: "Fifteen minutes?"
Eliezer: "...to bring you this message!"
(There is a pause of mixed confusion and expectancy.)
Eliezer: "Do you think that 'sound' should be defined to require both acoustic vibrations (pressure waves in air) and also auditory experiences (someone to listen to the sound), or should 'sound' be defined as meaning only acoustic vibrations, or only auditory experience?"
Barry: "You went back in time to ask us that?"
Eliezer: "My purposes are my own! Answer!"
Albert: "Well... I don't see why it would matter. You can pick any definition so long as you use it consistently."
Barry: "Flip a coin. Er, flip a coin twice."
Eliezer: "Personally I'd say that if the issue arises, both sides should switch to describing the event in unambiguous lower-level constituents, like acoustic vibrations or auditory experiences. Or each side could designate a new word, like 'alberzle' and 'bargulum', to use for what they respectively used to call 'sound'; and then both sides could use the new words consistently. That way neither side has to back down or lose face, but they can still communicate. And of course you should try to keep track, at all times, of some testable proposition that the argument is actually about. Does that sound right to you?"
Albert: "I guess..."
Barry: "Why are we talking about this?"
Eliezer: "To preserve your friendship against a contingency you will, now, never know. For the future has already changed!"
(Eliezer and the machine vanish in a puff of smoke.)
Barry: "Where were we again?"
Albert: "Oh, yeah: If a tree falls in the forest, and no one hears it, does it make a sound?"
Barry: "It makes an alberzle but not a bargulum. What's the next question?"
This remedy doesn't destroy every dispute over categorizations. But it destroys a substantial fraction.
12. Feel the Meaning
Think telepathy is simple? Try building a computer that will be telepathic with you. Telepathy, or "language", or whatever you want to call our partial thought transfer ability, is more complicated than it looks.
But it would be quite inconvenient to go around thinking, "Now I shall partially transduce some features of my thoughts into a linear sequence of phonemes which will invoke similar thoughts in my conversational partner."
So the brain hides the complexity, or rather, never represents it in the first place, which leads people to think some peculiar thoughts about words.
1. http://lesswrong.com/lw/nq/feel_the_meaning/
2. Page 228, 'Disputing Definitions'.
3. Page ;;, '"Science" as Curiosity-Stopper'.
13. The Argument from Common Usage
As I remarked earlier, the Standard Dispute runs as follows:
Albert: "Look, suppose that I left a microphone in the forest and recorded the pattern of the acoustic vibrations of the tree falling. If I played that back to someone, they'd call it a 'sound'! That's the common usage! Don't go around making up your own wacky definitions!"
Barry: "One, I can define a word any way I like so long as I use it consistently. Two, the meaning I gave was in the dictionary. Three, who gave you the right to decide what is or isn't common usage?"
Not all definitional disputes progress as far as recognizing the notion of common usage. More often, I think, someone picks up a dictionary because they believe that words have meanings, and the dictionary faithfully records what this meaning is. Some people even seem to believe that the dictionary determines the meaning, that the dictionary editors are the Legislators of Language. Maybe because back in elementary school, their authority-teacher said that they had to obey the dictionary, that it was a mandatory rule rather than an optional one?
Dictionary editors read what other people write, and record what the words seem to mean; they are historians. The Oxford English Dictionary may be comprehensive, but never authoritative.
But surely there is a social imperative to use words in a commonly understood way? Does not our human telepathy, our valuable power of language, rely on mutual coordination to work? Perhaps we should voluntarily treat dictionary editors as supreme arbiters, even if they prefer to think of themselves as historians, in order to maintain the quiet cooperation on which all speech depends.
1. http://lesswrong.com/lw/nr/the_argument_from_common_usage/
2. Page z, 'Feel the Meaning'.
3. Page 228, 'Disputing Definitions'.
4. Page z, 'Feel the Meaning'.
5. Page 6o, 'Absolute Authority'.
The phrase "authoritative dictionary" is almost never used correctly, an example of proper usage being the Authoritative Dictionary of IEEE Standards. The IEEE is a body of voting members who have a professional need for exact agreement on terms and definitions, and so the Authoritative Dictionary of IEEE Standards is actual, negotiated legislation, which exerts whatever authority one regards as residing in the IEEE.
In everyday life, shared language usually does not arise from a deliberate agreement, as of the IEEE. It's more a matter of infection, as words are invented and diffuse through the culture. (A "meme", one might say, following Richard Dawkins thirty years ago; but you already know what I mean, and if not, you can look it up on Google, and then you too will have been infected.)
Yet as the example of the IEEE shows, agreement on language can also be a cooperatively established public good. If you and I wish to undergo an exchange of thoughts via language, the human telepathy, then it is in our mutual interest that we use the same word for similar concepts, preferably, concepts similar to the limit of resolution in our brain's representation thereof, even though we have no obvious mutual interest in using any particular word for a concept.
We have no obvious mutual interest in using the word "oto" to mean sound, or "sound" to mean oto; but we have a mutual interest in using the same word, whichever word it happens to be. (Preferably, words we use frequently should be short, but let's not get into information theory just yet.)
But, while we have a mutual interest, it is not strictly necessary that you and I use the similar labels internally; it is only convenient. If I know that, to you, "oto" means sound (that is, you associate "oto" to a concept very similar to the one I associate to "sound"), then I can say "Paper crumpling makes a crackling oto." It requires extra thought, but I can do it if I want.
Similarly, if you say "What is the walking-stick of a bowling ball dropping on the floor?" and I know which concept you associate with the syllables "walking-stick", then I can figure out what you mean. It may require some thought, and give me pause, because I ordinarily associate "walking-stick" with a different concept. But I can do it just fine.
THE ARGUMENT FROM COMMON USAGE 239
When humans really want to communicate with each other, we're hard to stop! If we're stuck on a deserted island with no common language, we'll take up sticks and draw pictures in sand.
Albert's appeal to the Argument from Common Usage assumes that agreement on language is a cooperatively established public good. Yet Albert assumes this for the sole purpose of rhetorically accusing Barry of breaking the agreement, and endangering the public good. Now the falling-tree argument has gone all the way from botany to semantics to politics; and so Barry responds by challenging Albert for the authority to define the word.
A rationalist, with the discipline of hugging the query active, would notice that the conversation had gone rather far astray.
Oh, dear reader, is it all really necessary? Albert knows what Barry means by "sound". Barry knows what Albert means by "sound". Both Albert and Barry have access to words, such as "acoustic vibrations" or "auditory experience", which they already associate to the same concepts, and which can describe events in the forest without ambiguity. If they were stuck on a deserted island, trying to communicate with each other, their work would be done.
When both sides know what the other side wants to say, and both sides accuse the other side of defecting from "common usage", then whatever it is they are about, it is clearly not working out a way to communicate with each other. But this is the whole benefit that common usage provides in the first place.
Why would you argue about the meaning of a word, two sides trying to wrest it back and forth? If it's just a namespace conflict that has gotten blown out of proportion, and nothing more is at stake, then the two sides need merely generate two new words and use them consistently.
Yet often categorizations function as hidden inferences and disguised queries. Is atheism a "religion"? If someone is arguing that the reasoning methods used in atheism are on a par with the reasoning methods used in Judaism, or that atheism is on a par with Islam in terms of causally engendering violence, then they have a clear argumentative stake in lumping it all together into an indistinct gray blur of "faith".
6. Page , 'Hug the Query'.
7. Page 228, 'Disputing Definitions'.
8. Page j;, 'Words as Hidden Inferences'.
9. Page z, 'Disguised Queries'.
10. Page jz, 'Uncritical Supercriticality'.
Or consider the fight to blend together blacks and whites as "people". This would not be a time to generate two words; what's at stake is exactly the idea that you shouldn't draw a moral distinction.
But once any empirical proposition is at stake, or any moral proposition, you can no longer appeal to common usage.
If the question is how to cluster together similar things for purposes of inference, empirical predictions will depend on the answer, which means that definitions can be wrong. A conflict of predictions cannot be settled by an opinion poll.
If you want to know whether atheism should be clustered with supernaturalist religions for purposes of some particular empirical inference, the dictionary can't answer you.
If you want to know whether blacks are people, the dictionary can't answer you.
If everyone believes that the red light in the sky is Mars the God of War, the dictionary will define "Mars" as the God of War. If everyone believes that fire is the release of phlogiston, the dictionary will define "fire" as the release of phlogiston.
There is an art to using words; even when definitions are not literally true or false, they are often wiser or more foolish. Dictionaries are mere histories of past usage; if you treat them as supreme arbiters of meaning, it binds you to the wisdom of the past, forbidding you to do better.
Though do take care to ensure (if you must depart from the wisdom of the past) that people can figure out what you're trying to swim.
1. Page ,jj, 'The Fallacy of Gray'.
2. Page zo, 'Similarity Clusters'.
3. Page 200, 'Extensions and Intensions'.
4. Page oj, 'Guardians of the Truth'.
5. http://lesswrong.com/lw/h9/tsuyoku_vs_the_egalitarian_instinct/
14. Empty Labels
How could tabooing a word help you keep your purpose? From Lost Purposes:
As you read this, some young man or woman is sitting at a desk in a university, earnestly studying material they have no intention of ever using, and no interest in knowing for its own sake. They want a high-paying job, and the high-paying job requires a piece of paper, and the piece of paper requires a previous master's degree, and the master's degree requires a bachelor's degree, and the university that grants the bachelor's degree requires you to take a class in 12th-century knitting patterns to graduate. So they diligently study, intending to forget it all the moment the final exam is administered, but still seriously working away, because they want that piece of paper.
8. Page z,, 'Taboo Your Words'.
9. Page o,, 'Making Beliefs Pay Rent (in Anticipated Experiences)'.
10. Page o8, 'Belief in Belief'.
11. Page , 'Hug the Query'.
12. Page ,, 'Cached Thoughts'.
13. http://lesswrong.com/lw/le/lost_purposes/
14. http://lesswrong.com/lw/le/lost_purposes/
REPLACE THE SYMBOL WITH THE SUBSTANCE 249
"Why are you going to school?" "To get an education" ending in a "degree". Blank out the forbidden words and all their obvious synonyms, visualize the actual details, and you're much more likely to notice that "school" currently seems to consist of sitting next to bored teenagers listening to material you already know, that a "degree" is a piece of paper with some writing on it, and that an "education" is forgetting the material as soon as you're tested on it.
Leaky generalizations often manifest through categorizations: People who actually learn in classrooms are categorized as "getting an education", so "getting an education" must be good; but then anyone who actually shows up at a college will also match against the concept "getting an education", whether or not they learn.
Students who understand math will do well on tests, but if you require schools to produce good test scores, they'll spend all their time teaching to the test. A mental category that imperfectly matches your goal can produce the same kind of incentive failure internally. You want to learn, so you need an "education"; and then, as long as you're getting anything that matches against the category "education", you may not notice whether you're learning or not. Or you'll notice, but you won't realize you've lost sight of your original purpose, because you're getting an "education" and that's how you mentally described your goal.
To categorize is to throw away information. If you're told that a falling tree makes a "sound", you don't know what the actual sound is; you haven't actually heard the tree falling. If a coin lands "heads", you don't know its radial orientation. A blue egg-shaped thing may be a "blegg", but what if the exact egg shape varies, or the exact shade of blue? You want to use categories to throw away irrelevant information, to sift gold from dust, but often the standard categorization ends up throwing out relevant information too. And when you end up in that sort of mental trouble, the first and most obvious solution is to play Taboo.
For example: "Play Taboo" is itself a leaky generalization. Hasbro's version is not the rationalist version; they only list five additional banned words on the card, and that's not nearly enough coverage to exclude thinking in familiar old words. What rationalists do would count as playing Taboo (it would match against the "play Taboo" concept) but not everything that counts as playing Taboo works to force original seeing. If you just think "play Taboo to force original seeing", you'll start thinking that anything that counts as playing Taboo must count as original seeing.

5. http://lesswrong.com/lw/lc/leaky_generalizations/
The rationalist version isn't a game, which means that you can't win by trying to be clever and stretching the rules. You have to play Taboo with a voluntary handicap: Stop yourself from using synonyms that aren't on the card. You also have to stop yourself from inventing a new simple word or phrase that functions as an equivalent mental handle to the old one. You are trying to zoom in on your map, not rename the cities; dereference the pointer, not allocate a new pointer; see the events as they happen, not rewrite the cliche in a different wording.

By visualizing the problem in more detail, you can see the lost purpose: Exactly what do you do when you "play Taboo"? What purpose does each and every part serve?

If you see your activities and situation originally, you will be able to originally see your goals as well. If you can look with fresh eyes, as though for the first time, you will see yourself doing things that you would never dream of doing if they were not habits.

Purpose is lost whenever the substance (learning, knowledge, health) is displaced by the symbol (a degree, a test score, medical care). To heal a lost purpose, or a lossy categorization, you must do the reverse:

Replace the symbol with the substance; replace the signifier with the signified; replace the property with the membership test; replace the word with the meaning; replace the label with the concept; replace the summary with the details; replace the proxy question with the real question; dereference the pointer; drop into a lower level of organization; mentally simulate the process instead of naming it; zoom in on your map.
"The Simple Truth" was generated by an exercise of this discipline to describe "truth" on a lower level of organization, without invoking terms like "accurate", "correct", "represent", "reflect", "semantic", "believe", "knowledge", "map", or "real". (And remember that the goal is not really to play Taboo; the word "true" appears in the text, but not to define truth. It would get a buzzer in Hasbro's game, but we're not actually playing that game. Ask yourself whether the document fulfilled its purpose, not whether it followed the rules.)

6. http://yudkowsky.net/bayes/truth.html
Bayes's Rule itself describes "evidence" in pure math, without using words like "implies", "means", "supports", "proves", or "justifies". Set out to define such philosophical terms, and you'll just go in circles.

And then there's the most important word of all to Taboo. I've often warned that you should be careful not to overuse it, or even avoid the concept in certain cases. Now you know the real reason why. It's not a bad subject to think about. But your true understanding is measured by your ability to describe what you're doing and why, without using that word or any of its synonyms.

7. http://yudkowsky.net/virtues/
8. 'Two Cult Koans'.
9. http://lesswrong.com/lw/nc/newcombs_problem_and_regret_of_rationality/
17. Fallacies of Compression

…of Taboo, and to the wise use of words, because words often represent the points on our map, the labels under which we file our propositions and the buckets into which we drop our information. Avoiding a single word, or allocating new ones, is often part of the skill of expanding the map.

4. 'Replace the Symbol with the Substance'.
5. 'Taboo Your Words'.
18. Categorizing Has Consequences

It would seem that blood type plays the role in Japan that astrological signs play in the West, right down to blood type horoscopes in the daily newspaper.

This fad is especially odd because blood types have never been mysterious, not in Japan and not anywhere. We only know blood types even exist thanks to Karl Landsteiner. No mystic witch doctor, no venerable sorcerer, ever said a word about blood types; there are no ancient, dusty scrolls to shroud the error in the aura of antiquity.

…is not a neutral act. Maybe a more cleanly designed, more purely Bayesian AI could ponder an arbitrary class and not be influenced by it. But you, a human, do not have that option. Categories are not static things in the context of a human brain; as soon as you actually think of them, they exert force on your mind. One more reason not to believe you can define a word any way you like.

1. 'The Cluster Structure of Thingspace'.
19. Sneaking in Connotations

…the observed characteristics are Socrates's clothes, speech, tool use, and generally human shape; the categorization is "human"; the inferred characteristic is poisonability by hemlock.

1. http://lesswrong.com/lw/ny/sneaking_in_connotations/
2. 'Categorizing Has Consequences'.
3. 'The Parable of Hemlock'.
Of course there's no hard distinction between "observed characteristics" and "inferred characteristics". If you hear someone speak, they're probably shaped like a human, all else being equal. If you see a human figure in the shadows, then ceteris paribus it can probably speak.

And yet some properties do tend to be more inferred than observed. You're more likely to decide that someone is human, and will therefore burn if exposed to open flame, than to carry through the inference the other way around.

If you look in a dictionary for the definition of "human", you're more likely to find characteristics like "intelligence" and "featherless biped" (characteristics that are useful for quickly eyeballing what is and isn't a human) rather than the ten thousand connotations, from vulnerability to hemlock, to overconfidence, that we can infer from someone's being human. Why? Perhaps dictionaries are intended to let you match up labels to similarity groups, and so are designed to quickly isolate clusters in thingspace. Or perhaps the big, distinguishing characteristics are the most salient, and therefore first to pop into a dictionary editor's mind. (I'm not sure how aware dictionary editors are of what they really do.)

But the upshot is that when Danny pulls out his OED to look up "wiggin", he sees listed only the first-glance characteristics that distinguish a wiggin: Green eyes and black hair. The OED doesn't list the many minor connotations that have come to attach to this term, such as criminal proclivities, culinary peculiarities, and some unfortunate childhood activities.
How did those connotations get there in the first place? Maybe there was once a famous wiggin with those properties. Or maybe someone made stuff up at random, and wrote a series of bestselling books about it (The Wiggin, Talking to Wiggins, Raising Your Little Wiggin, Wiggins in the Bedroom). Maybe even the wiggins believe it now, and act accordingly. As soon as you call some people "wiggins", the word will begin acquiring connotations.

But remember the Parable of Hemlock: If we go by the logical class definitions, we can never class Socrates as a "human" until after we observe him to be mortal. Whenever someone pulls a dictionary, they're generally trying to sneak in a connotation, not the actual definition written down in the dictionary.

4. 'The Parable of Hemlock'.
After all, if the only meaning of the word "wiggin" is "green-eyed black-haired person", then why not just call those people "green-eyed black-haired people"? And if you're wondering whether someone is a ketchup-reacher, why not ask directly, "Is he a ketchup-reacher?" rather than "Is he a wiggin?" (Note substitution of substance for symbol.)
Oh, but arguing the real question would require work. You'd have to actually watch the wiggin to see if he reached for the ketchup. Or maybe see if you can find statistics on how many green-eyed black-haired people actually like ketchup. At any rate, you wouldn't be able to do it sitting in your living room with your eyes closed. And people are lazy. They'd rather "argue by definition", especially since they think "you can define a word any way you like".

But of course the real reason they care whether someone is a "wiggin" is a connotation (a feeling that comes along with the word) that isn't in the definition they claim to use.

Imagine Danny saying, "Look, he's got green eyes and black hair. He's a wiggin! It says so right there in the dictionary! Therefore, he's got black hair. Argue with that, if you can!"

Doesn't have much of a triumphant ring to it, does it? If the real point of the argument actually was contained in the dictionary definition (if the argument genuinely was logically valid) then the argument would feel empty; it would either say nothing new, or beg the question.

It's only the attempt to smuggle in connotations not explicitly listed in the definition that makes anyone feel they can score a point that way.

5. 'Words as Hidden Inferences'.
6. 'Hug the Query'.
7. 'Replace the Symbol with the Substance'.
20. Arguing By Definition

…point:

21. Where to Draw the Boundary?

Long have I pondered the meaning of the word "Art", and at last I've found what seems to me a satisfactory definition: "Art is that which is designed for the purpose of creating a reaction in an audience."

Just because there's a word "art" doesn't mean that it has a meaning, floating out there in the void, which you can discover by finding the right definition.
It feels that way, but it is not so.

22. Entropy, and Short Codes

Suppose you have a system X that's equally likely to be in any of 8 possible states: {X1, X2, X3, X4, X5, X6, X7, X8}.
There's an extraordinarily ubiquitous quantity (in physics, mathematics, and even biology) called entropy, and the entropy of X is 3 bits. This means that, on average, we'll have to ask 3 yes-or-no questions to find out X's value. For example, someone could tell us X's value using this code:

X1: 001    X2: 010    X3: 011    X4: 100
X5: 101    X6: 110    X7: 111    X8: 000

So if I asked "Is the first symbol 1?" and heard "yes", then asked "Is the second symbol 1?" and heard "no", then asked "Is the third symbol 1?" and heard "no", I would know that X was in state 4.
Now suppose that the system Y has four possible states with the following probabilities:

Y1: 1/2 (50%)
Y2: 1/4 (25%)
Y3: 1/8 (12.5%)
Y4: 1/8 (12.5%)

Then the entropy of Y would be 1.75 bits, meaning that we can find out its value by asking 1.75 yes-or-no questions.

What does it mean to talk about asking one and three-fourths of a question? Imagine that we designate the states of Y using the following code:

Y1: 1    Y2: 01    Y3: 001    Y4: 000

1. http://lesswrong.com/lw/o1/entropy_and_short_codes/
2. 'Where to Draw the Boundary?'.
First you ask, "Is the first symbol 1?" If the answer is "yes", you're done: Y is in state 1. This happens half the time, so 50% of the time, it takes 1 yes-or-no question to find out Y's state.

Suppose that instead the answer is "no". Then you ask, "Is the second symbol 1?" If the answer is "yes", you're done: Y is in state 2. Y is in state 2 with probability 1/4, and each time Y is in state 2 we discover this fact using two yes-or-no questions, so 25% of the time it takes 2 questions to discover Y's state.

If the answer is "no" twice in a row, you ask "Is the third symbol 1?" If "yes", you're done and Y is in state 3; if "no", you're done and Y is in state 4. The 1/8 of the time that Y is in state 3, it takes three questions; and the 1/8 of the time that Y is in state 4, it takes three questions.

(1/2 * 1) + (1/4 * 2) + (1/8 * 3) + (1/8 * 3)
= 0.5 + 0.5 + 0.375 + 0.375
= 1.75.
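The arithmetic above can be checked mechanically. As a minimal sketch (the state names and probabilities are just the ones from this example), the entropy formula and the expected number of questions under the given code can be computed side by side:

```python
import math

def entropy(probs):
    """Shannon entropy in bits: the sum over states of -p * log2(p)."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Y's distribution and the code described in the text.
y_probs = {"Y1": 1/2, "Y2": 1/4, "Y3": 1/8, "Y4": 1/8}
y_code = {"Y1": "1", "Y2": "01", "Y3": "001", "Y4": "000"}

h = entropy(y_probs.values())
expected_questions = sum(y_probs[s] * len(y_code[s]) for s in y_probs)

print(h, expected_questions)  # both come out to 1.75
```

Both numbers come out to 1.75, which is what it means for this code to be optimal: the average number of yes-or-no questions equals the entropy.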
The general formula for the entropy of a system S is the sum, over all Si, of -p(Si) * log2(p(Si)).

For example, the log (base 2) of 1/8 is -3, so -(1/8 * -3) = 0.375 is the contribution of state S4 to the total entropy of 1.75.

You might ask: "Why not use the code 10 for Y4, instead of 000? Wouldn't that let us transmit messages more quickly?"

But if you use the code 10 for Y4, then the message 10 becomes ambiguous: it could mean "Y4", or it could mean "Y1" (1) followed by the start of "Y2", "Y3", or "Y4" (0).

The moral is that short words are a conserved resource.
The key to creating a good code (a code that transmits messages as compactly as possible) is to reserve short words for things that you'll need to say frequently, and use longer words for things that you won't need to say as often.
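This reserve-short-words-for-frequent-things principle is exactly what Huffman coding automates. As an illustrative sketch (the algorithm is standard; it is not something from the original text), building the Huffman code for Y's distribution recovers codeword lengths of 1, 2, 3, and 3 bits:

```python
import heapq

def huffman_code(probs):
    """Build a prefix-free binary code; frequent symbols get short codewords."""
    # Heap entries: (total probability, tiebreak counter, {symbol: codeword so far}).
    heap = [(p, i, {sym: ""}) for i, (sym, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:
        p1, _, c1 = heapq.heappop(heap)   # pop the two least probable subtrees...
        p2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + w for s, w in c1.items()}
        merged.update({s: "1" + w for s, w in c2.items()})
        heapq.heappush(heap, (p1 + p2, count, merged))  # ...merge and push back
        count += 1
    return heap[0][2]

code = huffman_code({"Y1": 1/2, "Y2": 1/4, "Y3": 1/8, "Y4": 1/8})
lengths = {s: len(w) for s, w in code.items()}
print(lengths)  # Y1 gets a 1-bit codeword; Y3 and Y4 get 3 bits each
```

The frequent state Y1 gets the shortest codeword, and the rare states get the longest, matching the code given in the text.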
When you take this art to its limit, the length of the message you need to describe something corresponds exactly or almost exactly to its probability. This is the Minimum Description Length or Minimum Message Length formalization of Occam's Razor.

And so even the labels that we use for words are not quite arbitrary. The sounds that we attach to our concepts can be better or worse, wiser or more foolish. Even apart from considerations of common usage!
I say all this, because the idea that "you can X any way you like" is a huge obstacle to learning how to X wisely. "It's a free country; I have a right to my own opinion" obstructs the art of finding truth. "I can define a word any way I like" obstructs the art of carving reality at its joints. And even the sensible-sounding "The labels we attach to words are arbitrary" obstructs awareness of compactness. Prosody too, for that matter: Tolkien once observed what a beautiful sound the phrase "cellar door" makes; that is the kind of awareness it takes to use language like Tolkien.
The length of words also plays a nontrivial role in the cognitive science of language:

Consider the phrases "recliner", "chair", and "furniture". Recliner is a more specific category than chair; furniture is a more general category than chair. But the vast majority of chairs have a common use: you use the same sort of motor actions to sit down in them, and you sit down in them for the same sort of purpose (to take your weight off your feet while you eat, or read, or type, or rest). Recliners do not depart from this theme. "Furniture", on the other hand, includes things like beds and tables which have different uses, and call up different motor functions, from chairs.

In the terminology of cognitive psychology, "chair" is a basic-level category.

People have a tendency to talk, and presumably think, at the basic level of categorization: to draw the boundary around "chairs", rather than around the more specific category "recliner", or the more general category "furniture". People are more likely to say "You can sit in that chair" than "You can sit in that recliner" or "You can sit in that furniture".

3. 'Occam's Razor'.
4. 'The Argument from Common Usage'.
5. http://www.overcomingbias.com/2006/12/you_are_never_e.html
6. 'Where to Draw the Boundary?'.
And it is no coincidence that the word for "chair" contains fewer syllables than either "recliner" or "furniture". Basic-level categories, in general, tend to have short names, and nouns with short names tend to refer to basic-level categories. Not a perfect rule, of course, but a definite tendency. Frequent use goes along with short words; short words go along with frequent use.

Or as Douglas Hofstadter put it, there's a reason why the English language uses "the" to mean "the" and "antidisestablishmentarianism" to mean "antidisestablishmentarianism" instead of antidisestablishmentarianism the other way around.
23. Mutual Information, and Density in Thingspace

…since 2^3 = 8 and 2^2 = 4, 3 questions can distinguish 8 possibilities and 2 questions can distinguish 4 possibilities; but remember that if the possibilities were not all equally likely, we could use a more clever code to discover Y's state using e.g. 1.75 questions on average. In this case, though, X's probability mass is evenly distributed over all its possible states (and likewise Y), so we can't use any clever codes.
What is the entropy of the combined system (X,Y)?

You might be tempted to answer, "It takes 3 questions to find out X, and then 2 questions to find out Y, so it takes 5 questions total to find out the state of X and Y."

But what if the two variables are entangled, so that learning the state of Y tells us something about the state of X?

In particular, let's suppose that X and Y are either both odd, or both even.

Now if we receive a 3-bit message (ask 3 questions) and learn that X is in state 5, we know that Y is in state 1 or state 3, but not state 2 or state 4. So the single additional question "Is Y in state 3?", answered "no", tells us the entire state of (X,Y): X=X5, Y=Y1. And we learned this with a total of 4 questions.

Conversely, if we learn that Y is in state 4 using two questions, it will take us only an additional two questions to learn whether X is in state 2, 4, 6, or 8. Again, four questions to learn the state of the joint system.

1. http://lesswrong.com/lw/o2/mutual_information_and_density_in_thingspace/
2. 'Entropy, and Short Codes'.
The mutual information of two variables is defined as the difference between the entropy of the joint system and the entropy of the independent systems: I(X,Y) = H(X) + H(Y) - H(X,Y).

Here there is one bit of mutual information between the two systems: Learning X tells us one bit of information about Y (cuts down the space of possibilities from 4 to 2, a factor-of-2 decrease in the volume) and learning Y tells us one bit of information about X (cuts down the possibility space from 8 to 4).
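Treating the parity-entangled example as a concrete distribution, that one bit of mutual information can be computed directly from the definition. A sketch (the uniform 16-state joint distribution is just the example from the text):

```python
import math
from itertools import product

def entropy(probs):
    """Shannon entropy in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# X uniform over 1..8 and Y uniform over 1..4, constrained to share parity:
# 16 equally likely joint states.
joint = {(x, y): 1 / 16 for x, y in product(range(1, 9), range(1, 5))
         if x % 2 == y % 2}

h_joint = entropy(joint.values())
h_x = entropy([sum(p for (x, _), p in joint.items() if x == xi)
               for xi in range(1, 9)])
h_y = entropy([sum(p for (_, y), p in joint.items() if y == yi)
               for yi in range(1, 5)])

mutual_info = h_x + h_y - h_joint
print(h_x, h_y, h_joint, mutual_info)  # 3.0 2.0 4.0 1.0
```

H(X) = 3 bits and H(Y) = 2 bits as before, but the joint entropy is only 4 bits, leaving exactly 1 bit of mutual information.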
What about when probability mass is not evenly distributed? Yesterday, for example, we discussed the case in which Y had the probabilities 1/2, 1/4, 1/8, 1/8 for its four states. Let us take this to be our probability distribution over Y, considered independently; if we saw Y, without seeing anything else, this is what we'd expect to see. And suppose the variable Z has two states, Z1 and Z2, with probabilities 3/8 and 5/8 respectively.

Then if and only if the joint distribution of Y and Z is as follows, there is zero mutual information between Y and Z:

Z1Y1: 3/16    Z1Y2: 3/32    Z1Y3: 3/64    Z1Y4: 3/64
Z2Y1: 5/16    Z2Y2: 5/32    Z2Y3: 5/64    Z2Y4: 5/64
This distribution obeys the law

P(Y,Z) = P(Y)P(Z).

For example, P(Z1Y2) = P(Z1)P(Y2) = 3/8 * 1/4 = 3/32.

And observe that we can recover the marginal (independent) probabilities of Y and Z just by looking at the joint distribution:

P(Y1) = total probability of all the different ways Y1 can happen
= P(Z1Y1) + P(Z2Y1)
= 3/16 + 5/16
= 1/2.
So, just by inspecting the joint distribution, we can determine whether the marginal variables Y and Z are independent; that is, whether the joint distribution factors into the product of the marginal distributions; whether, for all Y and Z, P(Y,Z) = P(Y)P(Z).

This last is significant because, by Bayes's Rule:

P(Yi,Zj) = P(Yi)P(Zj)
P(Yi,Zj)/P(Zj) = P(Yi)
P(Yi|Zj) = P(Yi)

In English, "After you learn Zj, your belief about Yi is just what it was before."

3. 'An Intuitive Explanation of Bayes' Theorem'.
So when the distribution factorizes, when P(Y,Z) = P(Y)P(Z), this is equivalent to "Learning about Y never tells us anything about Z, or vice versa."

From which you might suspect, correctly, that there is no mutual information between Y and Z. Where there is no mutual information, there is no Bayesian evidence, and vice versa.
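As a sketch of the inspection procedure just described (using exact fractions so the check is not blurred by floating point), the table above can be tested entry by entry for factorization:

```python
from fractions import Fraction as F

# The zero-mutual-information joint distribution from the text.
joint = {("Z1", "Y1"): F(3, 16), ("Z1", "Y2"): F(3, 32),
         ("Z1", "Y3"): F(3, 64), ("Z1", "Y4"): F(3, 64),
         ("Z2", "Y1"): F(5, 16), ("Z2", "Y2"): F(5, 32),
         ("Z2", "Y3"): F(5, 64), ("Z2", "Y4"): F(5, 64)}

# Recover the marginals by summing rows and columns of the joint table.
p_z = {z: sum(p for (zi, _), p in joint.items() if zi == z)
       for z in ("Z1", "Z2")}
p_y = {y: sum(p for (_, yi), p in joint.items() if yi == y)
       for y in ("Y1", "Y2", "Y3", "Y4")}

# Independence: every joint entry equals the product of its marginals.
independent = all(joint[z, y] == p_z[z] * p_y[y] for (z, y) in joint)
print(p_z["Z1"], p_y["Y1"], independent)  # 3/8 1/2 True
```

The recovered marginals are exactly the 3/8, 5/8 and 1/2, 1/4, 1/8, 1/8 distributions we started with, and every entry factorizes, so there is zero mutual information.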
Suppose that in the distribution YZ above, we treated each possible combination of Y and Z as a separate event (so that the distribution YZ would have a total of 8 possibilities, with the probabilities shown) and then we calculated the entropy of the distribution YZ the same way we would calculate the entropy of any distribution:

-[3/16 log2(3/16) + 3/32 log2(3/32) + 3/64 log2(3/64) + ... + 5/64 log2(5/64)]

You would end up with the same total you would get if you separately calculated the entropy of Y plus the entropy of Z. There is no mutual information between the two variables, so our uncertainty about the joint system is not any less than our uncertainty about the two systems considered separately. (I am not showing the calculations, but you are welcome to do them; and I am not showing the proof that this is true in general, but you are welcome to Google on "Shannon entropy" and "mutual information".)
What if the joint distribution doesn't factorize? For example:

Z1Y1: 12/64    Z1Y2: 8/64    Z1Y3: 1/64    Z1Y4: 3/64
Z2Y1: 20/64    Z2Y2: 8/64    Z2Y3: 7/64    Z2Y4: 5/64
If you add up the joint probabilities to get marginal probabilities, you should find that P(Y1) = 1/2, P(Z1) = 3/8, and so on: the same marginals as before. But P(Z1Y2) equals 8/64, where P(Z1)P(Y2) would equal 3/8 * 1/4 = 6/64. That is, the probability of running into Z1Y2 together is greater than you'd expect based on the probabilities of running into Z1 or Y2 separately.
Which in turn implies:

P(Z1Y2) > P(Z1)P(Y2)
P(Z1Y2)/P(Y2) > P(Z1)
P(Z1|Y2) > P(Z1)
Since there's an "unusually high" probability for P(Z1Y2), defined as a probability higher than the marginal probabilities would indicate by default, it follows that observing Y2 is evidence which increases the probability of Z1. And by a symmetrical argument, observing Z1 must favor Y2.

As there are at least some values of Y that tell us about Z (and vice versa) there must be mutual information between the two variables; and so you will find (I am confident, though I haven't actually checked) that calculating the entropy of YZ yields less total uncertainty than the sum of the independent entropies of Y and Z. H(Y,Z) = H(Y) + H(Z) - I(Y,Z), with all quantities necessarily nonnegative.
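That prediction is easy to check numerically. A sketch, using the non-factorizing table above: the joint entropy comes out strictly less than the sum of the marginal entropies, so I(Y,Z) is positive (a small amount, on the order of 0.04 bits):

```python
import math

def entropy(probs):
    """Shannon entropy in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# The non-factorizing joint distribution from the text, in 64ths.
counts = {("Z1", "Y1"): 12, ("Z1", "Y2"): 8, ("Z1", "Y3"): 1, ("Z1", "Y4"): 3,
          ("Z2", "Y1"): 20, ("Z2", "Y2"): 8, ("Z2", "Y3"): 7, ("Z2", "Y4"): 5}
joint = {k: v / 64 for k, v in counts.items()}

h_joint = entropy(joint.values())
h_z = entropy([sum(p for (z, _), p in joint.items() if z == zi)
               for zi in ("Z1", "Z2")])
h_y = entropy([sum(p for (_, y), p in joint.items() if y == yi)
               for yi in ("Y1", "Y2", "Y3", "Y4")])

mutual_info = h_z + h_y - h_joint
print(mutual_info > 0)  # True: Y and Z carry evidence about each other
```

Note that the marginals here are the same as in the factorizing table (so H(Y) is still 1.75 bits); only the coupling between the variables has changed.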
(I digress here to remark that the symmetry of the expression for the mutual information shows that Y must tell us as much about Z, on average, as Z tells us about Y. I leave it as an exercise to the reader to reconcile this with anything they were taught in logic class about how, if all ravens are black, being allowed to reason Raven(x) -> Black(x) doesn't mean you're allowed to reason Black(x) -> Raven(x). How different seem the symmetrical probability flows of the Bayesian, from the sharp lurches of logic; even though the latter is just a degenerate case of the former.)
But," you ask, what has all this to do with the proper use oI
words:"
In Lmpty Labels
Y
z
will be just as long as the code Ior Z
plus the
code Ior Y
z
, unless P(Z
Y
z
) > P(Z
)P(Y
z
), in which case the code Ior
the word can be shorter than the codes Ior its parts.
And this in turn corresponds exactly to the case where we can infer some of the properties of the thing from seeing its other properties. It must be more likely than the default that featherless bipedal things will also be mortal.

Of course the word "human" really describes many, many more properties: when you see a human-shaped entity that talks and wears clothes, you can infer whole hosts of biochemical and anatomical and cognitive facts about it. To replace the word "human" with a description of everything we know about humans would require us to spend an inordinate amount of time talking.

4. 'Empty Labels'.
5. 'Replace the Symbol with the Substance'.
But this is true only because a featherless talking biped is far more likely than default to be poisonable by hemlock, or have broad nails, or be overconfident.

Having a word for a thing, rather than just listing its properties, is a more compact code precisely in those cases where we can infer some of those properties from the other properties. (With the exception perhaps of very primitive words, like "red", that we would use to send an entirely uncompressed description of our sensory experiences. But by the time you encounter a bug, or even a rock, you're dealing with nonsimple property collections, far above the primitive level.)

So having a word "wiggin" for green-eyed black-haired people is more useful than just saying "green-eyed black-haired person" precisely when:

1. Green-eyed people are more likely than average to be black-haired (and vice versa), meaning that we can probabilistically infer green eyes from black hair or vice versa; or
2. Wiggins share other properties that can be inferred at greater-than-default probability. In this case we have to separately observe the green eyes and black hair; but then, after observing both these properties independently, we can probabilistically infer other properties (like a taste for ketchup).
One may even consider the act of defining a word as a promise to this effect. Telling someone, "I define the word 'wiggin' to mean a person with green eyes and black hair", by Gricean implication, asserts that the word "wiggin" will somehow help you make inferences / shorten your messages.

If green eyes and black hair have no greater than default probability to be found together, nor does any other property occur at greater than default probability along with them, then the word "wiggin" is a lie: The word claims that certain people are worth distinguishing as a group, but they're not.

In this case the word "wiggin" does not help describe reality more compactly; it is not defined by someone sending the shortest message; it has no role in the simplest explanation. Equivalently, the word "wiggin" will be of no help to you in doing any Bayesian inference. Even if you do not call the word a lie, it is surely an error.

6. 'Sneaking in Connotations'.
And the way to carve reality at its joints is to draw your boundaries around concentrations of unusually high probability density in Thingspace.

7. 'The Cluster Structure of Thingspace'.
25. Conditional Independence, and Naive Bayes

Previously I spoke of mutual information between X and Y, I(X,Y), which is the difference between the entropy of the joint probability distribution, H(X,Y), and the entropies of the marginal distributions, H(X) + H(Y).
I gave the example of a variable X, having eight states, 1..8, which are all equally probable if we have not yet encountered any evidence; and a variable Y, with states 1..4, which are all equally probable if we have not yet encountered any evidence. Then if we calculate the marginal entropies H(X) and H(Y), we will find that X has 3 bits of entropy, and Y has 2 bits.

However, we also know that X and Y are both even or both odd; and this is all we know about the relation between them. So for the joint distribution (X,Y) there are only 16 possible states, all equally probable, for a joint entropy of 4 bits. This is a 1-bit entropy defect, compared to 5 bits of entropy if X and Y were independent. This entropy defect is the mutual information: the information that X tells us about Y, or vice versa, so that we are not as uncertain about one after having learned the other.
Suppose, however, that there exists a third variable Z. Z has two states, "even" and "odd", perfectly correlated to the evenness or oddness of (X,Y). In fact, we'll suppose that Z is just the question "Are X and Y even or odd?"

If we have no evidence about X and Y, then Z itself necessarily has 1 bit of entropy on the information given. There is 1 bit of mutual information between Z and X, and 1 bit of mutual information between Z and Y. And, as previously noted, 1 bit of mutual information between X and Y. So how much entropy for the whole system (X,Y,Z)? You might naively expect that

H(X,Y,Z) = H(X) + H(Y) + H(Z) - I(X,Z) - I(Z,Y) - I(X,Y)

but this turns out not to be the case.

1. http://lesswrong.com/lw/o8/conditional_independence_and_naive_bayes/
2. http://lesswrong.com/lw/o7/searching_for_bayesstructure/
3. 'Mutual Information, and Density in Thingspace'.
4. 'Entropy, and Short Codes'.
The joint system (X,Y,Z) only has 16 possible states (since Z is just the question "Are X & Y even or odd?") so H(X,Y,Z) = 4 bits. But if you calculate the formula just given, you get

(3 + 2 + 1 - 1 - 1 - 1) bits = 3 bits. WRONG!

Why? Because if you have the mutual information between X and Z, and the mutual information between Z and Y, that may include some of the same mutual information that we'll calculate exists between X and Y. In this case, for example, knowing that X is even tells us that Z is even, and knowing that Z is even tells us that Y is even; but this is the same information that X would tell us about Y. We double-counted some of our knowledge, and so came up with too little entropy.
The correct formula is (I believe):

H(X,Y,Z) = H(X) + H(Y) + H(Z) - I(X,Z) - I(Z,Y) - I(X,Y | Z)

Here the last term, I(X,Y | Z), means "the information that X tells us about Y, given that we already know Z". In this case, X doesn't tell us anything about Y, given that we already know Z, so the term comes out as zero and the equation gives the correct answer. There, isn't that nice?

"No," you correctly reply, "for you have not told me how to calculate I(X,Y|Z), only given me a verbal argument that it ought to be zero."

We calculate I(X,Y|Z) just the way you would expect. I(X,Y) = H(X) + H(Y) - H(X,Y), so:

I(X,Y|Z) = H(X|Z) + H(Y|Z) - H(X,Y|Z)

5. http://www.youtube.com/watch?v=tRVUOGUmxJI
6. 'Fake Causality'.
7. http://yudkowsky.net/bayes/technical.html
And now, I suppose, you want to know how to calculate the conditional entropy? Well, the original formula for the entropy is

H(S) = Sum i: -p(Si) * log2(p(Si))

If we then learned a new fact Z0, our remaining uncertainty about S would be

H(S|Z0) = Sum i: -p(Si|Z0) * log2(p(Si|Z0))
So if we're going to learn a new fact Z, but we don't know which Z yet, then, on average, we expect to be around this uncertain of S afterward:

H(S|Z) = Sum j: (p(Zj) * Sum i: -p(Si|Zj) * log2(p(Si|Zj)))
And that's how one calculates conditional entropies, from which, in turn, we can get the conditional mutual information. There are all sorts of ancillary theorems here, like

H(X|Y) = H(X,Y) - H(Y)

and

if I(X,Z) = 0 and I(Y,X|Z) = 0 then I(X,Y) = 0,

but I'm not going to go into those.
But," you ask, what does this have to do with the nature oI
words and their hidden Bayesian structure:"
I am just so unspeakably glad that you asked that question, be
cause I was planning to tell you whether you liked it or not. But
Iirst there are a couple more preliminaries.
You will rememberyes, you will rememberthat there is a
duality between mutual inIormation and Bayesian evidence. Mutu
al inIormation is positive iI and only iI the probability oI at least
some joint events P(x, y) does not equal the product oI the prob
abilities oI the separate events P(x)*P(y). This, in turn, is exactly
equivalent to the condition that Bayesian evidence exists between x
and y:
I(X,Y) > 0 =>
P(x,y) != P(x)*P(y)
P(x,y) / P(y) != P(x)
P(x|y) != P(x)

If you're conditioning on Z, you just adjust the whole derivation accordingly:

I(X,Y | Z) > 0 =>
P(x,y|z) != P(x|z)*P(y|z)
P(x,y|z) / P(y|z) != P(x|z)
(P(x,y,z) / P(z)) / (P(y,z) / P(z)) != P(x|z)
P(x,y,z) / P(y,z) != P(x|z)
P(x|y,z) != P(x|z)
\hich last line reads Lven knowing Z, learning Y still changes
our belieIs about X."
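The unconditional version of this equivalence is easy to check numerically. Below is a hypothetical Python sketch (the `mutual_information` helper is my own, not from the text): a joint distribution where x is evidence about y has positive mutual information, and a factorizable one has exactly zero:

```python
import math

def mutual_information(joint):
    """I(X,Y) = Sum over x,y of p(x,y) * log2( p(x,y) / (p(x) * p(y)) )."""
    px, py = {}, {}
    for (x, y), p in joint.items():
        px[x] = px.get(x, 0.0) + p
        py[y] = py.get(y, 0.0) + p
    return sum(p * math.log2(p / (px[x] * py[y]))
               for (x, y), p in joint.items() if p > 0)

# Dependent pair: X and Y are always equal, so each is evidence about the other.
dependent = {(0, 0): 0.5, (1, 1): 0.5}
# Independent pair: p(x,y) = p(x) * p(y) everywhere, so no evidence flows.
independent = {(x, y): 0.25 for x in (0, 1) for y in (0, 1)}

print(mutual_information(dependent))    # 1.0
print(mutual_information(independent))  # 0.0
```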
Conversely, as in our original case of Z being "even" or "odd", Z screens off⁸ X from Y; that is, if we know that Z is "even", learning that Y is in state 4 tells us nothing more about whether X is 2, 4, 6, or 8. Or if we know that Z is "odd", then learning that X is 5 tells us nothing more about whether Y is 1 or 3. Learning Z has rendered X and Y conditionally independent.
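The "even"/"odd" example can be made concrete. In this illustrative sketch (the table construction and the `cond_px` helper are my own, not from the text), learning Y shifts beliefs about X, but once Z is known, Y adds nothing:

```python
from itertools import product

# Z is a shared parity; X and Y are drawn independently given Z.
evens, odds = (2, 4, 6, 8), (1, 3, 5, 7)
joint = {}  # table of p(x, y, z)
for z, vals in (("even", evens), ("odd", odds)):
    for x, y in product(vals, vals):
        joint[(x, y, z)] = 0.5 * (1 / len(vals)) * (1 / len(vals))

def cond_px(joint, y=None, z=None):
    """p(x | observed y and/or z), by filtering and renormalizing the table."""
    sel = {k: p for k, p in joint.items()
           if (y is None or k[1] == y) and (z is None or k[2] == z)}
    total = sum(sel.values())
    px = {}
    for (x, _, _), p in sel.items():
        px[x] = px.get(x, 0.0) + p / total
    return px

# Knowing Z = "even", learning Y adds nothing about X: Z screens off Y.
assert cond_px(joint, z="even") == cond_px(joint, y=4, z="even")
# But without Z, learning Y is still evidence about X.
assert cond_px(joint) != cond_px(joint, y=4)
```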
Conditional independence is a hugely important concept in probability theory; to cite just one example, without conditional independence, the universe would have no structure.

Today, though, I only intend to talk about one particular kind of conditional independence: the case of a central variable that screens off other variables surrounding it, like a central body with tentacles.
Let there be five variables U, V, W, X, Y; and moreover, suppose that for every pair of these variables, one variable is evidence about the other. If you select U and W, for example, then learning U=U_1 will tell you something you didn't know before about the probability that W=W_1.
An unmanageable inferential mess? Evidence gone wild? Not necessarily.
8. 'Argument Screens Off Authority'.
Maybe U is "Speaks a language", V is "Two arms and ten digits", W is "Wears clothes", X is "Poisonable by hemlock", and Y is "Red blood". Now if you encounter a thing-in-the-world, that might be an apple and might be a rock, and you learn that this thing speaks Chinese, you are liable to assess a much higher probability that it wears clothes; and if you learn that the thing is not poisonable by hemlock, you will assess a somewhat lower probability that it has red blood.
Now some of these rules are stronger than others. There is the case of Fred, who is missing a finger due to a volcano accident, and the case of Barney the Baby who doesn't speak yet, and the case of Irving the IRCBot who emits sentences but has no blood. So if we learn that a certain thing is not wearing clothes, that doesn't screen off everything that its speech capability can tell us about its blood color. If the thing doesn't wear clothes but does talk, maybe it's Nude Nellie.
This makes the case more interesting than, say, five integer variables that are all odd or all even, but otherwise uncorrelated. In that case, knowing any one of the variables would screen off everything that knowing a second variable could tell us about a third variable.

But here, we have dependencies that don't go away as soon as we learn just one variable, as the case of Nude Nellie shows. So is it an unmanageable inferential inconvenience?
Fear not! For there may be some sixth variable Z which, if we knew it, really would screen off every pair of variables from each other. There may be some variable Z (even if we have to construct Z rather than observing it directly) such that:

p(u|v,w,x,y,z) = p(u|z)
p(v|u,w,x,y,z) = p(v|z)
p(w|u,v,x,y,z) = p(w|z)
...
Perhaps, given that a thing is "human", then the probabilities of it speaking, wearing clothes, and having the standard number of fingers, are all independent. Fred may be missing a finger, but he is no more likely to be a nudist than the next person; Nude Nellie never wears clothes, but knowing this doesn't make it any less likely that she speaks; and Baby Barney doesn't talk yet, but is not missing any limbs.
This is called the "Naive Bayes" method, because it usually isn't quite true, but pretending that it's true can simplify the living daylights out of your calculations. We don't keep separate track of the influence of clothedness on speech capability given finger number. We just use all the information we've observed to keep track of the probability that this thingy is a human (or alternatively, something else, like a chimpanzee or robot), and then use our beliefs about the central class to predict anything we haven't seen yet, like vulnerability to hemlock.
Any observations of U, V, W, X, and Y just act as evidence for the central class variable Z, and then we use the posterior distribution on Z to make any predictions that need making about unobserved variables in U, V, W, X, and Y.

Sound familiar? It should!
[Figure: a central "blegg" unit connected to its observable characteristics.]
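A minimal sketch of the method, with made-up numbers (the classes, features, and probabilities below are illustrative assumptions, not from the text): every observation feeds a posterior over the central class, and the posterior then predicts unseen variables:

```python
# Hypothetical priors and per-class feature likelihoods p(feature | class).
priors = {"human": 0.5, "robot": 0.5}
likelihood = {
    "human": {"speaks": 0.9, "wears_clothes": 0.95, "red_blood": 0.99},
    "robot": {"speaks": 0.5, "wears_clothes": 0.1, "red_blood": 0.01},
}

def posterior(observed):
    """p(class | observed features), treating features as independent given class."""
    scores = {}
    for c, prior in priors.items():
        score = prior
        for feature, present in observed.items():
            p = likelihood[c][feature]
            score *= p if present else (1 - p)
        scores[c] = score
    total = sum(scores.values())
    return {c: s / total for c, s in scores.items()}

def predict(observed, feature):
    """p(unseen feature | observations), routed through the central class."""
    post = posterior(observed)
    return sum(post[c] * likelihood[c][feature] for c in post)

seen = {"speaks": True, "wears_clothes": True}
print(posterior(seen))                 # mostly "human"
print(predict(seen, "red_blood"))      # high, via the central class
```

Note that the two observed features never interact with each other directly; everything flows through the posterior on the class, which is exactly the "central body with tentacles" picture.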
As a matter of fact, if you use the right kind of neural network units, this "neural network" ends up exactly, mathematically equivalent to Naive Bayes. The central unit just needs a logistic threshold (an S-curve response) and the weights of the inputs just need to match the logarithms of the likelihood ratios, etcetera. In fact, it's a good guess that this is one of the reasons why logistic response often works so well in neural networks: it lets the algorithm sneak in a little Bayesian reasoning while the designers aren't looking.
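This equivalence is easy to verify for a two-class case with invented numbers (the probabilities below are illustrative only): feeding the sum of log likelihood ratios through a logistic unit reproduces the exact Bayesian posterior:

```python
import math

def sigmoid(t):
    """Logistic threshold: the S-curve response of the central unit."""
    return 1 / (1 + math.exp(-t))

# Hypothetical two-class setup; both features observed present.
p_human, p_robot = 0.5, 0.5
lik_human = {"speaks": 0.9, "wears_clothes": 0.95}
lik_robot = {"speaks": 0.5, "wears_clothes": 0.1}

# Weights = log likelihood ratios; bias = log prior odds.
evidence = math.log(p_human / p_robot) + sum(
    math.log(lik_human[f] / lik_robot[f]) for f in lik_human)
p_logistic = sigmoid(evidence)

# Direct Naive Bayes posterior with the same numbers.
num = p_human * lik_human["speaks"] * lik_human["wears_clothes"]
den = num + p_robot * lik_robot["speaks"] * lik_robot["wears_clothes"]
p_bayes = num / den

print(p_logistic, p_bayes)  # identical, about 0.9448
```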
Just because someone is presenting you with an algorithm that they call a "neural network" with buzzwords like "scruffy" and "emergent" plastered all over it, disclaiming proudly that they have no idea how the learned network works, well, don't assume that their little AI algorithm really is Beyond the Realms of Logic. For this paradigm of adhockery, if it works, will turn out to have Bayesian structure⁹; it may even be exactly equivalent to an algorithm of the sort called "Bayesian".

Even if it doesn't look Bayesian, on the surface.
And then you just know that the Bayesians are going to start explaining exactly how the algorithm works, what underlying assumptions it reflects, which environmental regularities¹⁰ it exploits, where it works and where it fails, and even attaching understandable meanings to the learned network weights.

Disappointing, isn't it?
9. http://lesswrong.com/lw/o7/searching_for_bayesstructure/
10. 'Mutual Information, and Density in Thingspace'.
26. Words as Mental Paintbrush Handles
to consider, but:
how spectacularly silly.
This academic paradigm, I think, was mostly a deranged legacy of behaviorism, which denied the existence of thoughts in humans, and sought to explain all human phenomena as "reflex", including speech. Behaviorism probably deserves its own post at some point, as it was a perversion of rationalism, but this is not that post.
1. http://lesswrong.com/lw/o9/words_as_mental_paintbrush_handles/
2. 'Conditional Independence, and Naive Bayes'.
3. 'Hindsight Devalues Science'.
"You call it 'silly'," you inquire, "but how do you know that your brain represents visual images? Is it merely that you can close your eyes and see them?"
This question used to be harder to answer, back in the day of the controversy. If you wanted to prove the existence of mental imagery "scientifically", rather than just by introspection, you had to infer the existence of mental imagery from experiments like, e.g.: Show subjects two objects and ask them if one can be rotated into correspondence with the other. The response time is linearly proportional to the angle of rotation required. This is easy to explain if you are actually visualizing the image and continuously rotating it at a constant speed, but hard to explain if you are just checking propositional features of the image.
Today we can actually neuroimage the little pictures in the visual cortex. So, yes, your brain really does represent a detailed image of what it sees or imagines. See Stephen Kosslyn's Image and Brain: The Resolution of the Imagery Debate.
Part of the reason people get in trouble with words, is that they do not realize how much complexity lurks behind words.

Can you visualize a "green dog"? Can you visualize a "cheese apple"?

"Apple" isn't just a sequence of two syllables or five letters. That's a shadow. That's the tip of the tiger's tail.
Words, or rather the concepts behind them, are paintbrushes: you can use them to draw images in your own mind. Literally draw, if you employ concepts to make a picture in your visual cortex. And by the use of shared labels, you can reach into someone else's mind, and grasp their paintbrushes to draw pictures in their minds: sketch a little green dog in their visual cortex.
But don't think that, because you send syllables through the air, or letters through the Internet, it is the syllables or the letters that draw pictures in the visual cortex. That takes some complex instructions that wouldn't fit in the sequence of letters. "Apple" is 5 bytes, and drawing a picture of an apple from scratch would take more data than that.
Apple" is merely the tag attached to the true and wordless apple
concept, which can paint a picture in your visual cortex, or collide
with cheese", or recognize an apple when you see one, or taste its
archetype in apple pie, maybe even send out the motor behavior Ior
eating an apple.
And it's not as simple as just calling up a picture from memory. Or how would you be able to visualize combinations like a "triangular lightbulb", imposing triangleness on lightbulbs, keeping the essence of both, even if you've never seen such a thing in your life?
Don't make the mistake the behaviorists made. There's far more to speech than sound in air. The labels are just pointers: "look in memory area 1387540". Sooner or later, when you're handed a pointer, it comes time to dereference it, and actually look in memory area 1387540.
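The pointer analogy can be sketched in a few lines of Python (the `concept_store` dictionary and its contents are invented for illustration): the label itself is tiny, while the structure it dereferences to is not:

```python
import sys

# A label is just a short pointer; the concept it names is a large structure.
concept_store = {
    # Hypothetical stand-in for a rich, wordless concept record.
    "apple": {
        "shape": "round",
        "color": "red or green",
        "taste": "sweet-tart",
        "image": [[0] * 64 for _ in range(64)],  # a 64x64 mental picture
    },
}

label = "apple"                   # what actually travels through the air
concept = concept_store[label]    # dereferencing the pointer

print(sys.getsizeof(label))       # a few dozen bytes at most
print(len(concept["image"]) * len(concept["image"][0]))  # 4096 pixels
```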
What does a word point to?
WORDS AS MENTAL PAINTBRUSH HANDLES 297
27. Variable Question Fallacies
3. You try to establish any sort of empirical proposition as being true by definition. Socrates is a human, and humans, by definition, are mortal. So is it a logical truth if we empirically predict that Socrates should keel over if he drinks hemlock? It seems like there are logically possible, non-self-contradictory worlds where Socrates doesn't keel over, where he's immune to hemlock by a quirk of biochemistry, say. Logical truths are true in all possible worlds, and so never tell you which possible world you live in, and anything you can establish "by definition" is a logical truth. (The Parable of Hemlock.³)
1. http://lesswrong.com/lw/od/37_ways_that_words_can_be_wrong/
2. 'The Parable of the Dagger'.
3. 'The Parable of Hemlock'.
4. You unconsciously slap the conventional label on something, without actually using the verbal definition you just gave. You know perfectly well that Bob is "human", even though, on your definition, you can never call Bob "human" without first observing him to be mortal. (The Parable of Hemlock.⁴,⁵)
,. The act of labeling something with a word, disguises a
challengable inductive inference you are making. II the last
eggshaped objects drawn have been blue, and the last 8
cubes drawn have been red, it is a matter oI induction to
say this rule will hold in the Iuture. But iI you call the
blue eggs bleggs" and the red cubes rubes", you may
reach into the barrel, Ieel an egg shape, and think Oh, a
blegg." (\ords as Hidden InIerences.
6
)
6. You try to define a word using words, in turn defined with ever-more-abstract words, without being able to point to an example. "What is red?" "Red is a color." "What's a color?" "It's a property of a thing." "What's a thing? What's a property?" It never occurs to you to point to a stop sign and an apple. (Extensions and Intensions.⁷)
7. The extension doesn't match the intension. We aren't consciously aware of our identification of a red light in the sky as "Mars", which will probably happen regardless of your attempt to define "Mars" as "The God of War". (Extensions and Intensions.⁸)
8. Your verbal definition doesn't capture more than a tiny fraction of the category's shared characteristics, but you try to reason as if it does. When the philosophers of Plato's Academy claimed that the best definition of a human was a "featherless biped", Diogenes the Cynic is said to have exhibited a plucked chicken and declared "Here is Plato's Man." The Platonists promptly changed their definition to "a featherless biped with broad nails". (Similarity Clusters.⁹)
4. 'The Parable of Hemlock'.
5. 'The Parable of Hemlock'.
6. 'Words as Hidden Inferences'.
7. 'Extensions and Intensions'.
8. 'Extensions and Intensions'.
9. You try to treat category membership as all-or-nothing, ignoring the existence of more and less typical subclusters. Ducks and penguins are less typical birds than robins and pigeons. Interestingly, a between-groups experiment showed that subjects thought a disease was more likely to spread from robins to ducks on an island, than from ducks to robins. (Typicality and Asymmetrical Similarity.¹⁰)
10. A verbal definition works well enough in practice to point out the intended cluster of similar things, but you nitpick exceptions. Not every human has ten fingers, or wears clothes, or uses language, but if you look for an empirical cluster of things which share these characteristics, you'll get enough information that the occasional nine-fingered human won't fool you. (The Cluster Structure of Thingspace.¹¹)
11. You ask whether something "is" or "is not" a category member but can't name the question you really want answered. What "is" a man? Is Barney the Baby Boy a "man"? The "correct" answer may depend considerably on whether the query you really want answered is "Would hemlock be a good thing to feed Barney?" or "Will Barney make a good husband?" (Disguised Queries.¹²)
12. You treat intuitively perceived hierarchical categories like the only correct way to parse the world, without realizing that other forms of statistical inference are possible even though your brain doesn't use them. It's much easier for a human to notice whether an object is a "blegg" or "rube", than for a human to notice that red objects never glow in the dark, but red-furred objects have all the other characteristics of bleggs. Other statistical algorithms work differently. (Neural Categories.¹³)
13. You talk about categories as if they are manna fallen from the Platonic Realm, rather than inferences implemented in a real brain. The ancient philosophers said "Socrates is a man", not, "My brain perceptually classifies Socrates as a match against the 'human' concept". (How An Algorithm Feels From Inside.¹⁴)
9. 'Similarity Clusters'.
10. 'Typicality and Asymmetrical Similarity'.
11. 'The Cluster Structure of Thingspace'.
12. 'Disguised Queries'.
13. 'Neural Categories'.
37 WAYS THAT WORDS CAN BE WRONG 303
14. You argue about a category membership even after screening off all questions that could possibly depend on a category-based inference. After you observe that an object is blue, egg-shaped, furred, flexible, opaque, luminescent, and palladium-containing, what's left to ask by arguing, "Is it a blegg?" But if your brain's categorizing neural network contains a (metaphorical) central unit corresponding to the inference of blegg-ness, it may still feel like there's a leftover question. (How An Algorithm Feels From Inside.¹⁵)
15. You allow an argument to slide into being about definitions, even though it isn't what you originally wanted to argue about. If, before a dispute started about whether a tree falling in a deserted forest makes a "sound", you asked the two soon-to-be arguers whether they thought a "sound" should be defined as "acoustic vibrations" or "auditory experiences", they'd probably tell you to flip a coin. Only after the argument starts does the definition of a word become politically charged. (Disputing Definitions.¹⁶)
16. You think a word has a meaning, as a property of the word itself, rather than there being a label that your brain associates to a particular concept. When someone shouts, "Yikes! A tiger!", evolution would not favor an organism that thinks, "Hm... I have just heard the syllables 'Tie' and 'Grr' which my fellow tribe-members associate with their internal analogues of my own tiger concept, and which aiiieeee CRUNCH CRUNCH GULP." So the brain takes a shortcut, and it seems that the meaning of tigerness is a property of the label itself. People argue about the "correct" meaning of a label like "sound". (Feel the Meaning.¹⁷)
17. You argue over the meanings of a word, even after all sides understand perfectly well what the other sides are trying to say. The human ability to associate labels to concepts is a tool for communication. When people want to communicate, we're hard to stop; if we have no common language, we'll draw pictures in sand. When you each understand what is in the other's mind, you are done. (The Argument From Common Usage.¹⁸)
14. 'How An Algorithm Feels From Inside'.
15. 'How An Algorithm Feels From Inside'.
16. 'Disputing Definitions'.
17. 'Feel the Meaning'.
18. You pull out a dictionary in the middle of an empirical or moral argument. Dictionary editors are historians of usage, not legislators of language. If the common definition contains a problem (if "Mars" is defined as the God of War, or a "dolphin" is defined as a kind of fish, or "Negroes" are defined as a separate category from humans), the dictionary will reflect the standard mistake. (The Argument From Common Usage.¹⁹)
19. You pull out a dictionary in the middle of any argument ever. Seriously, what the heck makes you think that dictionary editors are an authority on whether "atheism" is a "religion" or whatever? If you have any substantive issue whatsoever at stake, do you really think dictionary editors have access to ultimate wisdom that settles the argument? (The Argument From Common Usage.²⁰)
20. You defy common usage without a reason, making it gratuitously hard for others to understand you. Fast stand up plutonium, with bagels without handle. (The Argument From Common Usage.²¹)
21. You use complex renamings to create the illusion of inference. Is a "human" defined as a "mortal featherless biped"? Then write: "All [mortal featherless bipeds] are mortal; Socrates is a [mortal featherless biped]; therefore, Socrates is mortal." Looks less impressive that way, doesn't it? (Empty Labels.²²)
22. You get into arguments that you could avoid if you just didn't use the word. If Albert and Barry aren't allowed to use the word "sound", then Albert will have to say "A tree falling in a deserted forest generates acoustic vibrations", and Barry will say "A tree falling in a deserted forest generates no auditory experiences". When a word poses a problem, the simplest solution is to eliminate the word and its synonyms. (Taboo Your Words.²³)
18. 'The Argument from Common Usage'.
19. 'The Argument from Common Usage'.
20. 'The Argument from Common Usage'.
21. 'The Argument from Common Usage'.
22. 'Empty Labels'.
23. The existence of a neat little word prevents you from seeing the details of the thing you're trying to think about. What actually goes on in schools once you stop calling it "education"? What's a degree, once you stop calling it a "degree"? If a coin lands "heads", what's its radial orientation? What is "truth", if you can't say "accurate" or "correct" or "represent" or "reflect" or "semantic" or "believe" or "knowledge" or "map" or "real" or any other simple term? (Replace the Symbol with the Substance.²⁴)
24. You have only one word, but there are two or more different things-in-reality, so that all the facts about them get dumped into a single undifferentiated mental bucket. It's part of a detective's ordinary work to observe that Carol wore red last night, or that she has black hair; and it's part of a detective's ordinary work to wonder if maybe Carol dyes her hair. But it takes a subtler detective to wonder if there are two Carols, so that the Carol who wore red is not the same as the Carol who had black hair. (Fallacies of Compression.²⁵)
25. You see patterns where none exist, harvesting other characteristics from your definitions even when there is no similarity along that dimension. In Japan, it is thought that people of blood type A are earnest and creative, blood type Bs are wild and cheerful, blood type Os are agreeable and sociable, and blood type ABs are cool and controlled. (Categorizing Has Consequences.²⁶)
26. You try to sneak in the connotations of a word, by arguing from a definition that doesn't include the connotations. A "wiggin" is defined in the dictionary as a person with green eyes and black hair. The word "wiggin" also carries the connotation of someone who commits crimes and launches cute baby squirrels, but that part isn't in the dictionary. So you point to someone and say: "Green eyes? Black hair? See, told you he's a wiggin! Watch, next he's going to steal the silverware." (Sneaking in Connotations.²⁷)
23. 'Taboo Your Words'.
24. 'Replace the Symbol with the Substance'.
25. 'Fallacies of Compression'.
26. 'Categorizing Has Consequences'.
27. You claim "X, by definition, is a Y!" On such occasions you're almost certainly trying to sneak in a connotation of Y that wasn't in your given definition. You define "human" as a "featherless biped", and point to Socrates and say, "No feathers, two legs, he must be human!" But what you really care about is something else, like mortality. If what was in dispute was Socrates's number of legs, the other fellow would just reply, "Whaddaya mean, Socrates's got two legs? That's what we're arguing about in the first place!" (Arguing "By Definition".²⁸)
28. You claim "Ps, by definition, are Qs!" If you see Socrates out in the field with some biologists, gathering herbs that might confer resistance to hemlock, there's no point in arguing "Men, by definition, are mortal!" The main time you feel the need to tighten the vise by insisting that something is true "by definition" is when there's other information that calls the default inference into doubt. (Arguing "By Definition".²⁹)
29. You try to establish membership in an empirical cluster "by definition". You wouldn't feel the need to say, "Hinduism, by definition, is a religion!" because, well, of course Hinduism is a religion. It's not just a religion "by definition", it's, like, an actual religion. Atheism does not resemble the central members of the "religion" cluster, so if it wasn't for the fact that atheism is a religion by definition, you might go around thinking that atheism wasn't a religion. That's why you've got to crush all opposition by pointing out that "Atheism is a religion" is true by definition, because it isn't true any other way. (Arguing "By Definition".³⁰)
30. Your definition draws a boundary around things that don't really belong together. You can claim, if you like, that you are defining the word "fish" to refer to salmon, guppies, sharks, dolphins, and trout, but not jellyfish or algae. You can claim, if you like, that this is merely a list, and there is no way a list can be "wrong". Or you can stop playing nitwit games and admit that you made a mistake and that dolphins don't belong on the fish list. (Where to Draw the Boundary?³¹)
27. 'Sneaking in Connotations'.
28. 'Arguing "By Definition"'.
29. 'Arguing "By Definition"'.
30. 'Arguing "By Definition"'.
31. You use a short word for something that you won't need to describe often, or a long word for something you'll need to describe often. This can result in inefficient thinking, or even misapplications of Occam's Razor, if your mind thinks that short sentences sound simpler. Which sounds more plausible, "God did a miracle" or "A supernatural universe-creating entity temporarily suspended the laws of physics"? (Entropy, and Short Codes.³²)
32. You draw your boundary around a volume of space where there is no greater-than-usual density, meaning that the associated word does not correspond to any performable Bayesian inferences. Since green-eyed people are not more likely to have black hair, or vice versa, and they don't share any other characteristics in common, why have a word for "wiggin"? (Mutual Information, and Density in Thingspace.³³)
33. You draw an unsimple boundary without any reason to do so. The act of defining a word to refer to all humans, except black people, seems kind of suspicious. If you don't present reasons to draw that particular boundary, trying to create an arbitrary "word" in that location is like a detective saying: "Well, I haven't the slightest shred of support one way or the other for who could've murdered those orphans... but have we considered John Q. Wiffleheim as a suspect?" (Superexponential Conceptspace, and Simple Words.³⁴)
34. You use categorization to make inferences about properties that don't have the appropriate empirical structure, namely, conditional independence given knowledge of the class, to be well-approximated by Naive Bayes. No way am I trying to summarize this one. Just read the blog post. (Conditional Independence, and Naive Bayes.³⁵)
31. 'Where to Draw the Boundary?'.
32. 'Entropy, and Short Codes'.
33. 'Mutual Information, and Density in Thingspace'.
34. 'Superexponential Conceptspace, and Simple Words'.
35. You think that words are like tiny little LISP symbols in your mind, rather than words being labels that act as handles to direct complex mental paintbrushes that can paint detailed pictures in your sensory workspace. Visualize a "triangular lightbulb". What did you see? (Words as Mental Paintbrush Handles.³⁶)
36. You use a word that has different meanings in different places as though it meant the same thing on each occasion, possibly creating the illusion of something protean and shifting. "Martin told Bob the building was on his left." But "left" is a function-word that evaluates with a speaker-dependent variable grabbed from the surrounding context. Whose "left" is meant, Bob's or Martin's? (Variable Question Fallacies.³⁷)
37. You think that definitions can't be wrong, or that "I can define a word any way I like!" This kind of attitude teaches you to indignantly defend your past actions, instead of paying attention to their consequences, or fessing up to your mistakes. (37 Ways That Suboptimal Use Of Categories Can Have Negative Side Effects On Your Cognition.³⁸)
Everything you do in the mind has an effect, and your brain races ahead unconsciously without your supervision.

Saying "Words are arbitrary; I can define a word any way I like" makes around as much sense as driving a car over thin ice with the accelerator floored and saying, "Looking at this steering wheel, I can't see why one radial angle is special, so I can turn the steering wheel any way I like."

If you're trying to go anywhere, or even just trying to survive, you had better start paying attention to the three or six dozen optimality criteria that control how you use words, definitions, categories, classes, boundaries, labels, and concepts.
35. 'Conditional Independence, and Naive Bayes'.
36. 'Words as Mental Paintbrush Handles'.
37. 'Variable Question Fallacies'.
38. '37 Ways That Words Can Be Wrong'.