
Royal Statistical Society

Wiley

Statistical Methods and Scientific Induction


Author(s): Ronald Fisher
Source: Journal of the Royal Statistical Society. Series B (Methodological), Vol. 17, No. 1 (1955),
pp. 69-78
Published by: Wiley for the Royal Statistical Society
Stable URL: http://www.jstor.org/stable/2983785

This content downloaded from 158.121.247.60 on Tue, 06 Oct 2015 17:38:04 UTC
All use subject to JSTOR Terms and Conditions

STATISTICAL METHODS AND SCIENTIFIC INDUCTION

By Sir RONALD FISHER
Department of Genetics, University of Cambridge

SUMMARY
THE attempt to reinterpret the common tests of significance used in scientific research as though they constituted some kind of acceptance procedure and led to "decisions" in Wald's sense, originated in several misapprehensions and has led, apparently, to several more.
The three phrases examined here, with a view to elucidating the fallacies they embody, are:
(i) "Repeated sampling from the same population",
(ii) Errors of the "second kind",
(iii) "Inductive behaviour".
Mathematicians without personal contact with the Natural Sciences have often been misled by such phrases. The errors to which they lead are not always only numerical.

1. Introduction
DURING the present century a good deal of progress seems to have been made in the business of interpreting observational data, so as to obtain a better understanding of the real world. The three aspects of principal importance for this progress have been, first, the use of better mathematics and more comprehensive ideas in mathematical statistics; leading to more correct or exact methods of calculation, applied to the given body of data (a unique sample in the language of W. S. Gosset, writing under the name of "Student") which comprehends all the numerical information available on the topic under discussion. Secondly, as methods of summarizing and drawing correct conclusions approached adequacy, the wide subject of experimental design was opened up, aimed at obtaining data more complete and precise, and at avoiding waste of effort in the accumulation of ill-planned, indecisive, or irrelevant observations. Thirdly, as a natural or even inevitable concomitant of the first two, a more complete understanding has been reached of the structure and peculiarities of inductive logic; that is, of reasoning from the sample to the population from which the sample was drawn, from consequences to causes, or in more logical terms, from the particular to the general.
Much that I have to say will not command universal assent. I know this for it is just because I find myself in disagreement with some of the modes of exposition of this new subject which have from time to time been adopted, that I have taken this opportunity of expressing a different point of view; different in particular from that expressed in numerous papers by Neyman, Pearson, Wald and Bartlett. There is no difference to matter in the field of mathematical analysis, though different numerical results are arrived at, but there is a clear difference in logical point of view, and I owe to Professor Barnard of The Imperial College the penetrating observation that this difference in point of view originated when Neyman, thinking that he was correcting and improving my own early work on tests of significance, as a means to the "improvement of natural knowledge", in fact reinterpreted them in terms of that technological and commercial apparatus which is known as an acceptance procedure.
Now, acceptance procedures are of great importance in the modern world. When a large concern like the Royal Navy receives material from an engineering firm it is, I suppose, subjected to sufficiently careful inspection and testing to reduce the frequency of the acceptance of faulty or defective consignments. The instructions to the Officers carrying out the tests must also, I conceive, be intended to keep low both the cost of testing and the frequency of the rejection of satisfactory lots. Much ingenuity and skill must be exercised in making the acceptance procedure a really effectual and economical one. I am casting no contempt on acceptance procedures, and I am thankful, whenever I travel by air, that the high level of precision and reliability required can really be achieved by such means. But the logical differences between such an operation and the work of scientific discovery by physical or biological experimentation seem to me so wide that the analogy between them is not helpful, and the identification of the two sorts of operation is decidedly misleading.
I shall hope to bring out some of the logical differences more distinctly, but there is also, I fancy, in the background an ideological difference. Russians are made familiar with the ideal that research in pure science can and should be geared to technological performance, in the comprehensive organized effort of a five-year plan for the nation. How far, within such a system, personal and individual inferences from observed facts are permissible we do not know, but it may be safer, and even, in such a political atmosphere, more agreeable, to regard one's scientific work simply as a contributory element in a great machine, and to conceal rather than to advertise the selfish and perhaps heretical aim of understanding for oneself the scientific situation. In the U.S. also the great importance of organized technology has I think made it easy to confuse the process appropriate for drawing correct conclusions, with those aimed rather at, let us say, speeding production, or saving money. There is therefore something to be gained by at least being able to think of our scientific problems in a language distinct from that of technological efficiency.
I believe I can best illustrate the contrast I want to make clear by taking a few current phrases which are foreign to my own point of view, and after examining these, by setting out in a more constructive spirit, some of the special characteristics of inductive reasoning. The phrases I should choose for the fallacies they embody are:
(i) Repeated sampling from the same population.
(ii) Errors of the "second kind".
(iii) "Inductive behaviour".
But first I must exemplify the extent to which divergence in language has been carried by quoting some rather simple phrases from Wald's book on Decision Functions.
On the outside of the cover we read, "Particularly noteworthy is the treatment of experiment design as a part of the general decision problem".
On the inside, "The design of experimentation is made a part of the general decision problems - a major advance beyond previous results", and in the first paragraph of the author's preface "A major advance beyond previous results is the treatment of the design of experimentation as a part of the general decision problem".
These claims seem very much like an afterthought, of a kind which is sometimes suggested by a publisher; for, apart from these three quotations, the design of experiments is scarcely mentioned in the rest of the book. For example, the index does not contain the word "replication", or "control", or "randomization"; there is no discussion of the functions and purposes of these three elements of design. Of authorities, the bibliography does not contain the names of Yates, of Finney, or of Davies; or, on the other side of the Atlantic, of Goulden, who was the first of transatlantic writers on the design of experiments, or of Cochran and Cox. My own book is indeed mentioned, but no use seems to have been made of it. The obvious inference is that Wald was quite unaware of the nature and scope of the subject of experimental design, but had simply assumed that it must be included in that of acceptance procedures, to which his book is devoted. Rather similar, equally innocent and unfounded presumptions, have been not uncommon in the last twenty years. They would scarcely have been possible without that insulation from all living contact with the natural sciences, which is a disconcerting feature of many mathematical departments.
The first questionable phrase and the one responsible for the greatest amount of numerical error is:

2. "Repeated Sampling from the Same Population"


The operative properties of an acceptance procedure, single or sequential, are ascertained practically or conceptually by applying it to a series of successive similar samples from the same source of supply, and determining the frequencies of the various possible results. It is doubtless in consequence of this that it has been thought, and frequently asserted, that the validity of a test of significance is to be judged in the same way. However, a rather large number of examples are now known in which this rule is seen to be misleading. The root of the difficulty of carrying over the idea from the field of acceptance procedures to that of tests of significance is that, where acceptance procedures are appropriate, the source of supply has an objective reality, and the population of lots, or one or more, which could be successively chosen for examination is uniquely defined; whereas if we possess a unique sample in Student's sense on which significance tests are to be performed, there is always, as Venn (1876) in particular has shown, a multiplicity of populations to each of which we can legitimately regard our sample as belonging; so that the phrase "repeated sampling from the same population" does not enable us to determine which population is to be used to define the probability level, for no one of them has objective reality, all being products of the statistician's imagination. In respect of tests of significance, therefore, there is need for further guidance as to how this imagination is to be exercised. In fact a careful choice has to be made, based on an understanding of the question or questions to be answered. By ignoring this necessity a "theory of testing hypotheses" has been produced in which a primary requirement of any competent test has been overlooked.
Consider the case of simple linear regression. Let us suppose that the numerical data consist of N pairs of values (x, y), while the qualitative data tell us that for each value of x the variate y is normally distributed with variance $\sigma^2$ about a mean given by

$$Y = E(y) = \alpha + x\beta, \qquad E(y - Y)^2 = \sigma^2,$$

being a linear function of the variate x. The qualitative data may also tell us how x is distributed, with or without specific parameters; this information is irrelevant.
In such cases the unknown parameter, $\beta$, may be estimated and the precision of estimation determined by a standard and well known procedure; let

$$A = S(x - \bar{x})^2, \qquad B = S(x - \bar{x})(y - \bar{y}), \qquad C = S(y - \bar{y})^2.$$

Then we may take as our estimate of $\beta$ the statistic

$$b = B/A,$$

and of $\sigma^2$ the statistic

$$s^2 = (C - B^2/A) \div (N - 2).$$

For samples having the same value A it is easy to show that the estimate b is normally distributed about $\beta$ with variance $\sigma^2/A$, so that we have a typical analysis of variance:

d.f.       Sum of Squares       Mean Square
1          $A(b - \beta)^2$     $A(b - \beta)^2$
N - 2      $C - B^2/A$          $s^2$

and the significance of the deviation of b from zero, or any other proposed value of $\beta$, is a simple t-test with N - 2 degrees of freedom, with

$$t = (b - \beta_0)\sqrt{A}/s,$$

where $\beta_0$ is the theoretical value proposed for comparison.
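
The quantities A, B, C, b, s^2 and t defined above can be computed directly. The following is a minimal sketch in Python; the function name `regression_t` and the five data pairs are illustrative assumptions, not taken from the paper.

```python
import math

def regression_t(xs, ys, beta0=0.0):
    """Compute A, B, C, the slope estimate b, s^2 and the t statistic
    for testing the proposed value beta0, following the formulas above."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    A = sum((x - xbar) ** 2 for x in xs)                       # S(x - xbar)^2
    B = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))   # S(x - xbar)(y - ybar)
    C = sum((y - ybar) ** 2 for y in ys)                       # S(y - ybar)^2
    b = B / A
    s2 = (C - B * B / A) / (n - 2)
    t = (b - beta0) * math.sqrt(A) / math.sqrt(s2)
    return {"A": A, "B": B, "C": C, "b": b, "s2": s2, "t": t, "df": n - 2}

# Hypothetical data, chosen only to exercise the formulas.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
result = regression_t(xs, ys)
print(result)
```

The value of t would then be referred to the t distribution with N - 2 degrees of freedom; that lookup is omitted here to keep the sketch free of third-party libraries.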


I do not believe that anyone doubts the validity of this simple test. It does, however, violate the rule of determining levels of significance by frequencies of occurrence of the proposed events in repeated samples from the same population. For if a succession of sets of N pairs of observations (x, y) were taken from the same population, the value of A would not be the same for each set. Consequently, the frequency distribution of $b - \beta$ in the aggregate of all such sets would not be the same as that which I have calculated taking A constant, and would indeed be unknown until the sampling variation of A were investigated. In reality, therefore, no one uses the rule of determining the level of significance by successive sampling from the population of all random samples of N pairs of values, but, ever since the right approach was indicated (Fisher 1922), the selection of all random samples having a constant value A, equal to that actually observed in the sample under test, is what has in fact been used. The normal distribution of b about $\beta$ with variance $\sigma^2/A$ does not correspond with any realistic process of sampling for acceptance, but to a population of samples in all relevant respects like that observed, neither more precise nor less precise, and which therefore we think it appropriate to select in specifying the precision of the estimate, b. In relation to the estimation of $\beta$ the value A is known as an ancillary statistic. Had it been necessary we should not have hesitated to specify all values of x $(x_1, \ldots, x_N)$ individually, but this would have made no difference once the comprehensive value A had been specified.
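
The contrast between the two reference sets can be illustrated by simulation. The sketch below is an assumption-laden illustration, not a procedure from the paper: it supposes x to be drawn from a standard normal distribution, takes $\alpha = 1$, $\beta = 2$, $\sigma = 1$, and holds A constant by reusing one fixed set of x values. It then compares the empirical variance of b with $\sigma^2/A$.

```python
import random

def sum_sq_dev(values):
    """Sum of squared deviations from the mean (the quantity A when applied to x)."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def slope(xs, ys):
    """Least-squares slope b = B / A."""
    xbar = sum(xs) / len(xs)
    ybar = sum(ys) / len(ys)
    B = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    return B / sum_sq_dev(xs)

def slope_variance(rng, x_fixed=None, n=10, reps=20000,
                   alpha=1.0, beta=2.0, sigma=1.0):
    """Empirical variance of b over many simulated samples.

    With x_fixed given, every sample reuses those x values, so A is constant;
    otherwise x is redrawn each time from an assumed standard normal.
    """
    slopes = []
    for _ in range(reps):
        xs = x_fixed if x_fixed is not None else [rng.gauss(0.0, 1.0) for _ in range(n)]
        ys = [alpha + beta * x + rng.gauss(0.0, sigma) for x in xs]
        slopes.append(slope(xs, ys))
    return sum_sq_dev(slopes) / reps

rng = random.Random(1955)                       # arbitrary seed, for repeatability
x_observed = [rng.gauss(0.0, 1.0) for _ in range(10)]
A_observed = sum_sq_dev(x_observed)

var_fixed_A = slope_variance(rng, x_fixed=x_observed)
var_free_A = slope_variance(rng)
theory_fixed_A = 1.0 / A_observed               # sigma^2 / A with sigma = 1

print("sigma^2 / A (observed A):", round(theory_fixed_A, 4))
print("variance of b, A fixed  :", round(var_fixed_A, 4))
print("variance of b, A varying:", round(var_free_A, 4))
```

With A held at its observed value the simulated variance should agree closely with $\sigma^2/A$. When x is redrawn, the variance depends on the assumed distribution of x, which is exactly the information the text describes as irrelevant to the test.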
The confusion introduced, even in the case of the most fundamental and logically simple of tests of significance, by the introduction of the notion of basing the test on repeated sampling from the same population, is well illustrated by some episodes, which ought not to be forgotten, in the curious history of testing proportionality in a two-by-two table.
In the solution of the problem of the 2 x 2 table, put forward concurrently by Dr. F. Yates and myself in 1934, the essential point was the recognition that the probabilities of occurrence of different possible tables, having the same marginal totals

a        b        a + b
c        d        c + d
a + c    b + d    n

were proportional simply to

$$1/(a!\,b!\,c!\,d!),$$

where a, b, c and d are the four frequencies observed in the double dichotomy, whatever might be the probabilities governing the marginal distributions. Within sets of tables having the same margins, therefore, each may be assigned an absolute probability:

$$\frac{(a+b)!\,(a+c)!\,(b+d)!\,(c+d)!}{n!} \cdot \frac{1}{a!\,b!\,c!\,d!},$$

where the new factor depends only on the margins and not on the contents.
In this case the margins of the table, which by themselves supply no information as to the proportionality of the contents, do, like the value A in the regression example, determine how much information the contents will contain. The reasonable principle that in testing the significance with a unique sample, we should compare it only with other possibilities in all relevant respects like that observed, will lead us to set aside the various possible tables having different margins, the relative frequencies of which must depend on unknown factors of the population sampled.
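
The probability formula for tables with fixed margins lends itself to a short exact computation. The following is a sketch using only the Python standard library. The function names and the observed counts (3, 1, 1, 3) are hypothetical, and the one-sided tail sum is one common way of turning the table probabilities into a significance level, assumed here for illustration.

```python
from fractions import Fraction
from math import factorial

def table_probability(a, b, c, d):
    """Exact probability of the table [[a, b], [c, d]] among all tables
    having the same marginal totals."""
    n = a + b + c + d
    margins = (factorial(a + b) * factorial(a + c)
               * factorial(b + d) * factorial(c + d))
    contents = factorial(a) * factorial(b) * factorial(c) * factorial(d)
    return Fraction(margins, factorial(n) * contents)

def tables_with_same_margins(a, b, c, d):
    """Yield every table (a', b', c', d') sharing the margins of the given table."""
    row1, row2, col1 = a + b, c + d, a + c
    for top_left in range(max(0, col1 - row2), min(row1, col1) + 1):
        yield (top_left, row1 - top_left,
               col1 - top_left, row2 - col1 + top_left)

def one_sided_level(a, b, c, d):
    """Total probability of tables whose top-left cell is at least the observed a."""
    return sum(table_probability(*t)
               for t in tables_with_same_margins(a, b, c, d) if t[0] >= a)

observed = (3, 1, 1, 3)   # hypothetical counts
total = sum(table_probability(*t) for t in tables_with_same_margins(*observed))
print("P(observed table):", table_probability(*observed))
print("one-sided level  :", one_sided_level(*observed))
print("sum over all tables with these margins:", total)
```

Exact rational arithmetic (`Fraction`) is used so that the probabilities over the reference set can be checked to sum to exactly 1.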
On two occasions in the intervening twenty years distinguished statisticians have attempted to bring into the account populations of fourfold tables not having fixed margins. In both cases, such is the reasonableness of human nature in favourable cases, the authors of these innovations withdrew them after some discussion, and expressed themselves as completely satisfied that the apparent advance they had made was illusory. The first was Professor E. B. Wilson of the Harvard School of Public Health, writing in Science in 1941, and later taking occasion to expound the method of Fisher and Yates in two papers in the Proceedings of the National Academy of Sciences in the following year. The second case was that of Professor Barnard, who started on the assumption that the method expounded by Neyman and Pearson could be relied on, and in the first flush of success reported a test using the language of that theory "much more powerful than Fisher's", but who also, after some discussion, had the generosity to go out of his way to explain that further meditation had led him to the conclusion that Fisher was right after all.
Professor Barnard has a keen and highly trained mathematical mind, and the fact that he was misled into much wasted effort and disappointment should be a warning that the theory of testing hypotheses set out by Neyman and Pearson has missed at least some of the essentials of the problem, and will mislead others who accept it uncritically. Indeed, in the matter of Behrens' test for the significance of the difference between the means of two small samples, objection was taken on exactly the ground that the significance level is not the same as the frequency found on repeated sampling. The examples I have given from simpler problems show clearly that it should never have been put forward in the field of significance tests, though perhaps perfectly appropriate to acceptance sampling.

3. Errors of the "Second Kind"


The phrase "Errors of the second kind", although apparently only a harmless piece of technical jargon, is useful as indicating the type of mental confusion in which it was coined.
In an acceptance procedure lots will sometimes be accepted which would have been rejected had they been examined fully, and other lots will have been rejected when, in this sense, they ought to have been accepted. A well-designed acceptance procedure is one which attempts to minimize the losses entailed by such events. To do this one must take account of the costliness of each type of error, if errors they should be called, and in similar terms of the costliness of the testing process; it must take account also of the frequencies of each type of event. For this reason probability a priori, or rather knowledge based on past experience, of the frequencies with which lots of different quality are offered, is of great importance; whereas, in scientific research, or in the process of "learning by experience", such knowledge a priori is almost always absent or negligible.
Simply from the point of view of an acceptance procedure, though we may by analogy think of these two kinds of events as "errors" and recognize that they are errors in opposite directions, I doubt if anyone would have thought of distinguishing them as of two kinds, for in this milieu they are essentially of one kind only and of equal theoretical importance. It was only when the relation between a test of significance and its corresponding null hypothesis was confused with an acceptance procedure that it seemed suitable to distinguish errors in which the hypothesis is rejected wrongly, from errors in which it is "accepted wrongly" as the phrase does. The frequency of the first class, relative to the frequency with which the hypothesis is true, is calculable, and therefore controllable simply from the specification of the null hypothesis. The frequency of the second kind must depend not only on the frequency with which rival hypotheses are in fact true, but also greatly on how closely they resemble the null hypothesis. Such errors are therefore incalculable both in frequency and in magnitude merely from the specification of the null hypothesis, and would never have come into consideration in the theory only of tests of significance, had the logic of such tests not been confused with that of acceptance procedures.
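
The asymmetry described here can be made concrete with a simple numerical sketch. The setting is an assumption chosen for illustration and does not appear in the paper: a one-sided test of a normal mean with known standard deviation, null value 0, sample size 25 and a 5 per cent. level. The critical value, and hence the frequency of the first kind of error, follows from the null hypothesis alone; the frequency of the second kind cannot be stated until some rival value of the mean is supplied.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()

def critical_value(level, n, sigma=1.0):
    """Critical value for the sample mean, fixed by the null hypothesis (mean 0) alone."""
    return STD_NORMAL.inv_cdf(1.0 - level) * sigma / n ** 0.5

def rejection_frequency(true_mean, level, n, sigma=1.0):
    """Frequency with which the sample mean exceeds the critical value
    when the population mean is true_mean."""
    sampling = NormalDist(mu=true_mean, sigma=sigma / n ** 0.5)
    return 1.0 - sampling.cdf(critical_value(level, n, sigma))

level, n = 0.05, 25   # illustrative choices

# First kind: calculable from the null hypothesis alone.
first_kind = rejection_frequency(0.0, level, n)

# Second kind: a different answer for every rival value of the mean.
second_kind = {delta: 1.0 - rejection_frequency(delta, level, n)
               for delta in (0.1, 0.3, 0.5, 1.0)}

print("frequency of first kind :", round(first_kind, 4))
for delta, freq in second_kind.items():
    print(f"rival mean {delta}: frequency of second kind {freq:.4f}")
```

The second set of figures falls steadily as the rival value moves away from the null value, which is the dependence on "how closely they resemble the null hypothesis" noted in the text.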
It may be added that in the theory of estimation we consider a continuum of hypotheses each eligible as null hypothesis, and it is the aggregate of frequencies calculated from each possibility in turn as true (including frequencies of error, therefore only of the "first kind", without any assumptions of knowledge a priori) which supply the likelihood function, fiducial limits, and other indications of the amount of information available. The introduction of allusions to errors of the second kind in such arguments is entirely formal and ineffectual.
The fashion of speaking of a null hypothesis as "accepted when false", whenever a test of significance gives us no strong reason for rejecting it, and when in fact it is in some way imperfect, shows real ignorance of the research workers' attitude, by suggesting that in such a case he has come to an irreversible decision.
The worker's real attitude in such a case might be, according to the circumstances:
(a) "The possible deviation from truth of my working hypothesis, to examine which the test is appropriate, seems not to be of sufficient magnitude to warrant any immediate modification."
Or it might be:
(b) "The deviation is in the direction expected for certain influences which seemed to me not improbable, and to this extent my suspicion has been confirmed; but the body of data available so far is not by itself sufficient to demonstrate their reality."
These examples show how badly the word "error" is used in describing such a situation. Moreover, it is a fallacy, so well known as to be a standard example, to conclude from a test of significance that the null hypothesis is thereby established; at most it may be said to be confirmed or strengthened.
In an acceptance procedure, on the other hand, acceptance is irreversible, whether the evidence for it was strong or weak. It is the result of applying mechanically rules laid down in advance; no thought is given to the particular case, and the tester's state of mind, or his capacity for learning, is inoperative.
By contrast, the conclusions drawn by a scientific worker from a test of significance are provisional, and involve an intelligent attempt to understand the experimental situation.

4. "Inductive Behaviour"
The erroneous insistence on the formula of "repeated sampling from the same population" and the misplaced emphasis on "errors of the second kind" seem both clearly enough to flow from the notion that the process by which experimenters learn from their experiments might be equated to some equivalent acceptance procedure. The same confusion evidently takes part in the curious preference expressed by J. Neyman for the phrase "inductive behaviour" to replace what he regards as the mistaken phrase "inductive reasoning".
Logicians, in introducing the terms "inductive reasoning" and "inductive inference" evidently imply that they are speaking of processes of the mind falling to some extent outside those of which a full account can be given in terms of the traditional deductive reasoning of formal logic. Deductive reasoning in particular supplies no essentially new knowledge, but merely reveals or unfolds the implications of the axiomatic basis adopted. Ideally, perhaps, it should be carried out mechanically. It is the function of inductive reasoning to be used, in conjunction with observational data, to add new elements to our theoretical knowledge. That such a process existed, and was possible to normal minds, has been understood for centuries; it is only with the recent development of statistical science that an analytic account can now be given, about as satisfying and complete, at least, as that given traditionally of the deductive processes.
When, therefore, Neyman denies the existence of inductive reasoning he is merely expressing a verbal preference. For him "reasoning" means what "deductive reasoning" means to others. He does not tell us what in his vocabulary stands for inductive reasoning, for he does not clearly understand what that is. What he tells us to call "inductive behaviour" is merely the practice of making some assertion of the form

$$T < \theta$$

in some circumstances, and refraining from this assertion in others. This is evidently an effort to assimilate a test of significance to an acceptance procedure. From a test of significance, however, we learn more than that the body of data at our disposal would have passed an acceptance test at some particular level; we may learn, if we wish to, and it is to this that we usually pay attention, at what level it would have been doubtful; doing this we have a genuine measure of the confidence with which any particular opinion may be held, in view of our particular data. From a strictly realistic viewpoint we have no expectation of an unending sequence of similar bodies of data, to each of which a mechanical "yes or no" response is to be given. What we look forward to in science is further data, probably of a somewhat different kind, which may confirm or elaborate the conclusions we have drawn; but perhaps of the same kind, which may then be added to what we have already, to form an enlarged basis for induction.
Neyman reinforces his choice of language by arguments much less defensible. He seems to claim that the statement (a) "$\theta$ has a probability of 5 per cent. of exceeding T" is a different statement from (b) "T has a probability of 5 per cent. of falling short of $\theta$". Since language is meant to be used I believe it is essential that such statements, whether expressed in words or symbols, should be recognized as equivalent, even when $\theta$ is a parameter, defined as an objective character of the real world, entering into the specification of our hypothetical population, whilst T is directly calculable from the observations. To prevent the kind of confusion that Neyman has introduced we may point out that both statements are statements of the relationship in which T, or $\theta$, stands to the other. Also, since probability is specified, the statements have meaning only in relation to a sufficiently well-defined population of pairs of these values. The statements do not imply that in this population of pairs of values either T or $\theta$ is constant, but also they do not exclude the possibility that one should be constant, and that variability should be confined to the other. Reference to the mode of calculating our limits in an ordinary test of significance will generally establish that in these calculations the parameter $\theta$ has been treated provisionally as constant, and variations calculated of T for given $\theta$. The possible variation of $\theta$ is left arbitrary, and is irrelevant to the calculations, much as is the distribution of the independent variate in the regression problem.

A complementary doctrine of Neyman violating equally the principles of deductive logic is to accept a general symbolical statement such as

$$\Pr\{(\bar{x} - ts) < \mu < (\bar{x} + ts)\} = \alpha,$$

as rigorously demonstrated, and yet, when numerical values are available for the statistics $\bar{x}$ and s, so that on substitution of these and use of the 5 per cent. value of t, the statement would read

$$\Pr\{92.99 < \mu < 93.01\} = 95 \text{ per cent.},$$

to deny to this numerical statement any validity. This evidently is to deny the syllogistic process of making a substitution in the major premise of terms which the minor premise establishes as equivalent. By this, which is surely a desperate measure, Neyman supports the assertion that if $\mu$ stand for some objective constant of nature, or property of the real world, such as the distance of the sun, its probability of lying between any named numerical limits is necessarily either 0 or 1, and we cannot know which, unless the true distance is known to us. The paradox is rather childish, for it requires that we should wilfully misinterpret the probability statement so as to pretend that the population to which it refers is not defined by our observations and their precision, but is absolutely independent of them. As this is certainly not what any astronomer means, and is not in accordance with the origin of the statement he makes, it seems rather like an acknowledgement of bankruptcy to pretend that it is.
Finally let me add some notes on what appear to me to be distinctive requirements of valid inductive inference.

5. Requirements of Inductive Inferences
(a) Since some inductive inferences are expressed in terms of probability (fiducial probability) the first requirement is a clear understanding that probability statements always have reference to some sufficiently defined population, and never to individuals, save as typical members of such a population. This understanding is needed for deductive inferences also, when statements of probability are made.
(b) A very important feature of inductive inference, unknown in the field of deductive inference, is the framing of the hypothesis in terms of which the data are to be interpreted. This hypothesis must fulfill several requirements: (i) it must be in accordance with the facts of nature as so far known; (ii) it must specify the frequency distribution of all observational facts included in the data, so that the data as a whole may be taken as a typical sample; (iii) it must incorporate as parameters all constants of nature which it is intended to estimate, in addition possibly to special, or ad hoc, parameters; (iv) it must not be contradicted, in any way judged relevant, by the data in hand. If it satisfies these conditions it is therefore a scientific construct of a fairly elaborate type. It is by no means obvious that different persons should not put forward different successful hypotheses, among which the data can supply little or no discrimination. The hypothesis is sometimes called a model, but I should suggest that the word model should only be used for aspects of the hypothesis between which the data cannot discriminate. As an act of construction the hypothesis is not altogether impersonal, for the scientist's personal capacity for theorizing comes into it; moreover, the criteria by which it is approved require a certain honesty, or integrity, in their application.
(c) In one respect inductive reasoning is more strict than is deductive reasoning, since in the latter any item of the data may be ignored, and valid inferences may be drawn from the rest; i.e. from any selected sub-set of the set of axioms used, whereas in inductive inference the whole of the data must be taken into account. This seems to be very difficult to be understood by workers trained in deductive methods only, though more easily understood by statisticians. The political principle that anything can be proved by statistics arises from the practice of presenting only a selected sub-set of the data available.
In some early results of my own I rely on the datum "There is no knowledge of probabilities a priori". They would not certainly have been legitimate without this datum, but they have been mistakenly described as a kind of greatest common factor of the inferences which could be drawn for different possible data giving probabilities a priori.

It is revealing that this logical distinction was overlooked by Neyman and Pearson, in 1933, in one of their earliest papers after they had learnt of the possibility of inferring fiducial limits, the argument for which I had set out in a paper on inverse probability in the Proceedings of the Cambridge Philosophical Society, 1930. It is particularly instructive that although in that paper I speak of "learning by experience", of "inductive processes", and of "the probability of causes", much as others had done since the eighteenth century, these authors read into my work "rules of behaviour", which indeed I had not mentioned at all. Both misapprehensions become intelligible if we realise that the authors had no idea of a test of significance as a means of learning, but conceived it only under the form of an acceptance procedure. The passage is as follows:
"In a recent paper [(Neyman & Pearson, 1933b)] we have discussed certain general principles underlying the determination of the most efficient tests of statistical hypotheses, but the method of approach did not involve any detailed consideration of the question of a priori probability. We propose now to consider more fully the bearing of the earlier results on this question and in particular to discuss what statements of value to the statistician in reaching his final judgement can be made from an analysis of observed data, which would not be modified by any change in the probabilities a priori. In dealing with the problem of statistical estimation, R. A. Fisher has shown how, under certain conditions, what may be described as rules of behaviour can be employed which will lead to results independent of these probabilities; in this connection he has discussed the important conception of what he terms fiducial limits^{8, 9}. But the testing of statistical hypotheses cannot be treated as a problem in estimation, and it is necessary to discuss afresh in what sense tests can be employed which are independent of a priori laws."
There seems here an entirely genuine inability to conceive that when new data are added in
an inductive problem, previously correct conclusions are no longer correct. Or, in this case,
that the conclusions proper to the absence of knowledge of probabilities a priori would be wrong
for almost any set of such probabilities, and could in no sense be a common term in the proper
inferences from all such sets.
(d) Variety of logical form.
A fourth feature which has emerged in the study of inductive inference is that data of apparently
the same logical form, though with different mathematical specification, give rise to inferences
not always of the same logical form.
For example, when in 1930 I introduced the notions of the fiducial distribution and fiducial
limits I did so with the example of the sampling distribution of the estimated correlation coefficient
$r$ for various values of the true correlation $\rho$. The distribution of $r$ is continuous between the
limits $-1$ and $+1$, and for any value of $P$ there is a value of $r$, which may be called $r_P(\rho)$, such
that $r$ exceeds it with frequency $1 - P$, and falls short of it with frequency $P$. These functions
of $\rho$ increase monotonically from $-1$ to $+1$ as $\rho$ passes from $-1$ to $+1$. Consequently,
corresponding with any observed value $r$, there is a value of $\rho$, which may be denoted as
$\rho_{1-P}(r)$, such that for this value of $\rho$ the observed value will fall short of $r$ with frequency $P$
and exceed it with frequency $1 - P$. In fact if $P$ is expressed as an explicit function

$$P = F_N(r, \rho)$$

such that the distribution of $r$ for given $\rho$ is given by the frequency element

$$\frac{\partial F_N}{\partial r}\, dr,$$

then the distribution

$$-\frac{\partial F_N}{\partial \rho}\, d\rho$$

will be the fiducial distribution of $\rho$ for given $r$, in the sense that the frequency of exceeding any
chosen value of $\rho$ is the frequency, for that value of $\rho$, of $r$ being less than the value observed.
The quantiles of this distribution thus give the fiducial limits of $\rho$ at any chosen level of significance.
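The inversion described above can be illustrated numerically. The following is only a sketch under a stated assumption: the exact sampling distribution of the correlation coefficient is replaced by the familiar normal approximation to the transformed value atanh(r), with mean atanh(rho) and variance 1/(n - 3). The function names `F`, `fiducial_quantile` and `fiducial_density`, and the sample values used, are hypothetical and are not taken from the paper.

```python
from math import atanh, sqrt, tanh
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal distribution


def F(r, rho, n):
    """Approximate P = F_N(r, rho): the frequency with which a sample
    correlation from n pairs falls short of r when the true value is rho.
    ASSUMPTION: normal approximation on the atanh scale, not the exact law."""
    return _STD_NORMAL.cdf((atanh(r) - atanh(rho)) * sqrt(n - 3))


def fiducial_quantile(r_obs, n, P):
    """The value of rho for which F(r_obs, rho) equals P.
    The fiducial probability that rho exceeds this value is P."""
    return tanh(atanh(r_obs) - _STD_NORMAL.inv_cdf(P) / sqrt(n - 3))


def fiducial_density(r_obs, rho, n, h=1e-6):
    """Fiducial frequency element -dF/d(rho), by central difference."""
    return -(F(r_obs, rho + h, n) - F(r_obs, rho - h, n)) / (2 * h)


# Hypothetical observation: r = 0.6 from n = 30 pairs.
r_obs, n = 0.6, 30
lower = fiducial_quantile(r_obs, n, 0.975)  # exceeded with fiducial probability 0.975
upper = fiducial_quantile(r_obs, n, 0.025)  # exceeded with fiducial probability 0.025
print(f"fiducial limits (approximate): {lower:.3f} to {upper:.3f}")
```

Because F is continuous and monotonic in rho, each quantile corresponds to an exact frequency statement; this is the property that fails for a discontinuous variate.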
Had I taken a discontinuous variate, such as the number of successes observed out of N trials,
and sought in terms of the observations to obtain a fiducial distribution for the true probability,
(say x), it would certainly have been possible to find a value of x such that the probability of the
number of successes observed, or any higher number, was, let us say, 5 per cent., so that smaller
values of x could be rejected at least at the 5 per cent. level of significance; but this gives only
an inequality statement for the probability that x is less than any given value. Neyman seems to
ignore this distinction, and to speak in both cases of confidence limits. Logically, however, the
form of inference admissible is totally distinct.
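The inequality character of the discontinuous case can be checked by direct computation. The sketch below assumes a binomial model for the number of successes; the helper names `upper_tail`, `lower_limit` and `rejection_frequency`, and the numerical values chosen, are hypothetical illustrations. The limit is found by bisection as the value of x at which the probability of the observed count or any higher one is 5 per cent.

```python
from math import comb


def upper_tail(k, n, x):
    """Probability of k or more successes in n trials when the true probability is x."""
    return sum(comb(n, j) * x**j * (1 - x) ** (n - j) for j in range(k, n + 1))


def lower_limit(k, n, alpha=0.05, tol=1e-12):
    """Value of x at which the probability of the observed count, or any
    higher count, equals alpha; smaller x are rejected at least at level alpha."""
    if k == 0:
        return 0.0  # with no successes, no value of x can be rejected from below
    lo, hi = 0.0, 1.0
    while hi - lo > tol:  # the tail probability increases with x, so bisect
        mid = (lo + hi) / 2
        if upper_tail(k, n, mid) < alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def rejection_frequency(n, x, alpha=0.05):
    """Frequency, when the true probability is x, of observing a count
    whose lower limit lies above x (that is, of wrongly rejecting x)."""
    return sum(
        comb(n, k) * x**k * (1 - x) ** (n - k)
        for k in range(n + 1)
        if upper_tail(k, n, x) < alpha
    )


# Hypothetical example: 7 successes in 10 trials.
limit = lower_limit(7, 10)
print(f"lower limit for 7/10: {limit:.4f}")
print(f"frequency of rejecting a true x = 0.3 with n = 10: {rejection_frequency(10, 0.3):.4f}")
```

For a true x of 0.3 and ten trials the frequency of rejection falls below 5 per cent. rather than equalling it, so only a statement of the form "not more than 5 per cent." is available.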
Equally, statements of fiducial probability in continuous cases are only proper if the whole
of the information is utilized, as it is by the use of sufficient estimates, whereas for any test of
significance, however low in power, it may well be possible to point to the limits outside which
parametric values are significantly contradicted by the data at a given level of significance. These
also should be regarded as giving only rough statements for the fiducial probability.
There are other cases in the theory of estimation in which rather similar data yield information
of remarkably different kinds. Consider, for example, the case in which x and y are two observables
distributed in normal distributions with unit variance in each case, and independently,
about hypothetical means $\xi$ and $\eta$. No situation could be simpler. Suppose, however, that the
data contain a functional relationship connecting $\xi$ and $\eta$. Then different cases arise from different
functional forms:
(i) If there is a simple linear connection between $\xi$ and $\eta$, so that $(\xi, \eta)$ represents a point on a
given straight line, then the foot of the perpendicular from the observation point (x, y) is a sufficient
estimate, and the fiducial distribution of $(\xi, \eta)$ on the given line will be a normal distribution
with unit variance about this estimate. All possible observations on the same perpendicular
are equivalent.
(ii) If the given locus of $(\xi, \eta)$ is a circle, there is no sufficient estimate; the distance of (x, y)
from the centre of the given circle is, however, an ancillary statistic, which together with the maximum
likelihood estimate makes the estimation exhaustive. For each possible distance an appropriately
oriented fiducial distribution on the circle may be specified.
(iii) In general there is a well defined likelihood function, and therefore an estimated point
of maximum likelihood. It is not obvious that any general substitute can be found for the ancillary
statistic, save in an asymptotic sense, or that any statement of fiducial probability is possible in
general. Thus three logically distinct types of inference arise from simple changes in the mathematical
specification of the problem.
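The contrast between cases (i) and (ii) can be seen in a small computation. The sketch below assumes the unit-variance independent normal model described above; the particular line, circle and observation points are hypothetical, as are the function names. It compares how much the log-likelihood falls when the parameter point is moved away from its maximum likelihood position.

```python
from math import atan2, cos, hypot, sin


def log_likelihood(point, mean):
    """Log-likelihood, up to a constant, for independent unit-variance normals."""
    return -0.5 * ((point[0] - mean[0]) ** 2 + (point[1] - mean[1]) ** 2)


def foot_parameter(point, origin, direction):
    """Position along the line of the foot of the perpendicular from `point`."""
    ux, uy = direction  # assumed to be of unit length
    return (point[0] - origin[0]) * ux + (point[1] - origin[1]) * uy


def on_line(origin, direction, t):
    return (origin[0] + t * direction[0], origin[1] + t * direction[1])


def circle_summary(point, centre):
    """Maximum likelihood angle and the ancillary distance from the centre."""
    dx, dy = point[0] - centre[0], point[1] - centre[1]
    return atan2(dy, dx), hypot(dx, dy)


def on_circle(centre, radius, theta):
    return (centre[0] + radius * cos(theta), centre[1] + radius * sin(theta))


# Case (i): hypothetical line through the origin with unit direction (0.6, 0.8).
origin, direction = (0.0, 0.0), (0.6, 0.8)
p1, p2 = (0.4, 2.2), (-1.2, 3.4)  # two observations on the same perpendicular


def line_drop(point, t):
    """Fall in log-likelihood between the foot of the perpendicular and position t."""
    t_hat = foot_parameter(point, origin, direction)
    best = log_likelihood(point, on_line(origin, direction, t_hat))
    return best - log_likelihood(point, on_line(origin, direction, t))


# Case (ii): hypothetical circle of radius 2 centred at the origin.
centre, radius = (0.0, 0.0), 2.0
p3, p4 = (1.0, 1.0), (3.0, 3.0)  # same direction from the centre, different distances


def circle_drop(point, delta):
    """Fall in log-likelihood when the angle moves delta from its estimate."""
    theta_hat, _ = circle_summary(point, centre)
    best = log_likelihood(point, on_circle(centre, radius, theta_hat))
    return best - log_likelihood(point, on_circle(centre, radius, theta_hat + delta))


print("line:  ", line_drop(p1, 1.0), line_drop(p2, 1.0))      # equal for both points
print("circle:", circle_drop(p3, 0.5), circle_drop(p4, 0.5))  # differ with the distance
```

On the line, observations sharing a perpendicular give identical likelihood comparisons, so the foot of the perpendicular carries all the information. On the circle, two observations with the same estimated angle give different comparisons according to their distance from the centre, which is why that distance must be retained alongside the estimate.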
(e) Finally, in inductive inference we introduce no cost functions for faulty judgements, for
it is recognized in scientific research that the attainment of, or failure to attain to, a particular
scientific advance this year rather than later, has consequences, both to the research programme,
and to advantageous applications of scientific knowledge, which cannot be foreseen. In fact,
scientific research is not geared to maximize the profits of any particular organization, but is
rather an attempt to improve public knowledge undertaken as an act of faith to the effect that,
as more becomes known, or more surely known, the intelligent pursuit of a great variety of aims,
by a great variety of men, and groups of men, will be facilitated. We make no attempt to evaluate
these consequences, and do not assume that they are capable of evaluation in any sort of currency.
When decision is needed it is the business of inductive inference to evaluate the nature and
extent of the uncertainty with which the decision is encumbered. Decision itself must properly
be referred to a set of motives, the strength or weakness of which should have had no influence
whatever on any estimate of probability. We aim, in fact, at methods of inference which should
be equally convincing to all rational minds, irrespective of any intentions they may have in utilizing
the knowledge inferred.
We have the duty of formulating, of summarising, and of communicating our conclusions,
in intelligible form, in recognition of the right of other free minds to utilize them in making their
own decisions.

References
BARNARD, G. A. (1945), "A new test for 2 × 2 tables", Nature, 156, No. 3954, 177.
--- (1946), "Sequential tests in industrial statistics", J. R. Statist. Soc., Suppl., 8, 1-21.
--- (1947a), "Significance tests for 2 × 2 tables", Biometrika, 34, 123-138.
--- (1947b), "The meaning of a significance level", Biometrika, 34, 179-182.
--- (1947c), Review: Sequential Analysis. By Abraham Wald, J. Amer. Stat. Ass., 42, 658.
--- (1949), "Statistical inference", J. R. Statist. Soc., B, 11, 115-139.
COCHRAN, W. G., & COX, G. M. (1950), Experimental Designs. New York: Wiley. London: Chapman & Hall.
DAVIES, O. L. (ed.) (1954), The Design and Analysis of Industrial Experiments. London & Edinburgh: Oliver & Boyd.
FINNEY, D. J. (1952), Statistical Method in Biological Assay. London: Griffin.
FISHER, R. A. (1922), "The goodness of fit of regression formulae, and the distribution of regression coefficients", J. R. Statist. Soc., 85, 597-612.
--- (1930), "Inverse probability", Proc. Camb. Phil. Soc., 26, 528-535.
--- (1933), "The concepts of inverse probability and of fiducial probability referring to unknown parameters", Proc. Roy. Soc., A, 139, 343-348.
--- (1934), Statistical Methods for Research Workers. (5th ed. and later.) London & Edinburgh: Oliver & Boyd.
--- (1941), "The interpretation of experimental fourfold tables", Science, 94, No. 2435, 210-211.
--- (1945), "A new test for 2 × 2 tables", Nature, 156, No. 3961, 388.
GOULDEN, C. H. (1939 and 1952), Methods of Statistical Analysis. New York: Wiley. London: Chapman & Hall.
NEYMAN, J. (1938), "L'estimation statistique traitée comme un problème classique de probabilité", Actualités Scientifiques et Industrielles, No. 739, 25-57.
--- & PEARSON, E. S. (1933a), "The testing of statistical hypotheses in relation to probabilities a priori", Proc. Camb. Phil. Soc., 29, 492-510.
--- & --- (1933b), "On the problem of the most efficient tests of statistical hypotheses", Phil. Trans. Roy. Soc., A, 231, 289-337.
PEARSON, E. S. (1947), "The choice of statistical tests illustrated on the interpretation of data classed in a 2 × 2 table", Biometrika, 34, 139-167.
"STUDENT" (1908), "The probable error of a mean", Biometrika, 6, 1-25.
VENN, J. A. (1876), The Logic of Chance (2nd ed.). London: Macmillan.
WALD, A. (1950), Statistical Decision Functions. New York: Wiley. London: Chapman & Hall.
WILSON, E. B. (1941), "The controlled experiment and the fourfold table", Science, 93, No. 2424, 557-560.
--- (1942a), "On contingency tables", Proc. Nat. Acad. Sci., 28, No. 3, 94-100.
--- & WORCESTER, J. (1942b), "Contingency tables", Proc. Nat. Acad. Sci., 28, No. 9, 378-384.
YATES, F. (1949), Sampling Methods for Censuses and Surveys. London: Griffin.
