PART ONE

Descriptive Statistics

CHAPTER 1

Introduction to Statistics

A. HISTORICAL BACKGROUND *
Statistics is a form of applied mathematics. It is a logical tool used in all the
sciences and employed by all modern cultures. It is especially a tool of the
biological and social sciences, a tool whose development has paralleled the
practical demands of man's needs in a diverse and complex world.

Gamblers and Kings


Statistics had its beginnings many generations ago as a result of the interests
and needs of gamblers and kings. The gamblers wished to develop systems
that would improve their skill at cards and dice. Kings wished to know more
about their subjects so as to work out more efficient taxing systems. Out of
the interests and needs of gamblers came the foundation of our modern
theory of probability, a theory basic to sampling statistics. Out of the interests
and needs of kings emerged vital and social statistics, statistics as a descriptive
method for enumerating and classifying hundreds and thousands of classes
of useful data.
The Mathematicians
After the gamblers and kings came the mathematicians. In 1657 there
appeared a brief treatment by Christian Huygens, the great Dutch mathematician
and physicist, of the chances of winning at certain card and dice
games. Three years earlier Pascal and Fermat had had their famous correspondence,
in which they established the fundamental principles of probability.
A little later Jacques Bernoulli, the Swiss mathematician, wrote the
first book on the subject of probability. It was published in 1713, after his
death, by his nephew, Nicolas Bernoulli. The work is an historical landmark,
especially because of its emphasis on the practical value of the theory of
probability for financial problems. But Jacques Bernoulli's untimely death cut
short the immediate development of many practical possibilities of statistics
in social affairs. Such development waited another century, until the work of
the Belgian, Adolphe Quetelet.

* Cf. H. M. Walker, Studies in the History of Statistical Method, Williams & Wilkins,
Baltimore, 1929.
In the meantime, the theoretical development of statistics centered about
the concept of probability initiated, as we have indicated, by Pascal and
Fermat, and Jacques Bernoulli. In 1733, de Moivre gave the first mathematical
formulation of the normal probability curve (the curve of error), but little
attention was paid to it at the time. De Moivre attempted to remove the
stigma of gambling from the problem of probability and to give the theory a
divine flavor by maintaining: "And thus in all cases it will be found, that
although chance produces irregularities, still the Odds will be infinitely great,
that in process of Time, those Irregularities will bear no proportion to the
recurrency of that Order which naturally results from Original Design." *
It was not, however, until toward the end of the eighteenth century and
the beginning of the nineteenth that the theoretical development of statistics
got under way as a broad and continuous enterprise. It was with the work
of the great European mathematicians, Laplace and Gauss, and of the physicists
and astronomers that the scientific foundations were laid for the theory
of probability and the measurement of errors of observation. Gauss, "the
Prince of Mathematicians," † was especially concerned with the practical as
well as the theoretical problems of astronomical measurement, and the normal
curve of error was developed for the variable results of observation with the
mean of a series of observed values taken as the most probable value of the
measure sought.
In this work of the mathematical astronomers it is evident that theoretical
statistics was developing in conjunction with some empirical problems of
measurement. However, the broad foundations of descriptive statistics (which
Quetelet later integrated with the theoretical) for the study of social phenomena
were established by government officials and political economists.

The Census-Vital Statistics


We have seen that kings had long been interested in enumerating those
of their subjects who could pay taxes. They had also long been interested in
the number of subjects who could render military service. The registering
of baptisms, marriages, and deaths was begun in a few places in Europe during
the fourteenth and fifteenth centuries. Such data formed the basis for the
beginnings of descriptive statistics, and by the seventeenth century census
taking had its systematic start. According to Godfrey,‡ the first census of
modern times to be conducted under that name was taken in Canada in 1666.
The data reported filled 154 pages and included facts about the population
such as sex, family and conjugal status, age, profession, and trade. More

* Cf. H. M. Walker, Studies in the History of Statistical Method, Williams & Wilkins,
Baltimore, 1929, p. 17.
† Cf. E. T. Bell, Men of Mathematics, Simon & Schuster, New York, 1937, chap. 14.
‡ E. H. Godfrey, Section on Canada, in John Koren (ed.), The History of Statistics:
Their Development and Progress in Many Countries, Macmillan, New York, 1913, pp. 179-
198.
recently, however, Dr. Carlos Castañeda, Latin-American authority at the
University of Texas, has reported that the first census on the North American
continent was conducted by the alcaldias mayores of New Spain between 1570
and 1580 at the command of King Philip II of Spain. * Philip wanted to
know how many people there were, the family income, members per family,
the amount of taxes they paid, and on what and with what they paid their
taxes. Altogether there were 150 questions for each family to answer.
The end of the seventeenth century saw the publication of mortality tables
by the English astronomer, Halley, in 1693. Annuity tables for insurance
societies made a marked empirical development in the eighteenth century
because of the vital statistics which had been collected by that time. The
revolutions in America and France further stimulated the interest in data
about the masses of population. Our Articles of Confederation provided for
a triennial census, but this was changed to a decennial basis when the Constitution
was adopted; and 1790 saw the first official census of the newly
formed United States of America.

Adolphe Quetelet (1796-1874)-Social Scientist


It was Quetelet who developed statistical method as a scientific research
tool in the study of man and the social sciences. Quetelet was a university
teacher, mathematician, astronomer, and anthropometrist, as well as his
country's supervisor of official statistics and hence responsible for the first
nation-wide census. It was Quetelet who brought together the theoretical and
empirical foundations of statistics, integrating and developing them for the
investigation of social phenomena. He combined a mathematical interest in
the theory of probability with a passion for the collection of data about
people. Time and again, during the nineteenth century, he emphasized that
the basic techniques of statistical method are the same whether we are studying
the stars or man, the weather or morals. It was Quetelet who developed
the concept of the average man, l'homme moyen, insisting that in the sphere
of human activities all is not individual and unmeasurable. In 1831 he reported
a study on tendencies to crime at different ages, in which he analyzed
the role of such factors as sex, education, and climate on criminal tendency.
Just as we are often startled by predictions about the number of deaths from
accidents to be expected on the Fourth of July or for a given period from
automobile traffic, so Quetelet was impressed by the relative constancy of the
number of crimes from year to year: "Thus we pass from one year to another
with the sad perspective of seeing the same crimes reproduced in the same
order and calling down the same punishments in the same proportions. Sad
condition of Humanity! . . . We might enumerate in advance how many
individuals will stain their hands in the blood of their fellows, how many will

* C. E. Castañeda, in the New York Herald Tribune, July 7, 1940; also direct
correspondence.
be poisoners; almost we can enumerate in advance the births and deaths
that should occur. There is a budget which we pay with frightful regularity;
it is that of prisons, chains and the scaffold." *
Quetelet was criticized as a materialist by many of his contemporaries because
he dared to suggest that the moral worth of a man might be inferred
from measurements of his actions, that the intellectual vitality of a man
might be deduced from what he produced. He was confident that the mental
and moral traits of man could be measured and that, when measured, the
distributions of such traits would be shown to conform to the so-called normal
law. The normal probability curve, which is illustrated in Fig. 1:1, came
practically to be deified, and no wonder. As large samples of data of various
characters of men, of biological and social phenomena, came to be measured,
the distributions were often found to approach the form of this curve.

Sir Francis Galton (1822-1911)-Geneticist


After Quetelet, but contemporary with him as a statistician for a generation,
came Sir Francis Galton. Like Quetelet, Galton also made extensive
use of the normal probability curve in the description of biological and social
phenomena. Like Quetelet, Galton saw in statistical method the means of
discovering regularity and lawfulness in phenomena which otherwise, by
their diversity and complexity, seemed individual and unique. Galton suggested
the use of the normal curve in the assigning of grades, or class marks,
in the schoolroom. Like Quetelet, Galton had a great passion for observation,
for recording data and analyzing them by the methods of statistics, many
of which he himself developed as the need arose. It was Galton who discovered
the method of statistical correlation, a discovery made in connection
with the need for analysis in his studies of the inheritance of traits. It is no
exaggeration to describe this discovery of Galton's as one of the greatest
contributions ever made to the empirical development of the biological and
social sciences.

Correlation
The need for the technique of correlation is aptly illustrated by some of
Bowditch's problems which he was unable to answer adequately, as he himself
recognized. With the object of improving school application in growing
children, the Massachusetts Board of Health sponsored the study by Bowditch,
reported in 1877.† Descriptive statistics of nearly 25,000 children were
obtained, including not only bodily measurements and age, but also nationality,
place of birth, and occupation of parents. Bowditch wished to analyze

* F. H. Hankins, Adolphe Quetelet as Statistician, Columbia University Studies in History,
Economics and Public Law, No. 84, New York, 1908.
† H. P. Bowditch, "The Growth of Children," Report of the Board of Health of
Massachusetts, 1877; reprinted in Bowditch's Papers on Anthropometry, Boston, 1894.
this tremendous mass of data for such relations as might be relevant to the
original problems of the inquiry. He wanted to know, for example, what relation
there was between the height and weight of the children. He saw that
there was a relationship, but the technique of correlation was not yet available
with which to formulate a determinate answer regarding the degree or
character of the relationship.
That there is an empirical basis for a possible relationship is obvious since
height and weight are both attributes or traits of individuals. The real question,
however, does not relate to the individual case. It is an actuarial or

Fig. 1:1. The Normal Probability Curve

The abscissa (horizontal) axis represents the scale of measures or scores of a
variable attribute or trait. The ordinate (vertical) axis represents the frequencies
of the distribution. The higher the curve at any point, the greater the number of
frequencies or instances for the measures at that point. The point of greatest
concentration of frequencies is in the center of the distribution, at M, the mean.

group question. We know that some people are likely to be tall, some short,
some heavy, some light. Persons, even of the same age, thus vary in height
and weight. Height and weight are therefore called variables or variates.
Quetelet and others established the fact that there is a very real tendency
for a large random sample of persons of a given age to have weights or heights
which, when systematically organized into a series according to size, form a
distribution which is similar to that of the normal probability curve (see
Fig. 1:1).
The question of relationship between two variable attributes like height
and weight is whether individuals of average height are also of average weight;
whether very tall individuals are also very heavy; whether very short
individuals are also very light. In other words, the question is whether weights
and heights, when paired according to the persons from whom they are
obtained, vary together in any systematic way. This is the problem of
co-variation or correlation, and it is complicated by the fact that rarely, if ever,
do the measured attributes of biological or social phenomena exhibit perfect
or complete correlation. Persons who weigh, say, 160 pounds do not all have
the same height; rather, they vary in height. Similarly, persons of a given
height, say 6 feet, vary in weight. The statistical problem here is one of
determining the form and degree of any tendency for weight and height to
vary together. The details of the statistical technique of correlation demanded
by this kind of problem, the problem of possible co-variation, will be
considered in Chapter 9. Here we wish only to emphasize that the technique
discovered by Galton has been indispensable to the modern development of the
biological and social sciences.
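The idea of paired measures varying together can be made concrete with a small
sketch. It assumes the product-moment coefficient as the measure of degree of
correlation; the text gives no formula at this point, so that choice, the function
name pearson_r, and the seven invented height and weight pairs are all
illustrative assumptions rather than anything taken from Bowditch's data.

```python
import math


def pearson_r(xs, ys):
    """Product-moment correlation between two paired lists of measures."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two paired lists of equal length (at least 2)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products of the deviations from the two means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each variable.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant list has no defined correlation")
    return sxy / math.sqrt(sxx * syy)


# Invented heights (inches) and weights (pounds), paired by person.
heights = [60, 62, 64, 66, 68, 70, 72]
weights = [110, 125, 120, 140, 150, 145, 170]

r = pearson_r(heights, weights)
print(f"r = {r:.2f}")
```

A value near +1 or -1 would indicate that the two attributes vary together
closely, and a value near 0 that they do not. The coefficient describes the
group of pairs as a whole, not any one person.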
It is by the statistical technique of correlation that we are today able by
comparatively simple methods to investigate relations between the attributes
of individuals, or of organisms generally, as well as relations between the
attributes of other kinds of natural and social phenomena. What is the nature
of the relation, if any, between the I.Q.'s and school grades of children,
between the tested achievements of parents and their offspring, between the
manual abilities of siblings? Is there any relation between temperature and
plant growth, between neighborhood status and delinquency, between the
protein content and proportion of vitreous kernels in wheat grains? Although
methods of investigating such questions as these are sometimes complicated,
the method of correlation itself remains a most powerful tool for the study
of possible relations among the variable attributes of natural and social
phenomena. It is again to be emphasized, however, that this method, as well
as statistics generally, is for the study of group phenomena, of masses of
instances. Inferences which can be made legitimately from statistical results
are about the group, not about the individual instance. Descriptively, such
results give us information about the group as a whole. Analytically, such
results may often be used for predicting what may happen in the long run or
on the average, but not in the individual case.

Statistical Prediction Actuarial, Not Individual


We say that the chances are even that a tossed coin will land heads or tails.
We mean that in the long run a series of such tosses should give half heads
and half tails. What happens in the given, individual toss is strictly
determined, although we are unable to ascertain the determining conditions
so as to predict which side of the coin will lie uppermost. Our ignorance of
the many factors operating in the determination of the result is such, and
our knowledge of what happens in the long run for a fair coin is such, that
we say the chances are even, or fifty-fifty, that the coin will land heads or
tails. This is thus a verbal, somewhat metaphorical expression of our ignorance
about what will happen in the individual instance. Similarly, we say that the
chances are about even that a child to be born will be a boy or a girl. Again,
the metaphor is based (1) on our ignorance of the determining factors in the
given, individual instance, and (2) on the empirical facts of vital statistics
which have revealed for thousands of births that the ratio of boys to girls
is about 51 to 49.
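The long-run meaning of "even chances" can be illustrated with a short
simulation. This is only a sketch: a pseudo-random generator stands in for a
fair coin, and the function name proportion_heads, the fixed seed, and the
numbers of tosses are assumptions chosen for illustration.

```python
import random


def proportion_heads(n_tosses, seed=0):
    """Simulate n_tosses of a fair coin and return the proportion of heads."""
    rng = random.Random(seed)  # fixed seed so that a run can be repeated
    heads = sum(1 for _ in range(n_tosses) if rng.random() < 0.5)
    return heads / n_tosses


# Short series may stray far from one half; long series tend to settle near it.
for n in (10, 100, 10_000, 100_000):
    print(n, proportion_heads(n))
```

No single toss is predicted by such a run. Only the proportion over a long
series is expected to lie close to one half.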
These two examples should serve to illustrate the actuarial or group character
of statistics. What is true for the proportion of heads and tails in coin
tossing, and of the sex ratio of births in vital statistics, is also true for all
statistical inference, in that predictions are actuarial and not individual. It
is well established in psychological and educational measurement, for example,
that there exists a real correlative relationship between the academic
attainments and intelligence test achievement of the school population in our
culture. Given a particular I.Q. score, say 70, obtained under optimum
conditions of measurement, we can predict that school children with such an
I.Q. will, on the average, be below average in their academic attainments.
That this is an actuarial or group inference should be obvious; nevertheless,
such a prediction is sometimes made for the individual child who is, after
all, either below average or not, in his academic attainments. And what he
will continue to do in his school work can be effectively and logically
predicted with confidence only as a result of studying him as the psychological
individual that he is. In dealing with the individual child, the psychologist
finds it useful and valid to draw upon his fund of statistical or actuarial
experience and information so long as he continues to focus his analytical
attention on the unique totality of the particular child.*
That a child has an I.Q. of 70 is useful information so far as the psychologist
determines as precisely as possible what the actual intelligence test
performance means for that particular child. In fact, the competent
psychological investigator uses an intelligence test chiefly for such a purpose,
for the light which the child's performance may throw on his total personality.
In individual diagnosis and prognosis, the calculation of the I.Q. score itself
is incidental to this fundamental purpose.
We see, then, that the data and methods of statistics are for the study of
group or mass phenomena. And statistical inferences are actuarial in character,
i.e., they are inferences about what happens or may happen in the long
run, or on the average.

B. DESCRIPTIVE VS. SAMPLING STATISTICS

The Concept "Statistics" -Its Various Meanings


We have been using the term statistics mainly to refer to a method. This
is because we are primarily concerned in this book with statistics as a scientific
method of description and analysis. However, it is well to note that the
* Cf. in this regard the comments on prediction by G. W. Allport in his presidential
address to the American Psychological Association, 1939: "The Psychologist's Frame
of Reference," Psychological Bulletin, 37:1-28, 1940, especially pp. 16-18.
word statistics is also used to denote the data or information about populations,
about biological and social phenomena, that can be measured or enumerated.
Although this latter use of the term has been suggested in the
preceding pages, we wish specifically to differentiate statistics as information
from statistics as method.
Statistics as information represents perhaps the most general use of the
concept. Today there are literally thousands of publications presenting
statistical information of various kinds: vital statistics, statistics of health
and medical care, statistics of education, of social security and of labor,
statistics of crime, statistics of governmental finance, statistics of agriculture,
manufactures, minerals, of housing and building construction, of wholesale
and retail trade, of public utilities, of money and banking, of security markets
and corporations, statistics of international trade, of business activity, of
commodity prices, of consumption, and of national income and wealth.
Although we are not directly concerned with statistics as information, the
student of psychology, anthropology, sociology, or education should be
familiar with sources of statistical information relevant to his field of research.
A short bibliography of source material is appended to serve this purpose
(see Appendix A).

Statistic vs. Parameter Values


Another distinction in the use of the concept statistics arises in the study
and analysis of populations by sampling methods. Any summary numerical
values obtained from samples of data, such as measures of an average, of
deviational tendency, of correlation, etc., are characterized as statistics. Such
statistics are contrasted with parameter values, which are these same types
of measures but are for a statistical population as a whole, rather than for
only a sample of the population.
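The contrast between a statistic and a parameter value can be shown in a few
lines. The sketch below is built entirely on assumptions: the population of
10,000 heights is generated artificially, and the sample size of 100 and the
variable names are chosen only for illustration.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so that the example can be repeated

# Hypothetical population: heights in inches for 10,000 persons.
population = [rng.gauss(66, 3) for _ in range(10_000)]

# Parameter value: the mean computed over the whole population.
parameter_mean = statistics.mean(population)

# Statistic: the same measure computed from a random sample only.
sample = rng.sample(population, 100)
statistic_mean = statistics.mean(sample)

print(f"parameter (population mean): {parameter_mean:.2f}")
print(f"statistic (sample mean):     {statistic_mean:.2f}")
```

The two values are the same type of measure. They differ only in whether the
whole population or a sample of it supplied the data.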

Statistical Method vs. Statistical Inference


A distinction is also sometimes made between statistical method and
statistical inference. This is, however, a somewhat ambiguous and unnecessary
distinction, since statistical inference is integral to statistics as a method.

Description vs. Sampling


A more useful distinction can be made with respect to the nature of statistical
method itself, viz., the methods of descriptive statistics, on the one hand,
and those of analytical or sampling statistics, on the other. Since this
distinction has fundamental and important implications for statistical method
in scientific research, and since this book is organized on the basis of the
contrast, we shall describe the difference between descriptive and sampling
statistics at this point and see more fully the implications of the distinction
as we proceed.
