
Plots of High-Dimensional Data

Author(s): D. F. Andrews
Source: Biometrics, Vol. 28, No. 1, Special Multivariate Issue (Mar., 1972), pp. 125-136
Published by: International Biometric Society
Stable URL: http://www.jstor.org/stable/2528964
BIOMETRICS 28, 125-36, March 1972

PLOTS OF HIGH-DIMENSIONAL DATA

D. F. ANDREWS¹

Bell Telephone Laboratories, Murray Hill, New Jersey,
and Princeton University

SUMMARY
A method of plotting data of more than two dimensions is proposed. Each data point, $x = (x_1, \ldots, x_k)$, is mapped into a function of the form

$$f_x(t) = x_1/\sqrt{2} + x_2 \sin t + x_3 \cos t + x_4 \sin 2t + x_5 \cos 2t + \cdots$$

and the function is plotted on the range $-\pi < t < \pi$. Some statistical properties of the method are explored. The application of the method is illustrated with an example from anthropology.

1. INTRODUCTION
Plotting has long been one of the most useful tools in the analysis of data. Much of model building is aided by the plotting of residuals. Distributional assumptions are frequently based on probability plots.
Unfortunately, most graphical methods are restricted to displaying univariate or bivariate data. Some progress has been made in the use of different symbols on a two-dimensional plot to give some idea of a third or fourth dimension, but these methods lack precision and seem limited to a small number of dimensions.
However, one is accustomed to examining plots of functions, and they may be infinite-dimensional. This suggests imbedding high-dimensional data in a higher-dimensional but easily visualized space of functions, and then plotting the functions. One way of doing this is described here, and some examples are given.

2. THE PLOTTING PROCEDURE


The proposed plotting procedure is very simple. If the data is k-dimensional, each point $x' = (x_1, \ldots, x_k)$ defines a function

$$f_x(t) = x_1/\sqrt{2} + x_2 \sin t + x_3 \cos t + x_4 \sin 2t + x_5 \cos 2t + \cdots$$

This function is then plotted over the range $-\pi < t < \pi$. A set of points will appear as a set of lines drawn across the plot. The programming effort required for this plot is small, but an output device with relatively high precision, such as a CALCOMP or microfilm plotter, is required.

¹Present address: Department of Mathematics, University of Toronto, Toronto 181, Canada.
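The procedure above can be sketched in a few lines. This is a minimal illustration, not the author's program. The helper names `andrews_basis`, `andrews_curve` and `andrews_curve_grid` are hypothetical, and the grid size of 201 points is an arbitrary choice. Each point $x$ becomes a list of values $f_x(t)$ that any plotting device could draw as one line.

```python
import math


def andrews_basis(t, k):
    """Return f_1(t) = (1/sqrt(2), sin t, cos t, sin 2t, cos 2t, ...) truncated to k terms."""
    basis = [1.0 / math.sqrt(2.0)]
    freq = 1
    while len(basis) < k:
        basis.extend((math.sin(freq * t), math.cos(freq * t)))
        freq += 1
    return basis[:k]


def andrews_curve(x, t):
    """Evaluate f_x(t) = x_1/sqrt(2) + x_2 sin t + x_3 cos t + ... at one value of t."""
    return sum(xi * bi for xi, bi in zip(x, andrews_basis(t, len(x))))


def andrews_curve_grid(x, n_points=201):
    """Sample f_x(t) at n_points equally spaced values of t from -pi to pi inclusive."""
    ts = [-math.pi + 2.0 * math.pi * i / (n_points - 1) for i in range(n_points)]
    return ts, [andrews_curve(x, t) for t in ts]


# Usage with a hypothetical 5-dimensional point: one point becomes one curve.
ts, ys = andrews_curve_grid([1.0, 2.0, 3.0, -1.0, 0.5])
```

Any line-drawing routine can then plot `ys` against `ts`, one call per data point.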

3. A BIOLOGICAL EXAMPLE
Ashton et al. [1957] used graphical techniques when comparing measurements on the teeth of fossils with those of different 'races' of men and apes. From the data, group means were calculated for men and apes, and canonical variables were determined to maximize the between sum of squares relative to the within sum of squares. The first two such variates had the largest eigenvalues, and these variables were used in the plotting procedure. The group means were plotted and 90% confidence contours were drawn about them. The fossil values were then plotted and assessed relative to the different groups.
Ashton et al. analyzed several teeth. Here we present the data for one tooth, the permanent first lower premolar (Table 1). In Figure 1 the group means are plotted together with approximate 90% confidence contours. The fossil measurements are also plotted. From this plot those authors concluded that Proconsul africanus is very like a chimpanzee, while the other fossils are more like humans.
Some group means have moderately large values of the third and fourth canonical variates, while large values occur for all 8 variates for the fossils. Plotting all the variates permits the examination of the effect of these large values.
In Figure 2 the group means of all the canonical variables have been plotted as functions. From this plot it can be seen that the humans and the apes form fairly distinct groups. Among the apes, the chimpanzees stand out from the gorillas and orang-outangs. Certain values of t seem of particular interest: at both $t_2$ and $t_4$ the function values for humans have a fairly precise value which is much different from the corresponding values for any ape. The point $t_1$ has an analogous property for chimpanzees, gorillas, and orang-outangs. The point $t_3$ corresponds to the widest separation among the group means.
In Figure 3 the fossils have been added. Immediately Proconsul africanus stands out as quite different from any group. For some values of t it is very like a chimpanzee, gorilla, orang-outang, or man, and it has certain characteristics in common with all of these! The remaining fossils share with man the characteristic human quality exhibited by $t_2$ and $t_4$. However, for some other values of t they are quite different from men and are more like apes, particularly chimpanzees. This is not apparent from Figure 1. Fortunately it is possible to calibrate these observations with significance tests and confidence sets.
In sections 4 and 5, the variance of a function value, var[f(t)], is derived and is shown to be almost independent of t. The standard deviation $\sigma_f$ is shown on the graph, as is the width of a 90% confidence band for $f_x(t_0)$. Applying these at the values $t_2$ or $t_4$, it can be seen that
TABLE 1
[Canonical variates for the permanent first lower premolar, group means and fossils; table values not recoverable.]
FIGURE 1
PERMANENT FIRST LOWER PREMOLAR GROUP MEANS, 90% CONFIDENCE CONTOURS AND FOSSIL VALUES
[Plot not reproduced.]
A - WEST AFRICAN
B - BRITISH
C - AUSTRALIAN
D, E - GORILLA
F, G - ORANG-OUTANG
H, I - CHIMPANZEE
J, K - PITHECANTHROPUS PEKINENSIS
L - PARANTHROPUS ROBUSTUS
M - PARANTHROPUS CRASSIDENS
N - MEGANTHROPUS PALAEOJAVANICUS
O - PROCONSUL AFRICANUS

FIGURE 2
8-DIMENSIONAL DATA
PERMANENT FIRST LOWER PREMOLAR GROUP MEANS
A - WEST AFRICAN, B - BRITISH, C - AUSTRALIAN
D, E - GORILLA, F, G - ORANG-OUTANG, H, I - CHIMPANZEE
$f_x(t) = x_1/\sqrt{2} + x_2 \sin t + x_3 \cos t + \cdots$
[Plot not reproduced: f(t) against t over -3.14 < t < 3.14.]
(i) The means of all human 'races' lie well outside the 90% band for a particular test centered at Proconsul africanus, providing evidence against the hypothesis that this fossil is human.
(ii) The remaining fossils do not provide such evidence. Indeed, at these values, $t_2$ and $t_4$, the fossils lie within one standard deviation of each human.
In section 5 an overall test is constructed, and the corresponding 90% confidence band is also shown on Figure 3. If a band of this width is centered at Proconsul africanus and all values of t are examined, then it can be seen that the functions corresponding to the means of all the known species studied fall outside this band for some value of t, thus providing evidence that this fossil does not belong to any known species. The same can be said of Paranthropus robustus. The remaining fossils may reasonably be associated with West African or Australian humans, but less reasonably with the British. The points $t_2$ and $t_4$ yield interesting clusters of some fossils and humans. This aspect of the data certainly bears further study.

FIGURE 3
8-DIMENSIONAL DATA
AS IN FIGURE 2 BUT WITH FOSSILS ADDED
J, K - Pithecanthropus pekinensis; L - Paranthropus robustus; M - Paranthropus crassidens; N - Meganthropus palaeojavanicus; O - Proconsul africanus
[Plot not reproduced: f(t) against t over -3.14 < t < 3.14, marking the 90% confidence band for a particular test and the 90% confidence band for the overall test.]
Ashton et al. investigated several teeth. In Table 2 the largest canonical variates for 6 different teeth are recorded, as given in their Tables 2, 3, 4, 5, 6 and 8, for men, apes, and two fossils. The group means for humans and apes are plotted in Figure 4. Again it is possible to identify a value $t_1$ at which the functions for humans form a tight cluster, distinct from the function values for apes. In Figure 5 the fossils have been added. At the value of particular interest, $t_1$, the fossils have this human characteristic. Curiously, Pithecanthropus pekinensis seems to lie between humans and apes: in general this fossil is closer to man than Paranthropus crassidens. Both fossils differ considerably from the apes, and this departure is in the direction of man.
The broad inferences drawn from these plots are similar to those of Ashton et al. However, the peculiarities of Proconsul africanus are more forcefully presented here. In addition, the plots have revealed the precise 'human' value of certain linear combinations ($t_i$). These aspects of the data warrant further study.

4. PROPERTIES OF THE PLOTS


The particular function proposed in section 2 has many useful properties
relevant for the study of data. Some of these are outlined below.

TABLE 2
LARGEST CANONICAL VARIABLE FOR 6 TEETH

                                          Tooth Type
Source                              1      2      3      4      5      6
A. British                      -5.35  -7.07  -9.37  -4.28  -2.15  -2.93
B. Australian aboriginal        -3.93  -6.04  -8.87  -2.16  -0.50  -1.09
C. gorilla male                  3.12   6.66   6.28   4.96   4.13   4.60
D. gorilla female                2.45   1.73   4.82   3.96   3.35   3.63
E. orang-outang male             2.83   5.10   5.11   2.70   1.21   1.49
F. orang-outang female           1.49   1.63   3.60   1.29  -0.17   0.05
G. chimpanzee male               0.38   3.82   3.46  -1.65  -2.32  -1.92
H. chimpanzee female             0.01   0.23   3.05  -2.25  -2.65  -2.15
I. Paranthropus crassidens      -4.52  -6.49  -7.79   3.45   4.91   3.72
J. Pithecanthropus pekinensis   -1.81  -2.94  -6.73  -0.36  -1.22   1.09

Tooth Types
1 - Permanent lower second incisor
2 - Permanent lower canine
3 - Permanent lower first molar
4 - Permanent upper second premolar
5 - Permanent lower first molar
6 - Permanent lower third molar
FIGURE 4
6-DIMENSIONAL DATA
THE FIRST CANONICAL VARIATES FOR 6 TEETH
GROUP MEANS
A - BRITISH, B - AUSTRALIAN, C, D - GORILLA
E, F - ORANG-OUTANG, G, H - CHIMPANZEE
$f_x(t) = x_1/\sqrt{2} + x_2 \sin t + x_3 \cos t + x_4 \sin 2t + \cdots$
[Plot not reproduced: f(t) against t over -3.14 < t < 3.14.]

(i) The function representation preserves means. If $\bar{x}$ is the mean of a set of n multivariate observations $x_i$, then the function corresponding to $\bar{x}$ is the pointwise mean of the functions corresponding to the n observations:

$$f_{\bar{x}}(t) = \frac{1}{n} \sum_{i=1}^{n} f_{x_i}(t).$$

As a result the average will appear like an average in this plot.
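Mean preservation follows from the linearity of $f_x(t)$ in $x$. The sketch below checks it numerically. The observations are hypothetical, and `mean_vector` and `max_mean_discrepancy` are illustrative names, not from the paper. The reported discrepancy should be zero up to floating-point rounding.

```python
import math


def andrews_curve(x, t):
    """f_x(t) = x_1/sqrt(2) + x_2 sin t + x_3 cos t + x_4 sin 2t + x_5 cos 2t + ..."""
    total = x[0] / math.sqrt(2.0) if x else 0.0
    for i in range(1, len(x)):
        freq = (i + 1) // 2                      # frequencies 1, 1, 2, 2, 3, 3, ...
        trig = math.sin if i % 2 == 1 else math.cos
        total += x[i] * trig(freq * t)
    return total


def mean_vector(points):
    """Componentwise mean x-bar of equal-length vectors."""
    n = len(points)
    return [sum(column) / n for column in zip(*points)]


def max_mean_discrepancy(points, ts):
    """Largest |f_xbar(t) - (1/n) sum_i f_{x_i}(t)| over the supplied values of t."""
    xbar = mean_vector(points)
    n = len(points)
    return max(
        abs(andrews_curve(xbar, t) - sum(andrews_curve(p, t) for p in points) / n)
        for t in ts
    )


# Hypothetical 4-dimensional observations.
points = [[1.0, 2.0, 0.5, -1.0], [3.0, -1.0, 2.0, 0.0], [-2.0, 0.0, 1.5, 4.0]]
print(max_mean_discrepancy(points, [-3.0, -1.0, 0.0, 0.7, 2.5]))
```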
(ii) The function representation preserves distances. One measure of the distance between two functions that seems in accord with distance as judged by the human eye is

$$\|f_x(t) - f_y(t)\|_{L_2}^2 = \int_{-\pi}^{\pi} [f_x(t) - f_y(t)]^2 \, dt.$$

Moreover, this squared distance is proportional to the familiar squared Euclidean distance between the corresponding points, since

$$\|f_x(t) - f_y(t)\|_{L_2}^2 = \pi \|x - y\|^2 = \pi \sum_{i=1}^{k} (x_i - y_i)^2.$$
Because of this relation, close points will appear as close functions and distant points as distant functions. Thus multivariate clusters and outliers may be identified visually from the plot of the functions. It is this distance-preserving property that determines the coefficient $1/\sqrt{2}$ and restricts the coefficients of t to integers.
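The identity $\|f_x - f_y\|_{L_2}^2 = \pi \|x - y\|^2$ can be checked with simple quadrature. In this sketch, `l2_distance_sq` and `euclidean_sq` are hypothetical helpers, and the two points are arbitrary examples. For an integral over a full period of a trigonometric polynomial, the equally spaced rectangle rule is exact up to rounding once the grid size exceeds twice the highest frequency. The $1/\sqrt{2}$ coefficient is what makes the constant term contribute exactly $\pi (x_1 - y_1)^2$.

```python
import math


def andrews_curve(x, t):
    """f_x(t) = x_1/sqrt(2) + x_2 sin t + x_3 cos t + x_4 sin 2t + ..."""
    total = x[0] / math.sqrt(2.0) if x else 0.0
    for i in range(1, len(x)):
        freq = (i + 1) // 2
        total += x[i] * (math.sin(freq * t) if i % 2 == 1 else math.cos(freq * t))
    return total


def l2_distance_sq(x, y, n_grid=1024):
    """Integral over [-pi, pi] of [f_x(t) - f_y(t)]^2 dt via the equally spaced rectangle rule.

    Exact up to rounding for trigonometric polynomials once n_grid exceeds
    twice the highest frequency present in f_x - f_y.
    """
    h = 2.0 * math.pi / n_grid
    total = 0.0
    for j in range(n_grid):
        t = -math.pi + j * h
        d = andrews_curve(x, t) - andrews_curve(y, t)
        total += d * d
    return total * h


def euclidean_sq(x, y):
    return sum((a - b) ** 2 for a, b in zip(x, y))


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [0.5, -1.0, 2.0, 0.0, 1.0]
print(l2_distance_sq(x, y), math.pi * euclidean_sq(x, y))
```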
(iii) The representation yields one-dimensional projections. For a particular value $t = t_0$, the function value $f_x(t_0)$ is proportional to the length of the projection of the vector $(x_1, \ldots, x_k)$ on the vector

$$f_1(t_0) = (1/\sqrt{2}, \sin t_0, \cos t_0, \sin 2t_0, \cos 2t_0, \ldots)$$

since

$$f_x(t_0) = \{x' f_1(t_0) / [f_1'(t_0) f_1(t_0)]^{1/2}\} \, [f_1'(t_0) f_1(t_0)]^{1/2}.$$

The projection on this one-dimensional space may reveal clusterings, outlier patterns, or other peculiarities that occur in this subspace and which may be otherwise obscured by other dimensions. The advantage of this plot is that a continuum of such one-dimensional projections is plotted on one graph.
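The factorization above says that $f_x(t_0)$ is the signed projection length of $x$ on the unit vector $f_1(t_0)/|f_1(t_0)|$, multiplied by $|f_1(t_0)|$. The sketch below makes this concrete under stated assumptions: `projection_length` is a hypothetical name, and the point and $t_0$ are arbitrary. For $k = 3$, $|f_1(t_0)|^2 = 1/2 + 1 = 3/2$ for every $t_0$.

```python
import math


def andrews_basis(t, k):
    """f_1(t) = (1/sqrt(2), sin t, cos t, sin 2t, cos 2t, ...) truncated to k terms."""
    basis = [1.0 / math.sqrt(2.0)]
    freq = 1
    while len(basis) < k:
        basis.extend((math.sin(freq * t), math.cos(freq * t)))
        freq += 1
    return basis[:k]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def andrews_curve(x, t):
    """f_x(t) written as the inner product x' f_1(t)."""
    return dot(x, andrews_basis(t, len(x)))


def projection_length(x, t0):
    """Signed length of the projection of x on the unit vector f_1(t0)/|f_1(t0)|."""
    b = andrews_basis(t0, len(x))
    return dot(x, b) / math.sqrt(dot(b, b))


# f_x(t0) = projection length * |f_1(t0)|, and |f_1(t0)| = sqrt(1.5) when k = 3.
x = [2.0, -1.0, 0.5]
t0 = 0.8
print(andrews_curve(x, t0), projection_length(x, t0) * math.sqrt(1.5))
```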

FIGURE 5
6-DIMENSIONAL DATA
AS IN FIGURE 4 WITH FOSSILS ADDED
I - Paranthropus crassidens; J - Pithecanthropus pekinensis
[Plot not reproduced: f(t) against t over -3.14 < t < 3.14.]
(iv) The representation preserves variances. If the components of the data are uncorrelated with common variance $\sigma^2$, then the function value at t, f(t), has variance

$$\operatorname{var}[f(t)] = \sigma^2 \left( \tfrac{1}{2} + \sin^2 t + \cos^2 t + \sin^2 2t + \cos^2 2t + \cdots \right).$$

If k is odd this reduces to a constant, $\sigma^2 k/2$. If k is even, the variance lies between $\sigma^2 (k-1)/2$ and $\sigma^2 (k+1)/2$. In the first case the variance does not depend on t. In the second case the relative dependence on t is slight and decreases as k increases. Thus the variability of the plotted function is almost constant across the graph. This standard deviation of f is denoted by $\sigma_f$ where it appears on the plots. This facilitates the interpretation of the plot, as outlined in the next section.
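With uncorrelated components of common variance $\sigma^2$, $\operatorname{var}[f_x(t)] = \sigma^2 f_1'(t) f_1(t)$. The sketch below evaluates this on a grid to show that the variance is constant for odd $k$ and confined to $[(k-1)/2, (k+1)/2]\,\sigma^2$ for even $k$. The names `curve_variance` and `sigma_f` are hypothetical, and the choices $k = 5$ and $k = 8$ are arbitrary examples.

```python
import math


def andrews_basis(t, k):
    """f_1(t) = (1/sqrt(2), sin t, cos t, sin 2t, cos 2t, ...) truncated to k terms."""
    basis = [1.0 / math.sqrt(2.0)]
    freq = 1
    while len(basis) < k:
        basis.extend((math.sin(freq * t), math.cos(freq * t)))
        freq += 1
    return basis[:k]


def curve_variance(t, k, sigma=1.0):
    """var[f_x(t)] = sigma^2 f_1(t)'f_1(t) for uncorrelated components with variance sigma^2."""
    return sigma ** 2 * sum(b * b for b in andrews_basis(t, k))


def sigma_f(t, k, sigma=1.0):
    """Standard deviation of the plotted function value at t."""
    return math.sqrt(curve_variance(t, k, sigma))


grid = [-math.pi + 2.0 * math.pi * j / 360 for j in range(361)]
odd = [curve_variance(t, 5) for t in grid]    # k odd: constant k/2 = 2.5
even = [curve_variance(t, 8) for t in grid]   # k even: between 3.5 and 4.5
print(min(odd), max(odd), min(even), max(even))
```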

5. INTERPRETATION OF THE PLOTS


(i) Clustering. If some plotted functions form a band by remaining close together for all values of t, then the corresponding points are close together in the Euclidean metric (see 4(ii)). Such a band represents a cluster of data points. If a group of functions come close together for only one or a few values of t, then the corresponding points are close in the directions defined by the corresponding vectors $f_1(t)$. Thus it is possible to identify clusters of points even when some additional variables are present.
(ii) Tests of significance at particular values of t. In the example, it is possible to identify a priori from Figure 2 certain values of t for which it is of interest to test the hypothesis that the expectation of $f_x(t)$ equals $f_{\mu_0}(t)$ for some hypothesized $\mu_0$. Since the variance of $f_x(t)$ is known (4(iv)), a test of this hypothesis may be constructed by evaluating the significance level of

$$z = [f_x(t) - f_{\mu_0}(t)] / \{\operatorname{var}[f_x(t)]\}^{1/2}.$$

If the components $x_i$ are assumed to be independent normal variables, then z has a standard normal distribution under the hypothesis $\mu = \mu_0$. This distribution may be used to assess the hypothesis $\mu = \mu_0$, or to construct a confidence interval for $f_\mu(t)$ as the set of values not rejected by this test at level $\alpha$. The width of this confidence set is shown on Figure 3. Note, however, that this test is exact only if the value of t is chosen a priori.
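A minimal sketch of the fixed-$t$ test follows. It assumes independent normal components with a known common $\sigma$ and a value of $t$ fixed in advance. It uses `statistics.NormalDist` from the Python standard library for the normal distribution. The names `z_statistic`, `two_sided_p` and `interval_half_width` are hypothetical, and the 90% coverage default mirrors the band shown on Figure 3.

```python
import math
from statistics import NormalDist


def andrews_basis(t, k):
    """f_1(t) = (1/sqrt(2), sin t, cos t, sin 2t, cos 2t, ...) truncated to k terms."""
    basis = [1.0 / math.sqrt(2.0)]
    freq = 1
    while len(basis) < k:
        basis.extend((math.sin(freq * t), math.cos(freq * t)))
        freq += 1
    return basis[:k]


def andrews_curve(x, t):
    return sum(a * b for a, b in zip(x, andrews_basis(t, len(x))))


def z_statistic(x, mu0, t, sigma):
    """z = [f_x(t) - f_mu0(t)] / {var[f_x(t)]}^(1/2), assuming independent
    components with known common variance sigma^2."""
    sd = sigma * math.sqrt(sum(b * b for b in andrews_basis(t, len(x))))
    return (andrews_curve(x, t) - andrews_curve(mu0, t)) / sd


def two_sided_p(z):
    """Two-sided significance level under the standard normal null distribution."""
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def interval_half_width(t, k, sigma, coverage=0.90):
    """Half-width of the confidence interval for f_mu(t) at one a priori chosen t."""
    z_crit = NormalDist().inv_cdf(0.5 + coverage / 2.0)
    return z_crit * sigma * math.sqrt(sum(b * b for b in andrews_basis(t, k)))


# Hypothetical values: observed point, hypothesized mean, and a pre-chosen t.
z = z_statistic([1.0, 0.0, 0.0], [0.0, 0.0, 0.0], 0.4, 1.0)
print(z, two_sided_p(z))
```

The interval half-width is the same whichever point the interval is centred at, which is why a single bar on the plot can serve every curve.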
(iii) Overall tests. It is also possible to construct tests suggested by the data. If the $x_i$ are independent normal variates with variance $\sigma^2$ and expectation $\mu_i$, then $\|x - \mu\|^2/\sigma^2$ has a $\chi^2$ distribution with k d.f. The squared length of the projection of $x - \mu$ on any one-dimensional space is not larger than $\|x - \mu\|^2$. Furthermore, the squared length of the projection of $x - \mu$ on the vector

$$v_t = f_1(t) / [f_1'(t) f_1(t)]^{1/2}$$

is just

$$|(x - \mu)' v_t|^2 = |f_x(t) - f_\mu(t)|^2 / [f_1'(t) f_1(t)].$$


Hence with probability $1 - \alpha$, for all values of t,

$$|f_x(t) - f_\mu(t)|^2 \le \sigma^2 f_1'(t) f_1(t) \, \chi_k^2(\alpha) \le \tfrac{1}{2}(k+1) \, \sigma^2 \chi_k^2(\alpha),$$

where $\chi_k^2(\alpha)$ denotes the upper $\alpha$ point of the $\chi^2$ distribution with k d.f.

Thus, with probability $1 - \alpha$, the function $f_x(t)$ lies in a band of fixed width about $f_\mu(t)$. If $\mu$ is known, outliers will fall outside this band and may be easily identified visually. Note that if n points are examined, the overall level of the test is roughly n times the rejection probability for a single point. A band of the same width centered at $f_x(t)$ is a $1 - \alpha$ confidence region for $f_\mu(t)$. If $f_{\mu_0}(t)$ falls outside this band, there is evidence against the hypothesis $\mu = \mu_0$.
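The fixed-width band can be sketched as follows. One assumption is labelled here: the Python standard library has no chi-square quantile function, so the caller supplies $\chi_k^2(\alpha)$ from tables. The example uses about 13.36, the upper 10% point for 8 d.f. The helper names `overall_half_width` and `leaves_band` are hypothetical. The check runs over a finite grid of $t$, so it only approximates "for some value of $t$".

```python
import math


def andrews_basis(t, k):
    """f_1(t) = (1/sqrt(2), sin t, cos t, sin 2t, cos 2t, ...) truncated to k terms."""
    basis = [1.0 / math.sqrt(2.0)]
    freq = 1
    while len(basis) < k:
        basis.extend((math.sin(freq * t), math.cos(freq * t)))
        freq += 1
    return basis[:k]


def andrews_curve(x, t):
    return sum(a * b for a, b in zip(x, andrews_basis(t, len(x))))


def overall_half_width(k, sigma, chi2_upper):
    """Fixed half-width sqrt((k + 1)/2 * sigma^2 * chi2_upper) of the simultaneous band.

    chi2_upper is the upper-alpha point of chi-square with k d.f., taken from tables.
    """
    return math.sqrt(0.5 * (k + 1) * sigma ** 2 * chi2_upper)


def leaves_band(x, mu, sigma, chi2_upper, n_grid=721):
    """True if |f_x(t) - f_mu(t)| exceeds the fixed half-width at some grid value of t."""
    half = overall_half_width(len(x), sigma, chi2_upper)
    for j in range(n_grid):
        t = -math.pi + 2.0 * math.pi * j / (n_grid - 1)
        if abs(andrews_curve(x, t) - andrews_curve(mu, t)) > half:
            return True
    return False


# Assumed setting: k = 8, sigma = 1, alpha = 0.10, chi-square(8) upper 10% point ~ 13.36.
mu = [0.0] * 8
print(overall_half_width(8, 1.0, 13.36))
print(leaves_band([20.0] + [0.0] * 7, mu, 1.0, 13.36))   # a gross outlier leaves the band
```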
(iv) Linear relationships. If a point y lies on the line segment joining x and z, then for all values of t, $f_y(t)$ lies between $f_x(t)$ and $f_z(t)$. This fact may be readily observed in Figure 5, where the fossil Paranthropus crassidens lies for the most part between the apes and man.
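Because $f_x(t)$ is linear in $x$, a point $y = (1-\lambda)x + \lambda z$ with $0 \le \lambda \le 1$ gives $f_y(t) = (1-\lambda) f_x(t) + \lambda f_z(t)$ at every $t$. The sketch below checks the "between" property on a grid. The names `between_everywhere` and `on_segment` and the sample points are hypothetical. With $\lambda = 1.5$, the point lies off the segment and the property fails.

```python
import math


def andrews_curve(x, t):
    """f_x(t) = x_1/sqrt(2) + x_2 sin t + x_3 cos t + x_4 sin 2t + ..."""
    total = x[0] / math.sqrt(2.0) if x else 0.0
    for i in range(1, len(x)):
        freq = (i + 1) // 2
        total += x[i] * (math.sin(freq * t) if i % 2 == 1 else math.cos(freq * t))
    return total


def on_segment(x, z, lam):
    """The point (1 - lam) x + lam z; it lies on the segment from x to z when 0 <= lam <= 1."""
    return [(1.0 - lam) * a + lam * b for a, b in zip(x, z)]


def between_everywhere(x, y, z, n_grid=361, tol=1e-12):
    """True if min(f_x, f_z) <= f_y <= max(f_x, f_z) at every grid value of t."""
    for j in range(n_grid):
        t = -math.pi + 2.0 * math.pi * j / (n_grid - 1)
        fx, fy, fz = (andrews_curve(v, t) for v in (x, y, z))
        if not (min(fx, fz) - tol <= fy <= max(fx, fz) + tol):
            return False
    return True


x, z = [0.0, 0.0, 0.0], [1.0, 2.0, 3.0]
print(between_everywhere(x, on_segment(x, z, 0.3), z))   # inside the segment
print(between_everywhere(x, on_segment(x, z, 1.5), z))   # beyond z
```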

6. FURTHER REMARKS
In this section a number of observations based on experience with this type of plot are summarized.
(i) Plotting many points. Only a limited amount of information may be absorbed from one plot. If each point is to be examined in detail, only about 10 points may be plotted in this way on the same graph. A procedure that proved useful in cases involving more points was first to plot all the points on one graph, from which general characteristics of the sample could be noted. Separate plots of 10 points were then used to assess individual points in relation to the whole.
(ii) Other functions. The properties of sections 2 and 3 will also hold for functions of the form

$$f_x(t) = x_1 \sin n_1 t + x_2 \cos n_1 t + x_3 \sin n_2 t + x_4 \cos n_2 t + \cdots$$

where the $n_i$ are different integers. Where more than one plot is desired, function plots based on such a series will provide more information about the data. It was found that plots with small $n_i$ were easier to look at than those with large $n_i$. The statistical properties of the function plots depend heavily on the orthogonality of the trigonometric functions and on the approximate constancy of the length of the vector $f_1(t)$. Other families of orthogonal functions, such as orthogonal polynomials, may be tried. Typically these do not have all the properties of the proposed function. Trigonometric functions involving a pure sine or cosine series will also lack many of the properties of the mixed series proposed.
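The generalized series can be sketched with a caller-chosen list of distinct positive integer frequencies. The names `general_basis`, `general_curve` and `l2_distance_sq` are hypothetical. The frequencies (2, 4, 8) match the first terms printed for the "complicated" curve of Table 3. The check below confirms two facts: with distinct frequencies the squared $L_2$ distance is again $\pi \|x - y\|^2$, because this series has no constant term; and for even $k$, $|f_1(t)|^2 = k/2$ exactly.

```python
import math


def general_basis(t, freqs, k):
    """(sin n_1 t, cos n_1 t, sin n_2 t, cos n_2 t, ...) truncated to k terms."""
    if len(set(freqs)) != len(freqs) or any(n < 1 for n in freqs):
        raise ValueError("frequencies must be distinct positive integers")
    if 2 * len(freqs) < k:
        raise ValueError("need at least ceil(k/2) frequencies")
    out = []
    for n in freqs:
        out.extend((math.sin(n * t), math.cos(n * t)))
    return out[:k]


def general_curve(x, t, freqs):
    return sum(a * b for a, b in zip(x, general_basis(t, freqs, len(x))))


def l2_distance_sq(x, y, freqs, n_grid=1024):
    """Rectangle-rule value of the integral over [-pi, pi] of [f_x(t) - f_y(t)]^2 dt."""
    h = 2.0 * math.pi / n_grid
    total = 0.0
    for j in range(n_grid):
        t = -math.pi + j * h
        d = general_curve(x, t, freqs) - general_curve(y, t, freqs)
        total += d * d
    return total * h


x = [1.0, 2.0, 3.0, 4.0, 5.0]
print(l2_distance_sq(x, [0.0] * 5, (2, 4, 8)), 55.0 * math.pi)
```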
(iii) The one-dimensional projections. As noted previously, the function values $f_x(t)$ are the projections of the vector x on the vector

$$f_1(t) = (1/\sqrt{2}, \sin t, \cos t, \sin 2t, \ldots).$$

When normalized, this vector defines a point on the k-sphere. As t varies, this point describes a periodic curve on the k-sphere. Some idea of how close this curve comes to points on the sphere may be obtained by calculating the distance of points to the closest point on the curve or on its reflection through the origin. This distance may be measured in terms of the Euclidean metric or, alternatively, in terms of the size of the angle, measured at the origin, between points on the sphere. Table 3 gives average values of these two measures for two curves: the curve defined by the function of section 2, and that defined by the more complicated function

$$f_1(t) = (\sin 2t, \cos 2t, \sin 4t, \cos 4t, \sin 8t, \ldots).$$

The averages were calculated by Monte Carlo, assuming a uniform distribution on the sphere.
From Table 3 we note that (a) as the dimension increases, points tend to be further from the curve on the average; and (b) on the average, points lie closer to the more complicated curve.
The functions proposed in (ii) above describe complex curves passing closer to more points on the sphere. The simplicity of the curve (and hence the smoothness of the plot) and the proximity of its path to every point on the sphere define two mutually conflicting goals. If the curve is too complicated, the plot contains too much detail to be readily absorbed.
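The Monte Carlo calculation behind Table 3 can be sketched under stated assumptions. The sketch draws points uniformly on the unit sphere by normalizing Gaussian vectors. It then finds the nearest point of the normalized curve, or of its reflection, by a grid search over $t$. Taking the largest $|\cos|$ handles the reflection. The function name `average_distance_to_curve` and the sample and grid sizes are hypothetical choices, not the author's. Only the simple curve is built in, but another basis function can be passed as `basis`. For $k = 3$ a run should land in the neighbourhood of the first row of Table 3, though no exact output is claimed.

```python
import math
import random


def andrews_basis(t, k):
    """f_1(t) = (1/sqrt(2), sin t, cos t, sin 2t, cos 2t, ...) truncated to k terms."""
    basis = [1.0 / math.sqrt(2.0)]
    freq = 1
    while len(basis) < k:
        basis.extend((math.sin(freq * t), math.cos(freq * t)))
        freq += 1
    return basis[:k]


def unit(v):
    norm = math.sqrt(sum(a * a for a in v))
    return [a / norm for a in v]


def average_distance_to_curve(k, basis=andrews_basis, n_samples=400, n_grid=360, seed=1):
    """Monte Carlo averages (Euclidean chord, angle in degrees) from uniform random points
    on the unit sphere to the nearest point of the normalized curve or its reflection."""
    curve = [unit(basis(-math.pi + 2.0 * math.pi * j / n_grid, k)) for j in range(n_grid)]
    rng = random.Random(seed)
    total_chord = total_angle = 0.0
    for _ in range(n_samples):
        u = unit([rng.gauss(0.0, 1.0) for _ in range(k)])   # uniform on the sphere
        # Largest |cos| over the grid: the nearer of the curve and its reflection.
        c = min(1.0, max(abs(sum(a * b for a, b in zip(u, v))) for v in curve))
        total_chord += math.sqrt(2.0 - 2.0 * c)               # chord between unit vectors
        total_angle += math.degrees(math.acos(c))
    return total_chord / n_samples, total_angle / n_samples


chord, angle = average_distance_to_curve(3)
print(chord, angle)
```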
(iv) Using principal components. The tests and confidence sets in section 5 were developed for data with independent normal components with the same variances. For these tests to be applicable, multivariate data must first be transformed to approximate those conditions. A principal component analysis was used by Ashton et al. to yield such variables, and in general this procedure is recommended. In the plots, low frequencies are more readily seen than high frequencies. For this reason it is useful to associate the most important variables with low frequencies. This may be done by associating $x_1$ with the first principal component, $x_2$ with the second, and so on.
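The paper does not say how the components were computed. As a stand-in, the sketch below uses plain power iteration with deflation, which is adequate for small covariance matrices. It orders the components so that column 0 (to be used as $x_1$) has the largest variance. Each score is scaled to unit sample variance, to approximate the "same variances" condition of section 5. All names and the sample data are hypothetical. A numerical library's symmetric eigensolver would be the usual choice in practice.

```python
import math
import random


def centred_covariance(data):
    """Subtract column means; return (centred rows, sample covariance matrix)."""
    n, k = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(k)]
    centred = [[row[j] - means[j] for j in range(k)] for row in data]
    cov = [[sum(r[a] * r[b] for r in centred) / (n - 1) for b in range(k)]
           for a in range(k)]
    return centred, cov


def leading_eigenpairs(mat, n_pairs, n_iter=1000, seed=0):
    """(eigenvalue, unit eigenvector) pairs of a symmetric positive semi-definite
    matrix, largest first, via power iteration with deflation."""
    k = len(mat)
    work = [row[:] for row in mat]
    rng = random.Random(seed)   # a random start is almost surely not orthogonal to the target
    pairs = []
    for _ in range(n_pairs):
        v = [rng.uniform(-1.0, 1.0) for _ in range(k)]
        lam = 0.0
        for _ in range(n_iter):
            w = [sum(work[i][j] * v[j] for j in range(k)) for i in range(k)]
            norm = math.sqrt(sum(a * a for a in w))
            if norm == 0.0:
                break
            v, lam = [a / norm for a in w], norm
        pairs.append((lam, v))
        # Deflation: subtract lam * v v' so the next component dominates.
        work = [[work[i][j] - lam * v[i] * v[j] for j in range(k)] for i in range(k)]
    return pairs


def standardized_pc_scores(data, n_components):
    """Principal-component scores scaled to unit sample variance; column 0 is x_1."""
    centred, cov = centred_covariance(data)
    pairs = leading_eigenpairs(cov, n_components)
    if any(lam <= 0.0 for lam, _ in pairs):
        raise ValueError("requested more components than the data support")
    return [[sum(a * b for a, b in zip(r, v)) / math.sqrt(lam) for lam, v in pairs]
            for r in centred]


# Hypothetical 2-variable data whose variance lies mostly along the first variable.
data = [[-3.0, 0.1], [-1.0, -0.2], [1.0, 0.2], [3.0, -0.1]]
scores = standardized_pc_scores(data, 2)   # each row (x_1, x_2) is ready for f_x(t)
```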
(v) Other applications. This plotting procedure was used to solve a somewhat artificial problem in pattern recognition posed by Mrs. J. J. Chang. An unknown 2-dimensional pattern of 25 points was prepared. Gaussian noise with comparable second moments was added in another 3 dimensions, and the resulting 5-dimensional structure was given an arbitrary orientation.

TABLE 3
AVERAGE DISTANCES TO CLOSEST POINT ON CURVE OR ON ITS REFLECTION THROUGH THE ORIGIN
Simple curve:      $f_1(t) = (1/\sqrt{2}, \sin t, \cos t, \sin 2t, \cos 2t, \ldots)$
Complicated curve: $f_1(t) = (\sin 2t, \cos 2t, \sin 4t, \cos 4t, \sin 8t, \ldots)$

                   Simple                 Complicated
             Euclidean   Angle in    Euclidean   Angle in
Dimension    distance    degrees     distance    degrees
    3           .33         19          .25         14
    5           .57         33          .49         28
    7           .68         40          .61         35
    9           .76         45          .66         39
The problem was to identify the pattern. These data were plotted using the function proposed in section 2 and other functions as in (ii) above.
It was assumed that patterns could be distinguished from noise by the presence of strong clustering or regularity, and the absence of outliers, in projections on 1-dimensional spaces that lie in the 2-dimensional space of the pattern. Such regularity was found for 2 values of t. One vector $f_1(t)$ from each of 2 different plots was used to define a plane, and the projection of the data on this plane revealed the pattern.
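The last step, projecting the data on the plane defined by two vectors $f_1(t)$ taken from two different plots, can be sketched with Gram-Schmidt orthonormalization. The name `plane_coordinates` is hypothetical. In the paper's setting, `u` and `v` would be the two $f_1(t)$ vectors at the values of $t$ where regularity was seen; those values are not given, so the usage below uses plain example vectors.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def plane_coordinates(points, u, v):
    """Project each point onto the plane spanned by u and v.

    u and v are orthonormalized by Gram-Schmidt; the returned pairs are the
    coordinates of each projection in that orthonormal basis.
    """
    nu = math.sqrt(dot(u, u))
    e1 = [a / nu for a in u]
    c = dot(v, e1)
    w = [b - c * a for a, b in zip(e1, v)]        # remove the e1 component of v
    nw = math.sqrt(dot(w, w))
    if nw < 1e-12:
        raise ValueError("u and v must not be parallel")
    e2 = [a / nw for a in w]
    return [(dot(p, e1), dot(p, e2)) for p in points]


# Example vectors spanning the first two coordinate axes.
print(plane_coordinates([[4.0, 5.0, 6.0]], [2.0, 0.0, 0.0], [1.0, 1.0, 0.0]))
```

A scatter plot of the returned pairs is the 2-dimensional view in which the pattern appeared.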
Other examples involved many measurements on several subjects. The measurements were transformed to principal components and then plotted as in section 2. This procedure gave some idea of the relations and similarities among the measurements. All of these observations based on the plots of section 3 may be determined directly by examining the data. The advantage of plotting is that these observations come more easily and more quickly.
The plotting procedure is not a substitute for data manipulation. The choice of which numbers to examine, here canonical variables, is based on the nature of the data and the objectives of the analysis. The plotting procedure is only a method of displaying whatever numbers are deemed relevant.
The examination of such plots permits the detection of many abnormalities in the data. Some of these will be spurious; others may represent real effects. The existence of the overall tests permits the identification of the very significant effects. Typically this cannot be done with other plotting methods.
The advantage of the plot is that peculiarities in the data are brought to the attention of the investigator.
ACKNOWLEDGMENTS
The author gratefully acknowledges many helpful suggestions by the referees and conversations with Colin Mallows. These aided in the revision of the paper and in the understanding of the nature of the geometric problems involved.
PLOTS OF HIGH-DIMENSIONAL DATA
RÉSUMÉ
A method is proposed for plotting data of more than two dimensions. Each data point, $x = (x_1, \ldots, x_k)$, is transformed into a function of the form

$$f_x(t) = x_1/\sqrt{2} + x_2 \sin t + x_3 \cos t + x_4 \sin 2t + x_5 \cos 2t + \cdots$$

and the graph of the function is plotted on the interval $-\pi < t < \pi$.
Some statistical properties of the method are explored; its application is illustrated by an example from anthropology.

REFERENCE
Ashton, E. H., Healy, M. J. R., and Lipton, S. [1957]. The descriptive use of discriminant functions in physical anthropology. Proc. Roy. Soc. B 146, 552-72.

Received October 1970, Revised July 1971
Key Words: Plotting multivariate data; Cluster analysis; Pattern recognition; Discrimination; Classification.
