Author(s): D. F. Andrews
Source: Biometrics, Vol. 28, No. 1, Special Multivariate Issue (Mar., 1972), pp. 125-136
Published by: International Biometric Society
Stable URL: http://www.jstor.org/stable/2528964
Accessed: 04/09/2010 00:05
BIOMETRICS 28, 125-36
March 1972
D. F. ANDREWS
Bell Telephone Laboratories, Murray Hill, New Jersey,
and Princeton University
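SUMMARY
A method of plotting data of more than two dimensions is proposed. Each data point,
$x = (x_1, \ldots, x_k)$, is mapped into a function of the form
\[ f_x(t) = x_1/\sqrt{2} + x_2 \sin t + x_3 \cos t + x_4 \sin 2t + x_5 \cos 2t + \cdots \]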
1. INTRODUCTION
Plotting has long been one of the most useful tools in the analysis of
data. Much of model building is aided by the plotting of residuals.
Distributional assumptions are frequently based on probability plots.
Unfortunately, most graphical methods are restricted to displaying
univariate or bivariate data. Some progress has been made in the use of
different symbols on a two-dimensional plot to give some idea of a third or
fourth dimension, but these methods lack precision and seem limited to a
small number of dimensions.
However, one is accustomed to examining plots of functions, and
they may be infinite-dimensional. This suggests imbedding high dimensional
data in a higher dimensional but easily visualized space of functions, and
then plotting the functions. One way of doing this is described here, and
some examples are given.
required for this plot is small, but an output device with relatively
high precision such as a CALCOMP or microfilm plotter is required.
3. A BIOLOGICAL EXAMPLE
Ashton et al. [1957] used graphical techniques when comparing measurements
on the teeth of fossils with those of different 'races' of men and apes.
From the data, group means were calculated for men and apes and canonical
variables determined to maximize the between sum of squares relative to
the within sum of squares. The first two such variates had the largest
eigenvalues and these variables were used in the plotting procedure. The group
means were plotted and 90% confidence contours were drawn about them.
The fossil values were then plotted and assessed relative to the different
groups.
Ashton et al. analyzed several teeth. Here we present the data for one
tooth, the permanent first lower premolar (Table 1). In Figure 1 the group
means are plotted together with approximate 90% confidence contours.
The fossil measurements are also plotted. From this plot those authors
concluded that Proconsul africanus is very like a chimpanzee while the other
fossils are more like humans.
Some group means have some moderately large values of the third and
fourth canonical variates, while large values occur for all 8 variates for the
fossils. Plotting all the variates permits the examination of the effect of these
large values.
In Figure 2 the group means of all the canonical variables have been
plotted as functions. From this plot it can be seen that the humans and the
apes form fairly distinct groups. Among the apes, the chimpanzees stand
out from the gorillas and orang-outangs. Certain values of $t$ seem of particular
interest: at both $t_2$ and $t_4$ the function values for humans have a fairly precise
value which is much different from the corresponding values for any ape.
The point $t_1$ has an analogous property for chimpanzees, gorillas, and
orang-outangs. The point $t_3$ corresponds to the widest separation among the group
means.
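The function plot described above can be sketched in a few lines. This is a minimal illustration, assuming the form given in the caption of Figure 2, $f(t) = x_1/\sqrt{2} + x_2 \sin t + x_3 \cos t + \cdots$, evaluated on $-\pi \le t \le \pi$. The names `andrews_curve` and `sample_curve` and the 8-dimensional `point` are hypothetical and are not taken from Table 1.

```python
import math


def andrews_curve(x, t):
    """Evaluate f_x(t) = x1/sqrt(2) + x2 sin t + x3 cos t + x4 sin 2t + x5 cos 2t + ..."""
    total = x[0] / math.sqrt(2.0)
    for i, coeff in enumerate(x[1:], start=1):
        harmonic = (i + 1) // 2  # frequencies run 1, 1, 2, 2, 3, 3, ...
        if i % 2 == 1:
            total += coeff * math.sin(harmonic * t)  # odd positions use sine
        else:
            total += coeff * math.cos(harmonic * t)  # even positions use cosine
    return total


def sample_curve(x, n_points=101):
    """Return an evenly spaced grid on [-pi, pi] and the curve values on it."""
    ts = [-math.pi + 2.0 * math.pi * j / (n_points - 1) for j in range(n_points)]
    return ts, [andrews_curve(x, t) for t in ts]


# Hypothetical 8-dimensional observation, for illustration only.
point = [1.5, -0.4, 2.0, 0.3, -1.1, 0.0, 0.7, -0.2]
ts, values = sample_curve(point)
```

Each observation yields one list of `values`; drawing those lists against `ts` for several observations on the same axes gives a plot of the kind shown in Figure 2.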
In Figure 3 the fossils have been added. Immediately Proconsul africanus
stands out as quite different from any group. For some values of $t$ it is very
like a chimpanzee, gorilla, orang-outang, or man and has certain
characteristics in common with all of these! The remaining fossils share with
man the characteristic human quality exhibited by $t_2$ and $t_4$. However, for
some other values of $t$ they are quite different from men and are more like
apes, particularly chimpanzees. This is not apparent from Figure 1.
Fortunately it is possible to calibrate these observations with significance tests
and confidence sets.
In sections 4 and 5, the variance of a function value, var $[f(t)]$, is derived
and is shown to be almost independent of $t$. The standard deviation $\sigma_f$ is
shown on the graph, as is the width of a 90% confidence band for $f_x(t_0)$.
Applying these at the values $t_2$ or $t_4$ it can be seen that
PLOTS OF HIGH-DIMENSIONAL DATA 127
[TABLE 1: data for the permanent first lower premolar. The table body is illegible in this copy.]
A-WEST AFRICAN
B-BRITISH
C-AUSTRALIAN
D, E-GORILLA
F, G-ORANG-OUTANG
H, I-CHIMPANZEE
J, K-PITHECANTHROPUS PEKINENSIS
L-PARANTHROPUS ROBUSTUS
M-PARANTHROPUS CRASSIDENS
N-MEGANTHROPUS PALAEOJAVANICUS
O-PROCONSUL AFRICANUS
FIGURE 1
PERMANENT FIRST LOWER PREMOLAR GROUP MEANS, 90% CONFIDENCE CONTOURS AND FOSSIL
VALUES
[Plot of $f(t)$ against $t$, for $t$ from -3.14 to 3.14.]
FIGURE 2
8-DIMENSIONAL DATA
PERMANENT FIRST LOWER PREMOLAR GROUP MEANS
A-WEST AFRICAN B-BRITISH C-AUSTRALIAN
D, E-GORILLA F, G-ORANG-OUTANG H, I-CHIMPANZEE
\[ f(t) = x_1/\sqrt{2} + x_2 \sin(t) + x_3 \cos(t) + \cdots \]
[Plot of $f(t)$ against $t$, for $t$ from -3.14 to 3.14, annotated with 90% confidence bands for a particular test and for the overall test.]
FIGURE 3
8-DIMENSIONAL DATA
AS IN FIGURE 2 BUT WITH FOSSILS ADDED
J, K-Pithecanthropus pekinensis L-Paranthropus robustus
M-Paranthropus crassidens N-Meganthropus palaeojavanicus
O-Proconsul africanus
TABLE 2
LARGEST CANONICAL VARIABLE FOR 6 TEETH
                                      Tooth Type
Source                       1      2      3      4      5      6
A. British                 -5.35  -7.07  -9.37  -4.28  -2.15  -2.93
B. Australian aboriginal   -3.93  -6.04  -8.87  -2.16  -0.50  -1.09
C. gorilla male             3.12   6.66   6.28   4.96   4.13   4.60
D. gorilla female           2.45   1.73   4.82   3.96   3.35   3.63
E. orang-outang male        2.83   5.10   5.11   2.70   1.21   1.49
Tooth Types
1 - Permanent lower second incisor
2 - Permanent lower canine
3 - Permanent lower first molar
4 - Permanent upper second premolar
5 - Permanent lower first molar
6 - Permanent lower third molar
[Plot of $f_x(t)$ against $t$, for $t$ from -3.14 to 3.14.]
FIGURE 4
6-DIMENSIONAL DATA
THE FIRST CANONICAL VARIATES FOR 6 TEETH
GROUP MEANS
A-BRITISH B-AUSTRALIAN C, D-GORILLA
E, F-ORANG-OUTANG G, H-CHIMPANZEE
\[ f_x(t) = x_1/\sqrt{2} + x_2 \sin t + x_3 \cos t + x_4 \sin 2t + \cdots \]
[Plot against $t$, for $t$ from -3.14 to 3.14, with the point $t_2$ marked.]
FIGURE 5
6-DIMENSIONAL DATA
AS IN FIGURE 4 WITH FOSSILS ADDED
I-Paranthropus crassidens
J-Pithecanthropus pekinensis
Hence with probability $1 - \alpha$, for all values of $t$,
6. FURTHER REMARKS
In this section are summarized a number of observations based on
experience with this type of plot.
(i) Plotting many points. Only a limited amount of information may
be absorbed from one plot. If each point is to be examined in detail, only
about 10 points may be plotted in this way on the same graph. A procedure
that proved useful in cases involving more points was first to plot all the
points on one graph from which general characteristics of the sample could
be noted. Separate plots of 10 points were then used to assess individual
points in relation to the whole.
(ii) Other functions. The properties of sections 2 and 3 will also hold
for functions of the form
\[ f_x(t) = x_1 \sin n_1 t + x_2 \cos n_1 t + x_3 \sin n_2 t + x_4 \cos n_2 t + \cdots \]
where the $n_i$ are different integers. Where more than one plot is desired,
function plots based on such a series will provide more information about
the data. It was found that plots with small $n_i$ were easier to look at than
those with large $n_i$. The statistical properties of the function plots depend
heavily on the orthogonality of the trigonometric functions and on the
approximate constancy of the length of the vector $\mathbf{f}_1(t)$. Other families of
orthogonal functions such as orthogonal polynomials may be tried. Typically
these do not have all the properties of the proposed function. Trigonometric
functions involving a pure sine or cosine series will also lack many of the
properties of the mixed series proposed.
(iii) The one-dimensional projections. As noted previously the function
values $f_x(t)$ are the projections of the vector $x$ on the vector
\[ \mathbf{f}_1(t) = (1/\sqrt{2}, \sin t, \cos t, \sin 2t, \ldots). \]
When normalized this vector defines a point on the k-sphere. As $t$ varies
this point describes a periodic curve on the k-sphere.
Some idea of how close this curve comes to points on the sphere may be
TABLE 3
AVERAGE DISTANCES TO CLOSEST POINT ON CURVE OR ON ITS REFLECTION THROUGH THE
ORIGIN
Simple curve:      $\mathbf{f}_1(t) = (1/\sqrt{2}, \sin t, \cos t, \sin 2t, \cos 2t, \ldots)$
Complicated curve: $\mathbf{f}_1(t) = (\sin 2t, \cos 2t, \sin 4t, \cos 4t, \sin 8t, \ldots)$
                   Simple                 Complicated
             Euclidean   Angle in    Euclidean   Angle in
Dimension    distance    degrees     distance    degrees
    3          .33         19          .25         14
    5          .57         33          .49         28
    7          .68         40          .61         35
    9          .76         45          .66         39
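A quantity of the kind tabulated in Table 3 can be estimated by simulation. The sketch below is only one plausible way to do so and is not necessarily the procedure used for the table. It assumes that test points are drawn uniformly on the unit sphere, that the curve is approximated by a finite grid of $t$ values, and that the Euclidean distance is the chord between unit vectors, $2 \sin(\theta/2)$ for an angle $\theta$. The tabulated pairs agree with that last relation to within rounding. All function names are hypothetical.

```python
import math
import random


def simple_curve(t, k):
    """Unit vector along (1/sqrt(2), sin t, cos t, sin 2t, cos 2t, ...), first k terms."""
    v = [1.0 / math.sqrt(2.0)]
    h = 1
    while len(v) < k:
        v.append(math.sin(h * t))
        if len(v) < k:
            v.append(math.cos(h * t))
        h += 1
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]


def random_unit_vector(k, rng):
    """Uniform point on the unit sphere, obtained by normalizing Gaussian draws."""
    g = [rng.gauss(0.0, 1.0) for _ in range(k)]
    norm = math.sqrt(sum(c * c for c in g))
    return [c / norm for c in g]


def chord_from_angle(degrees):
    """Euclidean distance between two unit vectors separated by the given angle."""
    return 2.0 * math.sin(math.radians(degrees) / 2.0)


def mean_closest_angle(k, n_points=200, n_grid=720, seed=0):
    """Monte Carlo mean angle (degrees) from a random point to the curve or its reflection."""
    rng = random.Random(seed)
    grid = [simple_curve(-math.pi + 2.0 * math.pi * j / n_grid, k) for j in range(n_grid)]
    total = 0.0
    for _ in range(n_points):
        p = random_unit_vector(k, rng)
        # Taking |cosine| treats the curve and its reflection through the origin alike.
        best = max(abs(sum(a * b for a, b in zip(p, c))) for c in grid)
        total += math.degrees(math.acos(min(1.0, best)))
    return total / n_points


angle_3d = mean_closest_angle(3)      # estimated mean angle in 3 dimensions
chord_3d = chord_from_angle(angle_3d)  # chord for that mean angle (an approximation)
```

For odd $k$ the unnormalized vector has squared length $1/2 + (k-1)/2 = k/2$ for every $t$, so the normalization inside `simple_curve` is then a constant rescaling.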
REFERENCE
Ashton, E. H., Healy, M. J. R., and Lipton, S. [1957]. The descriptive use of discriminant
functions in physical anthropology. Proc. Roy. Soc. B 146, 552-72.
Received October 1970, Revised July 1971
Key Words: Plotting multivariate data; Cluster analysis; Pattern recognition;
Discrimination; Classification.