
Ordering Directional Data: Concepts of Data Depth on Circles and Spheres
Author(s): Regina Y. Liu and Kesar Singh
Source: The Annals of Statistics, Vol. 20, No. 3 (Sep., 1992), pp. 1468-1484
Published by: Institute of Mathematical Statistics
Stable URL: http://www.jstor.org/stable/2242022

The Annals of Statistics 1992, Vol. 20, No. 3, 1468-1484

ORDERING DIRECTIONAL DATA: CONCEPTS OF DATA DEPTH ON CIRCLES AND SPHERES¹


BY REGINA Y. LIU AND KESAR SINGH
Rutgers University

Three notions of depth for directional data, angular simplicial depth (ASD), angular Tukey's depth (ATD) and arc distance depth (ADD), are developed and studied. The empirical versions of these depths give rise to center-outward rankings of angular data which may be regarded as extensions of the usual center-outward ranking on the line. Three medians derived from these depths are examined and compared. Applications in nonparametric classification and in implementing the bootstrap to construct confidence regions for directional parameters are briefly discussed.

1. Introduction. The purpose of this article is to develop three concepts of data depth for directional data, namely, angular simplicial depth (ASD), angular Tukey's depth (ATD) and arc distance depth (ADD). ASD extends the notion of simplicial depth (SD) in Liu (1988, 1990) from $\mathbb{R}^p$ to circles and spheres. ATD is an analog of Tukey's depth (TD) [Tukey (1975)] on $\mathbb{R}^p$ for populations and data on circles and spheres. A notion equivalent to ATD has been introduced by Small (1987). $L_1$ distance in the Euclidean space gives rise to the notion of ADD for spheres and circles.

The concept of depth on spheres leads to a proper notion of center (or median) and a ranking of directional data in the order of centrality. In particular, such ranking leads to detection of "extreme" data values, a natural definition of interquartile range (on the circle) and analogs of linear combinations of order statistics of directional data in general. The rankings derived from ASD and ATD can be justified as natural extensions of the usual linear ranking by the following argument. When the entire distribution is concentrated on a semicircle, the distribution could be regarded as being on the line segment $[-\pi/2, \pi/2]$. In such a case one would naturally expect the angular depths (being zero throughout the other semicircle) to coincide with their parent notions of depth on the line. Both ASD and ATD possess this consistency property. As a result, the center-outward rankings based on the decreasing values of these depths completely agree with that based on the usual order statistics on the line. As an illustrative example, suppose the angular data (in degrees) are 62, 73, 85, 96, 97; then the ranking in the order of centrality

Received March 1989; revised June 1991.
¹Research supported by NSF Grants DMS-88-02558 and DMS-90-04658.
AMS 1980 subject classifications. Primary 62H05, 62H12; secondary 60D05.
Key words and phrases. Angular simplicial depth, angular Tukey's depth, arc distance depth, median, directional data, center-outward ordering, von Mises class of distributions.


provided by the linear ranking, ASD or ATD is 85, (73, 96), (62, 97), where a pair ( , ) indicates a tie.

The center-outward ranking given by ASD or ATD has several interesting applications. Specifically, we mention the following two:

1. On classification problems: Suppose two training samples $(X_1, \ldots, X_m)$ and $(Y_1, \ldots, Y_n)$ from two different spherical populations are given. Consider the problem of classifying a new data point $Z$ to one of the two populations. First, compute, respectively, the center-outward ranks of $Z$ with respect to the $X_i$'s and $Y_i$'s. Let these ranks be denoted by $r_x$ and $r_y$. The proposed rule is to classify $Z$ to the $X$ population if $r_x/m < r_y/n$, and to the $Y$ population otherwise. This classification rule is studied in Gross and Liu (1989).

2. On implementing the bootstrap: Let $\theta$ be the parameter of interest on $S^d$, a $d$-dimensional unit sphere, and let $\hat\theta_n$ be its associated estimate. A center-outward ranking is essential for implementing the percentile method to form a bootstrap confidence region for $\theta$ [see Efron (1979) for the percentile method]. The procedure is as follows: First, obtain a certain number of bootstrap replicas of $\hat\theta_n$; second, assign the center-outward rank (according to ASD or ATD) to each replica; finally, delete $100\alpha\%$ of the "outmost" replicas. The smallest convex patch on $S^d$ containing the remaining replicas is then a $(1 - \alpha)$ bootstrap confidence region of $\theta$. Properties of this bootstrap confidence region will be reported elsewhere.

The constancy of ASD and ATD throughout a circle or a sphere presents an interesting situation. Of course, their parent depths are never constant on $\mathbb{R}^p$. We have fully studied the constancy, and the main results are distributional characterizations in terms of constant depth (cf. Sections 3 and 4).

All three notions of angular depth give rise to medians on $S^d$. Some detailed comparisons of those medians are presented later. The definition of a median on a circle given in Mardia [(1972), page 28] is in spirit the same as the median derived from ADD, although in some unusual cases the definition can lead to only a local maximum of ADD if the definition is followed literally.

For simplicity we restrict ourselves to continuous distributions on the unit circle and absolutely continuous distributions on the unit sphere; in each case we take the origin (denoted by $0$ and $\mathbf{0}$, respectively) as the center. Throughout this article, $-\theta$ is used to indicate the diametrically opposite point of $\theta$.

Section 2 contains basic definitions and notation. In Section 3 we present the following properties of ASD w.r.t. a spherical distribution:

1. Computational simplicity of ASD: Checking that a point belongs to a spherical triangle is equivalent to solving a 3 x 3 linear system of equations.

2. A differential formula for ASD and its applications: The derivative of ASD(·) on a circle has a simple explicit formula [see (3.1)] which yields many interesting properties of ASD. These properties include a monotonicity property of the ASD and a characterization of an antipodally symmetric distribution on the circle as having a constant ASD value, 1/4.

3. An equation connecting ASD and SD and its applications: Equation (3.4) [(3.6)] connects ASD on a circle (a sphere) with the SD on a line (a plane). This equation can be used to characterize antipodally symmetric distributions on the sphere by the constant value of ASD = 1/8 throughout the sphere.

We show in Section 4 that ATD(·) has the appropriate properties to justify itself as a notion of data depth. The maximum value for this depth does not exceed 1/2 and it is attained, for instance, at the mode of any member of the von Mises class of distributions.

Section 5 discusses the robustness aspect of Mardia's median and medians given by ASD, ATD and ADD in terms of influence functions and a "breakdown" concept. Some illustrative examples are also given. Some concluding remarks are made in Section 6.

2. Definitions and notation.

Angular simplicial depth. The angular simplicial depth which we propose in this article is a natural analog for directional data of the simplicial depth for data on Euclidean spaces, introduced in Liu (1988, 1990), which we now describe briefly. In $\mathbb{R}^d$, a simplex $\Delta(x_1, \ldots, x_{d+1})$ with $(d + 1)$ vertices $x_1, \ldots, x_{d+1}$ is defined to be the closed convex hull with extremities at these points. Let $F(\cdot)$ be a distribution in $\mathbb{R}^d$ and $x$ a point in $\mathbb{R}^d$. The simplicial depth of $x$ w.r.t. $F$, SD($x$), is then defined to be the probability that $x$ be in a simplex $\Delta(X_1, \ldots, X_{d+1})$, where $X_1, \ldots, X_{d+1}$ are $(d + 1)$ i.i.d. observations from $F$. In $\mathbb{R}^1$, $\Delta(X_1, X_2)$ is simply the closed line segment joining $X_1$ and $X_2$, say $\overline{X_1X_2}$, and $\mathrm{SD}(x) = P_F(x \in \overline{X_1X_2})$ [$= 2F(x)(1 - F(x))$, assuming that $F$ is continuous]. In $\mathbb{R}^2$, $\Delta(X_1, X_2, X_3)$ is the closed triangle with vertices $X_1$, $X_2$ and $X_3$, say $\triangle(X_1, X_2, X_3)$, and $\mathrm{SD}(x) = P_F(x \in \triangle(X_1, X_2, X_3))$.

The simplicial median is then the point which maximizes SD(·) (or the average of such points if there are many). Note that in $\mathbb{R}^1$ the simplicial median divides the line into two half-lines of equal probabilities and it agrees with the "usual" median. In Liu (1990) it is argued that SD(·) can be viewed as a measure of data depth, and that the simplicial median possesses many desirable features of a notion of median.

The edges of a simplex in $\mathbb{R}^d$ are the line segments connecting pairs of points (vertices). When we move to the sphere, it is natural to replace such a line segment by the "shortest curve" joining a pair of points on the sphere. Let $p_1$ and $p_2$ be two points on a sphere. It is known that such a shortest curve is the short arc joining $p_1$ and $p_2$ on the circle which passes through $p_1$ and $p_2$ and has the same center as the sphere. (Such a circle is referred to as a great circle.) Evidently this shortest curve can be generalized to spheres of any dimension and is ambiguous only in the nongeneric case where $p_1$ and $p_2$ are


diametrically opposite each other. The idea of the shortest curve allows us to generalize the notion of simplex to the spherical case. We discuss only the cases of the circle and the two-dimensional sphere, although it will be clear that the definition extends inductively to any dimension. For any two points $p_1$ and $p_2$ on a circle the corresponding simplex is the short arc joining $p_1$ and $p_2$ [denoted by $\mathrm{arc}(p_1, p_2)$], and for three points $q_1$, $q_2$ and $q_3$ on a sphere it is the spherical triangle [denoted by $\triangle_s(q_1, q_2, q_3)$] bounded by the three short arcs $\mathrm{arc}(q_1, q_2)$, $\mathrm{arc}(q_1, q_3)$ and $\mathrm{arc}(q_2, q_3)$. We now define the angular simplicial depth to be

$$\mathrm{ASD}(p) = P_H\bigl(p \in \mathrm{arc}(W_1, W_2)\bigr) \tag{2.1}$$

if $p$ is a point and $H$ a distribution on a circle, and $W_1$ and $W_2$ are i.i.d. observations from $H$;

$$\mathrm{ASD}(p) = P_H\bigl(p \in \triangle_s(W_1, W_2, W_3)\bigr) \tag{2.2}$$

if $p$ is a point and $H$ a distribution on a sphere and $W_1$, $W_2$ and $W_3$ are i.i.d. observations from $H$. Note that if $H$ is continuous on a circle and absolutely continuous on a sphere, then the ambiguous simplices occur with probability zero. A maximum point of ASD(·) is defined to be an angular simplicial median (ASM). Evidently this median is rotation invariant.

We define the empirical version of ASD(·) as

$$\mathrm{ASD}_n(p) = \binom{n}{2}^{-1} \sum^{*} I\bigl(p \in \mathrm{arc}(W_{i_1}, W_{i_2})\bigr) \tag{2.3}$$

for a point $p$ on the circle, where $W_1, \ldots, W_n$ is a random sample from a circular distribution and $\sum^{*}$ runs over all possible pairs of $(W_{i_1}, W_{i_2})$, and as

$$\mathrm{ASD}_n(p) = \binom{n}{3}^{-1} \sum^{**} I\bigl(p \in \triangle_s(W_{i_1}, W_{i_2}, W_{i_3})\bigr) \tag{2.4}$$

for a point $p$ on the sphere, where $W_1, \ldots, W_n$ is a random sample from a spherical distribution and $\sum^{**}$ runs over all possible triplets $(W_{i_1}, W_{i_2}, W_{i_3})$.
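As a hedged illustration of the empirical version (2.3) on the circle, the following Python sketch counts the pairs whose short arc contains a given point. The names `arc_dist` and `asd_circle`, the use of radians and the floating-point tolerance are assumptions made for this example and are not part of the paper; closed arcs are used, and exactly antipodal pairs (for which the short arc is ambiguous) are not treated specially.

```python
import math
from itertools import combinations

def arc_dist(a, b):
    """Length of the short arc between angles a and b (radians), in [0, pi]."""
    d = abs(a - b) % (2 * math.pi)
    return min(d, 2 * math.pi - d)

def asd_circle(p, sample, tol=1e-9):
    """Empirical angular simplicial depth (2.3) of the angle p on the circle."""
    pairs = list(combinations(sample, 2))
    hits = 0
    for w1, w2 in pairs:
        # p lies on the closed short arc iff the two sub-arcs add up to it.
        if abs(arc_dist(w1, p) + arc_dist(p, w2) - arc_dist(w1, w2)) <= tol:
            hits += 1
    return hits / len(pairs)

# The illustrative data from the Introduction, converted from degrees.
data = [math.radians(x) for x in (62, 73, 85, 96, 97)]
depths = [round(asd_circle(w, data), 3) for w in data]
print(depths)  # -> [0.4, 0.7, 0.8, 0.7, 0.4]
```

Sorting the data by decreasing depth gives 85, (73, 96), (62, 97), which agrees with the ranking quoted in the Introduction.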

Angular Tukey's depth. Following Small (1987), we define the angular Tukey's depth for a given spherical distribution $H$ as follows:

$$\mathrm{ATD}_H(\theta) = \inf_{\{S :\, \theta \in S\}} \{P_H(S)\}, \tag{2.5}$$

where the infimum is taken over the set of all closed hemispheres $S$ containing $\theta$ in their boundaries or in their interiors. We call a maximum point of ATD(·) an angular Tukey's median (ATM). Note that ATM is also rotation invariant. See Example 4.4.4 in Small (1987) for more invariance properties of ATM.

In defining the empirical version of ATD(·), $\mathrm{ATD}_n(\cdot)$, we replace $P_H(\cdot)$ in (2.5) by its corresponding empirical probability.
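The empirical version of (2.5) on the circle can be sketched as follows. This is only an illustrative implementation under stated assumptions: the function name `atd_circle` and the search strategy are not from the paper. The sketch writes every closed semicircle containing $\theta$ as $[\theta - t, \theta - t + \pi]$ with $0 \le t \le \pi$, notes that the number of sample points it contains only changes at finitely many values of $t$, and evaluates the count at one value of $t$ between each consecutive pair of change points. Exactly antipodal sample pairs may be affected by floating-point rounding.

```python
import math

def atd_circle(theta, sample):
    """Empirical angular Tukey's depth of the angle theta (radians) on the circle."""
    two_pi = 2 * math.pi
    # Position of each sample point relative to theta, in [0, 2*pi).
    deltas = [(w - theta) % two_pi for w in sample]
    # Values of t at which a sample point enters or leaves the semicircle.
    cuts = {0.0, math.pi}
    for d in deltas:
        cuts.add(math.pi - d if d <= math.pi else two_pi - d)
    cuts = sorted(cuts)
    best = len(sample)
    for lo, hi in zip(cuts, cuts[1:]):
        t = 0.5 * (lo + hi)  # a semicircle strictly between two change points
        count = sum(1 for d in deltas if d <= math.pi - t or d >= two_pi - t)
        best = min(best, count)
    return best / len(sample)

# The illustrative data from the Introduction, converted from degrees.
data = [math.radians(x) for x in (62, 73, 85, 96, 97)]
print([round(atd_circle(w, data), 3) for w in data])  # -> [0.2, 0.4, 0.6, 0.4, 0.2]
```

For these data the depths again order the points as 85, (73, 96), (62, 97).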


Arc distance depth. We define ADD of a point $\theta$ on the sphere $S^d$ as

$$\mathrm{ADD}(\theta) = \pi - \int l(\theta, \varphi)\, dH(\varphi),$$

where $l(\theta, \varphi)$ is the Riemannian distance between $\theta$ and $\varphi$; that is, the length of the short arc joining $\theta$ and $\varphi$ on the great circle determined by $\theta$ and $\varphi$. Again, the empirical version of ADD is defined by replacing $H(\cdot)$ by $H_n(\cdot)$. A maximum point of ADD(·) is referred to as an arc distance median (ADM). This idea of minimizing $L_1$ distance was used by Gower (1974) to define a generalized median in $\mathbb{R}^d$. Its extension to circles was given in Mardia (1972) and to spheres in Fisher (1985).

3. Properties of angular simplicial depth.

Computational simplicity of ASD. We first point out that determining whether or not a point on a circle (a sphere) lies on the short arc joining two data points (the spherical triangle determined by three data points) can be reduced to solving a simple system of linear equations. This shows that computing ASD(·) is quite straightforward. Let $H(\cdot)$ be the population distribution defined on the unit circle centered at the origin $0$. Given a point $\theta$ on the circle and any two data points $W_1$ and $W_2$ from $H(\cdot)$, $\theta$ lies on the short arc $\mathrm{arc}(W_1, W_2)$ if and only if the line segments $\overline{0\theta}$ and $\overline{W_1W_2}$ intersect. In other words, $\theta \in \mathrm{arc}(W_1, W_2)$ if and only if there exist $\alpha$ and $\beta$ such that $0 < \alpha, \beta < 1$ and $\alpha\theta^c = \beta W_1^c + (1 - \beta)W_2^c$. Here the notation $\cdot^c$ stands for the Euclidean coordinates of the point $\cdot$. For the spherical case this observation becomes the following: $\theta$ is on the spherical triangle $\triangle_s(W_1, W_2, W_3)$ if and only if $\overline{\mathbf{0}\theta}$ intersects the Euclidean triangle $\triangle(W_1, W_2, W_3)$. This is equivalent to $\alpha\theta^c = \beta W_1^c + \gamma W_2^c + (1 - \beta - \gamma)W_3^c$ for some $\alpha$, $\beta$ and $\gamma$ such that $0 < \alpha, \beta, \gamma < 1$ and $0 < \beta + \gamma < 1$. The same observation holds for any general $S^d$. This computational simplicity of ASD should greatly enhance its applicability.

A differential formula for ASD and its applications. Let $H(\cdot)$ be the distribution on the unit circle and let $h(\cdot)$ be its density if it exists. Here $\theta$ can be simply expressed as an angle between $0$ and $2\pi$.

PROPOSITION 3.1. Suppose that $h(\cdot)$ exists and is continuous at $\theta$. Then

$$\frac{d}{d\theta}\,\mathrm{ASD}(\theta) = 2(A_\theta - C_\theta)\,h(\theta), \tag{3.1}$$

where $A_\theta$ and $C_\theta$ stand for the probabilities of the semicircles joining $\theta$ and $-\theta$ in the counterclockwise and clockwise directions, respectively.
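The membership test described under "Computational simplicity of ASD" can be sketched in Python as below. It is a minimal illustration, assuming points are given as Euclidean coordinate triples on the unit sphere; the names `det3` and `in_spherical_triangle` are hypothetical, Cramer's rule is just one convenient way to solve the 3 x 3 system, and the closed-boundary convention (non-strict inequalities) is a choice made here rather than something prescribed by the paper.

```python
import math

def det3(a, b, c):
    """Determinant of the 3 x 3 matrix whose columns are a, b and c."""
    return (a[0] * (b[1] * c[2] - b[2] * c[1])
            + a[1] * (b[2] * c[0] - b[0] * c[2])
            + a[2] * (b[0] * c[1] - b[1] * c[0]))

def in_spherical_triangle(theta, w1, w2, w3, eps=1e-12):
    """Check whether theta lies on the spherical triangle with vertices w1, w2, w3."""
    # Solve alpha*theta = beta*w1 + gamma*w2 + (1 - beta - gamma)*w3, rewritten as
    #   alpha*theta + beta*(w3 - w1) + gamma*(w3 - w2) = w3.
    c1 = [w3[i] - w1[i] for i in range(3)]
    c2 = [w3[i] - w2[i] for i in range(3)]
    d = det3(theta, c1, c2)
    if abs(d) < eps:  # the ray through theta is parallel to the triangle's plane
        return False
    alpha = det3(w3, c1, c2) / d
    beta = det3(theta, w3, c2) / d
    gamma = det3(theta, c1, w3) / d
    return 0 < alpha <= 1 and beta >= 0 and gamma >= 0 and beta + gamma <= 1

# Usage: the octant triangle spanned by the three coordinate axes.
s = 1 / math.sqrt(3)
w1, w2, w3 = (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)
inside = in_spherical_triangle((s, s, s), w1, w2, w3)       # centre of the octant
opposite = in_spherical_triangle((-s, -s, -s), w1, w2, w3)  # its antipode
```

With such a test in hand, the empirical depth (2.4) would be the fraction of sample triplets for which the test succeeds.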

PROOF. For a $\theta$ and a positive increment $\delta\theta$, the difference $[\mathrm{ASD}(\theta + \delta\theta) - \mathrm{ASD}(\theta)]$ involves only those pairs of observations $\{W_1, W_2\}$ from $H(\cdot)$ which


have the following property:

$$\{\theta \in \mathrm{arc}(W_1, W_2) \text{ and } (\theta + \delta\theta) \notin \mathrm{arc}(W_1, W_2)\}$$

or

$$\{\theta \notin \mathrm{arc}(W_1, W_2) \text{ and } (\theta + \delta\theta) \in \mathrm{arc}(W_1, W_2)\}.$$

These two situations will occur if and only if either $W_1$ or $W_2$ lies on $\mathrm{arc}(\theta, \theta + \delta\theta)$. Using this fact and the equality

$$P(E_1) - P(E_2) = P(E_1 - E_2) - P(E_2 - E_1)$$

for any two events $E_1$ and $E_2$, we obtain

$$\mathrm{ASD}(\theta + \delta\theta) - \mathrm{ASD}(\theta) = 2(A_\theta - C_\theta)\int_\theta^{\theta + \delta\theta} h(\alpha)\, d\alpha + o(\delta\theta). \tag{3.2}$$

The proposition follows. □

We define a point $\theta$ to be regular w.r.t. a distribution $H(\cdot)$ if $H(\cdot)$ has a continuous density in a neighborhood of $\theta$. We also define a point $\theta$ to be a median axis on the circle if the diameter passing through $\theta$ and $-\theta$ divides the circle into two semicircles with equal probabilities. The following proposition asserts that ASM is always a median axis.

PROPOSITION 3.2. If $\theta_0$ is a median axis with $h(\theta_0) > h(-\theta_0)$ and the points $\theta_0$ and $-\theta_0$ are regular, then $\theta_0$ is a local maximum of ASD. Conversely, if $\theta_0$ is a local maximum of ASD and $\theta_0$ and $-\theta_0$ are regular with $h(\theta_0) > 0$, then $\theta_0$ is a median axis and $h(\theta_0) \ge h(-\theta_0)$.

REMARK 3.1. (i) On the circle ADD(·) also allows a simple differential equation, namely,

$$\frac{d}{d\theta}\,\mathrm{ADD}(\theta) = (A_\theta - C_\theta),$$

provided that $h(\cdot)$ exists at $\theta$ and $-\theta$. The proof is given in Mardia [(1972), page 31].
(ii) The equation in (i) immediately implies that statements similar to Proposition 3.2 hold for ADD.
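The differential equation in Remark 3.1(i) has a simple empirical counterpart, which the following Python sketch illustrates. The sketch is an assumption-laden illustration and not the paper's procedure: it computes the empirical ADD (with $H$ replaced by $H_n$, as in Section 2) and compares a finite-difference slope at a point that is neither a data point nor antipodal to one with the difference between the sample proportions in the counterclockwise and clockwise semicircles. The names `add_circle` and `semicircle_balance` are hypothetical.

```python
import math

def arc_dist(a, b):
    """Length of the short arc between angles a and b (radians), in [0, pi]."""
    d = abs(a - b) % (2 * math.pi)
    return min(d, 2 * math.pi - d)

def add_circle(theta, sample):
    """Empirical arc distance depth: pi minus the mean short-arc distance."""
    return math.pi - sum(arc_dist(theta, w) for w in sample) / len(sample)

def semicircle_balance(theta, sample):
    """Sample analog of A_theta - C_theta (counterclockwise minus clockwise mass)."""
    rel = [(w - theta) % (2 * math.pi) for w in sample]
    ccw = sum(1 for r in rel if 0 < r < math.pi)
    cw = sum(1 for r in rel if math.pi < r < 2 * math.pi)
    return (ccw - cw) / len(sample)

# The illustrative data from the Introduction, converted from degrees.
data = [math.radians(x) for x in (62, 73, 85, 96, 97)]
theta, step = math.radians(80), 1e-6
slope = (add_circle(theta + step, data) - add_circle(theta, data)) / step
balance = semicircle_balance(theta, data)  # three points ahead, two behind
```

At 80 degrees three data points lie in the counterclockwise semicircle and two in the clockwise one, so both `slope` and `balance` should be close to 1/5.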
COROLLARY 3.1 [Monotonicity of ASD(·)]. Suppose $h(\cdot)$ is symmetric about its maximum point $\theta_0$ and decreases monotonically on both sides of $\theta_0$ until its diametrically opposite point $-\theta_0$. Then ASD(·) is also monotonic nonincreasing in both directions from $\theta_0$ to $-\theta_0$. In particular, $\theta_0$ is a maximum point of ASD(·).

The next property of ASD(·) on a circle will allow us to characterize antipodally symmetric distributions. These are defined as follows.


DEFINITION. Let $H$ be the distribution of a random variable $W$ on the $d$-dimensional sphere $S^d$. $H$ is said to be antipodally symmetric (about the origin) if the distribution of $(-W)$ is also $H$, where $(-W)$ stands for the diametrically opposite point of $W$. If $H$ has a continuous density, then antipodal symmetry is equivalent to $h(\theta) = h(-\theta)$ for all $\theta$ on $S^d$.

PROPOSITION 3.3. Assume that $h(\cdot)$ is continuous. Then $\mathrm{ASD}(\theta) = c$, for a positive constant $c$ and all $\theta \in [0, 2\pi)$, if and only if $h(\theta) = h(-\theta)$ for all $\theta$. Moreover, the constant $c$ must then be 1/4.

PROOF. ($\Rightarrow$) Suppose $\mathrm{ASD}(\theta) = c$ throughout. Then by Proposition 3.1, either $A_\theta = C_\theta$ or $h(\theta) = h(-\theta) = 0$ holds for every $\theta$. Thus, $h(\theta) = h(-\theta)$ for all $\theta$ in view of the continuity of $h(\cdot)$.

($\Leftarrow$) If $h(\theta) = h(-\theta)$ for all $\theta$, then $A_\theta = C_\theta$ and $d\,\mathrm{ASD}(\theta)/d\theta = 0$ for all $\theta$, which implies $\mathrm{ASD}(\theta) = c$ for some constant $c$. To show that $c$ must be 1/4, we need only show that $\mathrm{ASD}(0) = 1/4$ since the same argument applies to a general point $\theta$ after a rotation of the axes by $\theta$. Due to antipodal symmetry [i.e., $h(\theta) = h(-\theta)$ for all $\theta$], we have

$$\mathrm{ASD}(0) = 2\int_0^\pi \left(\tfrac{1}{2} - H(\alpha)\right) h(\alpha)\, d\alpha.$$

It suffices to check that

$$\int_0^\pi H(\alpha)h(\alpha)\, d\alpha = \tfrac{1}{8}. \tag{3.3}$$

This is done by letting $H(\alpha) = y$ and converting (3.3) into $\int_0^{1/2} y\, dy$. □

Note that an alternative proof of Proposition 3.3 can be given [in fact under the weaker condition that $H(\cdot)$ is continuous only] using the connecting equation in Proposition 3.4.

REMARK 3.2. On a circle this characterization of ASD(·) by the constancy also holds for ADD(·). The constant $c$ there is $\pi/2$.

REMARK 3.3. It may be of interest to note that if the underlying distribution has its probability mass concentrated on a semicircle only, then the depth ASD(·) is zero for all points on the complementary semicircle.

An equation connecting ASD with SD and its applications. Equation (3.1) for the rate of change of the ASD was the main tool in our study for the circular case. For the sphere, there does not seem to be any such simple equation. Instead we focus on a reduction of the sphere to the plane tangent to the sphere at a given point. Such a process is often called an exponential map. This will allow us to apply some known properties of SD on $\mathbb{R}^2$. To make the discussion clearer, we begin with the construction in the case of circles.


Let $H(\cdot)$ be the distribution and $\theta$ some fixed point on the unit circle. There is a natural length-preserving mapping $g_\theta$ from the circle (without the point $-\theta$) to the segment $(-\pi, \pi)$ of the tangent line $L_\theta$ at $\theta$. For a point $\varphi$ on the unit circle ($\varphi \ne -\theta$), the absolute value $|g_\theta(\varphi)|$ is simply the length of $\mathrm{arc}(\theta, \varphi)$, and the sign of $g_\theta(\varphi)$ being $-$ or $+$ depends on whether the direction in going from $\theta$ to $\varphi$ is counterclockwise or clockwise. $H_\theta(\cdot)$ is used to represent the resulting distribution on the tangent line $L_\theta$ with its entire probability mass on $(-\pi, \pi)$. Let $\mathrm{SD}_\theta(\cdot)$ be the simplicial depth on the tangent line $L_\theta$ w.r.t. the distribution $H_\theta(\cdot)$. The depth functions ASD(·) and $\mathrm{SD}_\theta(\cdot)$ are connected as follows.

PROPOSITION 3.4 (On the circle). If $H(\cdot)$ is continuous, then for all $\theta$ on the circle

$$\mathrm{ASD}(\theta) + \mathrm{ASD}(-\theta) = \mathrm{SD}_\theta(0). \tag{3.4}$$

PROOF. Let $\{W_1, W_2\}$ be a random sample from $H(\cdot)$. Except for a null set the following three events are equivalent:

$$\{0 \in \overline{g_\theta(W_1)g_\theta(W_2)}\},$$

$$\{W_1 \text{ and } W_2 \text{ are on two different sides of the diagonal joining } \theta \text{ and } -\theta\}$$

and

$$\{\theta \in \mathrm{arc}(W_1, W_2)\} \cup \{-\theta \in \mathrm{arc}(W_1, W_2)\}. \tag{3.5}$$

The proposition follows from the fact that the intersection of the two events in (3.5) has probability 0. □

We now turn to the two-dimensional sphere. Similarly we let $H(\cdot)$ be a distribution and $\theta$ a fixed point on the unit sphere. Let $P_\theta$ be the tangent plane to the sphere at $\theta$. For a point $\varphi$ ($\varphi \ne \theta, -\theta$), consider the great circle which passes through $\theta$ and $\varphi$. The plane of this circle cuts $P_\theta$ along a line $L_{\theta,\varphi}$, which is just the tangent line to the circle at $\theta$. This means that if we restrict our attention to this circle and line $L_{\theta,\varphi}$, we are exactly in the situation of the circle discussed in the previous paragraph. In particular, the construction described there applies and $\varphi$ can be mapped into a point $g_\theta(\varphi)$ on the line $L_{\theta,\varphi}$ between $(-\pi, \pi)$. As $\varphi$ moves sideways on the sphere, it is evident that the line $L_{\theta,\varphi}$ will rotate on the plane $P_\theta$. The sphere without $-\theta$ will be mapped by $g_\theta$ into a disc in $P_\theta$ with center $\theta$ (which is the origin of $P_\theta$ now). As before, we use $H_\theta(\cdot)$ to denote the resulting distribution on $P_\theta$, which has its total probability mass on the disc centered at the origin $\mathbf{0}$ with radius $\pi$. The analog of (3.4) for the sphere can now be stated as follows.

PROPOSITION 3.5 (On the sphere). Assume that $H(\cdot)$ is absolutely continuous. Then

$$\mathrm{ASD}(\theta) + \mathrm{ASD}(-\theta) = \mathrm{SD}_\theta(\mathbf{0}), \tag{3.6}$$

where $\mathrm{SD}_\theta(\cdot)$ is the simplicial depth on the plane corresponding to $H_\theta(\cdot)$.


PROOF. Fix $W_1$ and $W_2$. The great circles joining $\{W_1, \theta\}$ and $\{W_2, \theta\}$, respectively, split the sphere into four pieces. Let $S(W_1, W_2, \theta)$ be the smaller piece which does not contain $W_1$ and $W_2$. (This exists with probability 1.) For any $W_3$,

$$\mathbf{0} \in \triangle\bigl(g_\theta(W_1), g_\theta(W_2), g_\theta(W_3)\bigr) \quad \text{if and only if} \quad W_3 \in S(W_1, W_2, \theta).$$

The event $W_3 \in S(W_1, W_2, \theta)$ can be divided into the following two mutually exclusive events:

(I) Both $\theta$ and $W_3$ lie on one of the two hemispheres determined by the great circle joining $W_1$ and $W_2$.
(II) Both $-\theta$ and $W_3$ lie on one of the two hemispheres determined by the great circle joining $W_1$ and $W_2$.

Note that (I) is equivalent to the event that $\theta \in \triangle_s(W_1, W_2, W_3)$ and (II) to $-\theta \in \triangle_s(W_1, W_2, W_3)$. This proves the assertion. □

We now derive the following simple characterization of antipodally symmetric distributions on the sphere.

PROPOSITION 3.6 (On the sphere). Assume that $h(\cdot)$ is continuous. Then $\mathrm{ASD}(\theta) = 1/8$ for all $\theta$ if and only if $h(\theta) = h(-\theta)$ for all $\theta$.

PROOF. ($\Leftarrow$) If $H(\cdot)$ is antipodally symmetric [i.e., $h(\theta) = h(-\theta)$], then $\mathrm{ASD}(\theta) = \mathrm{ASD}(-\theta)$. It is clearly so because for an observation $W$ from such an $H(\cdot)$ the random variables $W$ and $-W$ have the same distribution. The induced distribution $H_\theta(\cdot)$ is symmetric about the origin $\mathbf{0}$ on the tangent plane $P_\theta$. As a result [see Theorem 4 of Liu (1990)],

$$\mathrm{SD}_\theta(\mathbf{0}) = \tfrac{1}{4}$$

and the result follows.

($\Rightarrow$) Suppose $\mathrm{ASD}(\theta) = 1/8$ throughout. Then by Proposition 3.5 we have $\mathrm{SD}_\theta(\mathbf{0}) = 1/4$ for all $\theta$. However, it was shown in Liu (1990) that if the SD(·) is equal to 1/4 on the plane at some point, then the distribution $H_\theta(\cdot)$ is symmetric around that point. Since this is true for all $\theta$, the distribution $H(\cdot)$ assigns probability 1/2 to each hemisphere. Therefore $h(\theta) = h(-\theta)$ for all $\theta$. □

The corollaries below follow from Propositions 3.4 and 3.5, and they provide upper bounds for ASD(·) on the circle and on the sphere, respectively.

COROLLARY 3.2 (On the circle). Under the conditions of Proposition 3.4, $\mathrm{ASD}(\theta) \le 1/2$ for every $\theta$ on the circle. The equality holds at a point $\theta_0$ if and only if the entire probability mass is on a semicircle and $H(\theta_0) = 1/2$.

The claim is based on the fact that $\mathrm{SD}_\theta(\varphi) \le 1/2$, which holds since $\mathrm{SD}_\theta(\varphi) = 2H_\theta(\varphi)[1 - H_\theta(\varphi)]$.


COROLLARY 3.3 (On the sphere). Under the conditions of Proposition 3.5, $\mathrm{ASD}(\theta) \le 1/4$ for any $\theta$ on the sphere. The equality holds at a point $\theta_0$ if the entire distribution is concentrated on a hemisphere containing $\theta_0$ and the induced distribution $H_{\theta_0}(\cdot)$ is symmetric around the origin.

Corollary 3.3 is obtained by combining Proposition 3.5 with Theorem 4 of Liu (1990).

Statistical applications of Propositions 3.3 and 3.6. The result of Proposition 3.3 (Proposition 3.6) gives rise to a simple test of whether a given circular (spherical) distribution is antipodally symmetric about the center of the circle (sphere). We may use

$$\sup_\theta \left|\mathrm{ASD}_n(\theta) - \tfrac{1}{4}\right| \tag{3.7}$$

on the circle and

$$\sup_\theta \left|\mathrm{ASD}_n(\theta) - \tfrac{1}{8}\right| \tag{3.8}$$

on the sphere as test statistics. Large values of (3.7) and (3.8) indicate that the distribution is unlikely to be antipodally symmetric. Needless to say, the actual implementation of these testing ideas would require knowledge of the exact or approximate sampling distributions of these test statistics. Perhaps an approximation of the sampling distribution can be obtained from some resampling procedures, for example, the bootstrap method. A different method has been suggested by Fisher (1989) for obtaining the sampling distribution under the null hypothesis of antipodal symmetry: Use random reflection (through the origin) of the original data set to produce 2n new samples (each of size n) and compute 2n values for test statistic (3.7) [or (3.8)]. The histogram based on these 2n values is an approximation of the desired sampling distribution. A detailed study of the proposed tests (3.7) and (3.8) and their comparison with Ajne's test [Ajne (1968)] shall be reported elsewhere.

4. Properties of angular Tukey's depth. It may be instructive to consider first the case where the underlying distribution $H$ is supported only on a semicircle (hemisphere). In this case we can easily relate ATD(·) to Tukey's depth on the line (plane).

PROPOSITION 4.1. If $H(\cdot)$ is a distribution concentrated on a semicircle, say from $0$ to $\pi$, then $\mathrm{ATD}_H(\cdot)$ assumes the familiar form of Tukey's depth on the line, namely, for $0 \le \theta \le \pi$,

$$\mathrm{ATD}_H(\theta) = \min\{H(\theta),\, 1 - H(\theta)\}. \tag{4.1}$$


Clearly $\mathrm{ATD}_H(\cdot)$ vanishes outside the interval $[0, \pi]$. Furthermore,

$$\mathrm{ATD}_n(\theta) = \min\{H_n(\theta),\, 1 - H_n(\theta-)\}.$$

Consider the center-outward ranking of data points based on decreasing $\mathrm{ATD}_n(\cdot)$ values. Proposition 4.1 implies that the above ranking coincides with the center-outward ranking based upon the usual order statistics on the line, if the distribution is supported on a semicircle.

In the spherical case we assume that $H(\cdot)$ is supported on a hemisphere $S_0$. Given any point $\theta$ on $S_0$, we consider the stereographic projection, with pole at $-\theta$, from the sphere to the tangent plane at $\theta$. The distribution $H(\cdot)$ on the sphere then induces a distribution on the tangent plane which we denote by $H_\theta(\cdot)$. Now, given any hemisphere $S$ containing $\theta$ in its interior we can find another hemisphere, say $S'$, containing $\theta$ in its boundary satisfying $P(S') \le P(S)$. We can visualize $S'$ as follows. Let $L$ be the line of intersection between the planes supporting the boundaries of $S$ and that of $S_0$. Rotate the boundary of $S$ around $L$ as axis until it passes through $\theta$. One of the two hemispheres thus obtained will have probability less than or equal to $P(S)$, and this is the one we take as $S'$. This implies the following proposition.

PROPOSITION 4.2. Let $H(\cdot)$ be a distribution supported on the hemisphere $S_0$. Then the following hold:
(a) $\mathrm{ATD}_H(\theta) = 0$ for all $\theta \notin S_0$.
(b) For any $\theta$ in $S_0$, $\mathrm{ATD}_H(\theta)$ agrees with Tukey's depth (2.5) taken w.r.t. the distribution $H_\theta(\cdot)$ induced on the tangent plane at $\theta$ by stereographic projection from $-\theta$.

We discuss now some more general properties of ATD(·).

PROPOSITION 4.3. On the circle as well as on the sphere, ATD(·) is bounded above by 1/2. The value 1/2 is achieved at a point $\theta$ on a circle (on a sphere) if and only if each semicircle (hemisphere) containing $\theta$ has probability greater than or equal to 1/2.

In particular, the bound 1/2 is achieved at the mode of any member of the von Mises class of distributions, and everywhere if the distribution is uniform.

PROPOSITION 4.4. On the circle (sphere) ATD(·) has the constant value 1/2 throughout if and only if any semicircle (hemisphere) has probability 1/2.

Since the property that each semicircle (hemisphere) has probability 1/2 can be viewed as an alternative definition of an antipodally symmetric distribution, Proposition 4.4 is a characterization of antipodally symmetric distributions. These observations may suggest that a distribution with constant ATD would have to be antipodally symmetric and that the maximum ATD for any distribution is always 1/2. Neither statement is true, as is shown in the following example.


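EXAMPLE 4.1. Let $\varepsilon = 1/28$, and let $H(\cdot)$ be a distribution on the unit circle with the density function $h(\cdot)$ defined as

$$h(\theta) = \begin{cases}
\left(\tfrac{1}{4} + 3\varepsilon\right)(\pi/2)^{-1}, & \text{for } 0 \le \theta < \tfrac{\pi}{2}, \\
\varepsilon\,(\pi/4)^{-1}, & \text{for } \tfrac{\pi}{2} \le \theta < \tfrac{3\pi}{4}, \\
\left(\tfrac{1}{4} - 2\varepsilon\right)(\pi/4)^{-1}, & \text{for } \tfrac{3\pi}{4} \le \theta < \pi, \\
\left(\tfrac{1}{4} - \varepsilon\right)(\pi/2)^{-1}, & \text{for } \pi \le \theta < \tfrac{3\pi}{2}, \\
\left(\tfrac{1}{4} - 2\varepsilon\right)(\pi/4)^{-1}, & \text{for } \tfrac{3\pi}{2} \le \theta < \tfrac{7\pi}{4}, \\
\varepsilon\,(\pi/4)^{-1}, & \text{for } \tfrac{7\pi}{4} \le \theta < 2\pi.
\end{cases}$$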
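The piecewise-constant density of Example 4.1 lends itself to a direct numerical check. The Python sketch below is an illustration under stated assumptions rather than part of the paper: the names `CELLS`, `cdf`, `semicircle_prob` and `atd` are hypothetical, and the density is encoded as the probability mass of each arc of length $\pi/4$. Because the probability of the semicircle starting at angle $a$ is piecewise linear in $a$ with kinks at multiples of $\pi/4$, its minimum over the semicircles containing a given $\theta$ is attained at an endpoint of the admissible range or at a kink.

```python
import math

EPS = 1 / 28
QUARTER = math.pi / 4
# Probability mass of each arc of length pi/4, starting at angle 0.
CELLS = [
    (0.25 + 3 * EPS) / 2, (0.25 + 3 * EPS) / 2,  # [0, pi/2)
    EPS,                                          # [pi/2, 3pi/4)
    0.25 - 2 * EPS,                               # [3pi/4, pi)
    (0.25 - EPS) / 2, (0.25 - EPS) / 2,           # [pi, 3pi/2)
    0.25 - 2 * EPS,                               # [3pi/2, 7pi/4)
    EPS,                                          # [7pi/4, 2pi)
]

def cdf(x):
    """Mass accumulated from angle 0 to x on the unwrapped line (1 per full turn)."""
    turns = math.floor(x / (2 * math.pi))
    r = x - turns * 2 * math.pi
    k = max(0, min(int(r // QUARTER), 7))
    return turns + sum(CELLS[:k]) + CELLS[k] * (r - k * QUARTER) / QUARTER

def semicircle_prob(a):
    """Probability of the semicircle running counterclockwise from a to a + pi."""
    return cdf(a + math.pi) - cdf(a)

def atd(theta):
    """Population ATD at theta: minimum over semicircles [a, a + pi] containing theta."""
    lo, hi = theta - math.pi, theta
    kinks = range(math.ceil(lo / QUARTER), math.floor(hi / QUARTER) + 1)
    starts = [lo, hi] + [k * QUARTER for k in kinks]
    return min(semicircle_prob(a) for a in starts)

total_mass = sum(CELLS)
depths = [atd(t) for t in (0.1, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0)]
```

Under this encoding the total mass should be 1, and each entry of `depths` should agree with $1/2 - 1/14$ up to rounding error.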

For this distribution $\mathrm{ATD}_H(\theta) = 1/2 - 1/14$ for all $\theta$, $0 \le \theta < 2\pi$. To check this we note that each one of these three semicircles: from $\pi/2$ to $(3/2)\pi$, $\pi$ to $2\pi$ and $(7/4)\pi$ to $(3/4)\pi$ has probability $1/2 - 2\varepsilon$. This also turns out to be the minimum probability over all semicircles.

This example contrasts with the following fact related to ASD: The ASD is constant on a circle if and only if the distribution is antipodally symmetric and the value of the constant is always 1/4 (cf. Proposition 3.3).

We now state the key monotonicity property of ATD(·).

PROPOSITION 4.5. Let $\theta_0$ be a point on the sphere. We introduce for each point $\theta$ the Euler angle $\phi$, $0 \le \phi \le \pi$, which is the angle between $\overline{\mathbf{0}\theta_0}$ and $\overline{\mathbf{0}\theta}$, in other words the latitude of $\theta$. If we fix a meridian, the position of $\theta$ will be characterized by $\phi$ and its longitude $\eta$, $0 \le \eta < 2\pi$, and a density $h$ on the sphere is just a function of $(\phi, \eta)$. Assume we have a distribution with density $h(\phi, \eta)$ which decreases in $\phi$ for each $\eta$, and satisfies $h(\phi, \eta) = h(\phi, \eta + \pi)$. Then

$$\mathrm{ATD}_H(\theta) = \mathrm{ATD}_H((\phi, \eta))$$

is a monotonically nonincreasing function of $\phi$ for each $\eta$. In particular, it attains its maximum 1/2 at $\theta_0$.

PROOF. The argument relies upon the following observation: If a hemisphere contains $\theta_0$ then it has probability greater than or equal to 1/2; if it does not then it has probability less than or equal to 1/2. Fix a longitude $\eta$. Consider two latitudes $\phi_1$ and $\phi_2$ such that $0 < \phi_1 < \phi_2 < \pi$. We claim that

$$\mathrm{ATD}_H((\phi_1, \eta)) \ge \mathrm{ATD}_H((\phi_2, \eta)).$$

To show this we consider any hemisphere $S^+$ such that it contains $(\phi_1, \eta)$ but not $(\phi_2, \eta)$. Let $S^-$ represent the complement which obviously contains $(\phi_2, \eta)$. Evidently $S^+$ also contains the mode $\theta_0$. In view of the above observation we obtain $P(S^+) \ge P(S^-)$. The proposition follows. □


We omit the discussion of the similar monotonicity property in the circle case.

Next, we point out a simple but peculiar property of ATD, which also contrasts with the general strict monotonicity of ASD.

PROPOSITION 4.6. For any distribution on the sphere (circle), there exists at least one hemisphere (semicircle) where the ATD(·) is constant and the value of the constant is its minimum value.

In fact if we let $S$ be a hemisphere with the smallest probability, then $\mathrm{ATD}(\theta)$ will be equal to $P(S)$ for any point $\theta$ in $S$. This observation will be useful for the comparisons in Section 5.

Evidently, Proposition 4.6 applies to the empirical version of ATD also. This property of ATD suggests a natural trimming of angular data, namely, to trim off all the data points with the minimum ATD.

Finally, we state without the proof the connection between ATD and median axis in the next proposition, which can be seen as the counterpart of Proposition 3.2 for ATD.

PROPOSITION 4.7. Proposition 3.2 holds with ASD replaced by ATD.

5. Aspects of robustness. The notion of breakdown described in Hampel, Ronchetti, Rousseeuw and Stahel (1986) is an intuitive measure of robustness in $\mathbb{R}^p$. The following definition is a natural adaption of breakdown for directional functions. Under this definition the breakdown is nonzero for all three medians.

DEFINITION. Let $H$ be a distribution on $S^d$ with $\theta_0$ as a median. We define the breakdown of this median as the infimum of $\varepsilon$ such that the median of $H_\varepsilon = (1 - \varepsilon)H + \varepsilon G$ is $-\theta_0$ for some contaminating distribution $G$.

Under this definition the following proposition establishes some lower bounds of the breakdown for the three medians.

PROPOSITION 5.1. On a circle, for any distributions $H$ and $G$, we have

(i) $|\mathrm{ASD}_{H_\varepsilon}(\theta) - \mathrm{ASD}_H(\theta)| \le 2\varepsilon$,
(ii) $|\mathrm{ATD}_{H_\varepsilon}(\theta) - \mathrm{ATD}_H(\theta)| \le \varepsilon$

and

(iii) $|\mathrm{ADD}_{H_\varepsilon}(\theta) - \mathrm{ADD}_H(\theta)| \le \varepsilon\pi$,

where $H_\varepsilon = (1 - \varepsilon)H + \varepsilon G$.

The inequality (i) implies the following: if $\theta_0$ is an ASM on the circle and the depth of $\theta_0$ is strictly higher than that of $-\theta_0$, then the breakdown of this


appliesto ATM and ADM. In fact, medianis nonzero.The same statement 5.1 implies Proposition breakdown ofASM ? ASD(6Q) - ASD( - 00) 4 ofATM ? breakdown and
breakdownofADM > ADD(00) - ADD( -00) ATD(6 0) - ATD(-00)

2w

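PROOF. (i) Let $\{W_1, W_2\}$ be a random sample from $H$; $\{Z_1, Z_2\}$ from $G$; and $\{\eta_1, \eta_2\}$ from a Bernoulli distribution with $P(\eta_i = 1) = 1 - \varepsilon$ and $P(\eta_i = 0) = \varepsilon$. We assume that the $W_i$'s, $Z_i$'s and $\eta_i$'s are all independent random variables. Define

\[
W_i^* = \begin{cases} W_i, & \text{if } \eta_i = 1, \\ Z_i, & \text{if } \eta_i = 0. \end{cases}
\]

We obtain

\[
\begin{aligned}
\mathrm{ASD}_{H_\varepsilon}(\theta) &= P(\theta \in \operatorname{arc}(W_1^*, W_2^*)) \\
&= P(\{\theta \in \operatorname{arc}(W_1^*, W_2^*)\} \cap \{\eta_1 = 1, \eta_2 = 1\}) \\
&\quad + P(\{\theta \in \operatorname{arc}(W_1^*, W_2^*)\} \cap \{\eta_1 = 1, \eta_2 = 1\}^c) \\
&= (1 - \varepsilon)^2 \, \mathrm{ASD}_H(\theta) + R,
\end{aligned}
\]

where $0 \le R \le 2\varepsilon$. The result follows from the fact that $1 - (1 - \varepsilon)^2 \le 2\varepsilon$.

(ii) The inequality on ATD is easily deduced from the observation that for any semicircle $S$, $|P_{H_\varepsilon}(S) - P_H(S)| \le \varepsilon$.

The proof of (iii) is straightforward and is thus omitted. □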
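The bounds in parts (i) and (iii) of Proposition 5.1 can be checked numerically on a small example. The sketch below is only an illustration under stated assumptions: the discrete distributions `H` and `G`, the contamination level `EPS`, and all function names are hypothetical, angles are whole degrees (so that $\pi$ corresponds to 180), arcs are taken as closed shortest arcs, and ADD is assumed to have the form $\pi$ minus the expected arc distance.

```python
from itertools import product

def arc_dist(a, b):
    """Shortest arc distance between two angles, in whole degrees."""
    d = abs(a - b) % 360
    return min(d, 360 - d)

def in_arc(theta, a, b):
    """True if theta lies on the shortest closed arc joining a and b.

    Exact for integer degrees; meant for non-antipodal a and b.
    """
    return arc_dist(a, theta) + arc_dist(theta, b) == arc_dist(a, b)

def asd(dist, theta):
    """ASD(theta) = P(theta in arc(W1, W2)) for W1, W2 i.i.d. from dist."""
    return sum(p * q
               for (a, p), (b, q) in product(dist.items(), repeat=2)
               if in_arc(theta, a, b))

def add(dist, theta):
    """Assumed form ADD(theta) = 180 - E[arc distance], in degrees."""
    return 180 - sum(p * arc_dist(theta, w) for w, p in dist.items())

def mix(h, g, eps):
    """Contaminated distribution H_eps = (1 - eps) H + eps G."""
    out = {w: (1 - eps) * p for w, p in h.items()}
    for w, p in g.items():
        out[w] = out.get(w, 0.0) + eps * p
    return out

# Hypothetical discrete distributions {angle in degrees: probability}.
H = {0: 0.2, 30: 0.5, 60: 0.3}
G = {200: 0.6, 250: 0.4}
EPS = 0.1
H_EPS = mix(H, G, EPS)

# Largest change in each depth over a one-degree grid of test points.
max_asd_gap = max(abs(asd(H_EPS, t) - asd(H, t)) for t in range(360))
max_add_gap = max(abs(add(H_EPS, t) - add(H, t)) for t in range(360))
print(max_asd_gap <= 2 * EPS, max_add_gap <= 180 * EPS)
```

Both comparisons should hold for any choice of `H`, `G` and `EPS`, since the proof of (i) uses only the mixture structure of $H_\varepsilon$; the supports here were chosen so that no pair of points is antipodal.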
REMARK 5.1. In the case of a sphere, the bound for ATD and ADD in Proposition 5.1 remains the same, and it becomes $3\varepsilon$ for ASD.

Besides the notion of breakdown, the influence function is another commonly used tool for the study of robustness. See Hampel, Ronchetti, Rousseeuw and Stahel (1986) for the description of the influence function of a statistic. Since the influence functions of most circular location estimators are bounded, Ko and Guttorp (1988) proposed to divide the influence function by a measure of scale and then take the supremum over the circle and a reasonable class of distributions. If the supremum is bounded, then the estimator is considered to be standardized-bias robust (SB-robust). Ko and Guttorp (1988) show that the directional mean is not SB-robust; however, on the von Mises class of distributions, Mardia's median (or ADD) is SB-robust. For a symmetric unimodal circular distribution with the modal angle $\mu_0$ the influence function (IF) of Mardia's median is

\[
\mathrm{IF}(\theta; \text{Mardia's median}) = \frac{1}{2} \, \frac{\operatorname{sign}(\theta - \mu_0)}{f(\mu_0) - f(-\mu_0)}, \tag{5.1}
\]

where $\operatorname{sign}(x) = 1$, $0$ or $-1$ as $x > 0$, $x = 0$ or $x < 0$. The result (5.1) is obtained in Wehrly and Shine (1981). It is easy to show that the three medians ASM, ATM and ADM, when they are uniquely defined, all have the same form of influence function as (5.1). Thus they are typically SB-robust. In view of Propositions 3.2 and 4.7, the three medians may be uniquely defined even if there are multiple median axes and multiple modes (see also Example 6.1).

It is not surprising that all the circular medians discussed here give the same influence function. After all, the Euclidean versions of the three depth functions in this article give rise to the same median on the real line. Clearly these circular medians coincide under symmetric unimodal distributions. However, outside this class of distributions it is not known to us if they still coincide as they do on the real line.

Finally, we demonstrate through an example a comparison related to a qualitative robustness aspect of the center-outward ranking of data points based on the decreasing $\mathrm{ADD}_n(\cdot)$ and $\mathrm{ASD}_n(\cdot)$ [or $\mathrm{ATD}_n(\cdot)$] value.
EXAMPLE 5.1 (On a unit circle). Let the data set be 0, 5, 10, 15, 100, 105, 110, 115, 120 (in degrees). The center-outward ADD ordering is 100, 105, 110, 115, 120, 15, 10, 5, 0. However, if the data point 100 in the data set is replaced by 20, then the new ordering will be 20, 15, 10, 5, 0, 105, 110, 115, 120. Note that the ordering is drastically altered even though only one data point is changed. In fact, either of the two data situations is likely to arise if "both heaps" of the underlying bimodal distribution have 50% probability. This drastic alteration in ordering is due to the fact that ADD ordering depends on distance and is clearly absent in the case of the ordering based on $\mathrm{ASD}_n(\cdot)$ or $\mathrm{ATD}_n(\cdot)$ [which are the same as the ordering based on the usual order statistics on the line, namely, 100, (15, 105), (10, 110), (5, 115), (0, 120)].
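The two ADD orderings in Example 5.1 can be reproduced with a few lines of code. This is a minimal sketch, assuming that $\mathrm{ADD}_n(x)$ equals $\pi$ minus the mean arc distance from $x$ to the sample, so that decreasing depth is the same as increasing total arc distance. The function names are illustrative, and `asd_count` simply counts the sample pairs whose shortest closed arc contains a point, as a stand-in for the empirical ASD.

```python
from itertools import combinations

def arc_dist(a, b):
    """Shortest arc distance between two angles, in whole degrees."""
    d = abs(a - b) % 360
    return min(d, 360 - d)

def add_ordering(data):
    """Center-outward ordering by decreasing ADD_n.

    Assumes ADD_n(x) = 180 - (mean arc distance from x to the data), so the
    deepest point is the one with the smallest total arc distance.
    """
    return sorted(data, key=lambda x: sum(arc_dist(x, y) for y in data))

def asd_count(data, x):
    """Number of sample pairs whose shortest closed arc contains x."""
    return sum(1 for a, b in combinations(data, 2)
               if arc_dist(a, x) + arc_dist(x, b) == arc_dist(a, b))

original = [0, 5, 10, 15, 100, 105, 110, 115, 120]
modified = [20 if x == 100 else x for x in original]

print(add_ordering(original))  # [100, 105, 110, 115, 120, 15, 10, 5, 0]
print(add_ordering(modified))  # [20, 15, 10, 5, 0, 105, 110, 115, 120]

# Pair counts in sorted order: symmetric about the middle order statistic.
print([asd_count(original, x) for x in sorted(original)])
# [8, 15, 20, 23, 24, 23, 20, 15, 8]
```

The pair counts depend only on the rank of each point within the sample, which is why the ASD-based ordering keeps the middle order statistic at the center in both data sets while the ADD ordering flips from one heap to the other.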

6. Concluding remarks.

REMARK 6.1. The achievable upper bound for ASD is 1/2 and 1/4 in the circular case and the spherical case, respectively. For ATD it is 1/2 in both cases. For antipodally symmetric distributions, ASD equals 1/4 throughout the circle, 1/8 throughout the sphere, while ATD is 1/2 throughout a circle as well as a sphere. In general the upper bound for ATD is more likely to be attained than that of ASD.

REMARK 6.2. Consider the densities which are monotonically decreasing from $\theta_0$ to $-\theta_0$ in a symmetric way (e.g., the von Mises class). In such a case, ASD decreases monotonically from $\theta_0$ to $-\theta_0$ in each direction; whereas ATD decreases up to halfway in each direction and becomes a constant afterwards. The value of ATD at the mode of a unimodal distribution equals 1/2, whereas that for ASD at the mode (on a circle) ranges from 1/4 to 1/2. The value 1/2 for ASD at the mode occurs if and only if the entire distribution is concentrated on a semicircle. Thus we note that the maximum value of ATD at the mode is the same as its constant value in the case of the uniform distribution (= 1/2), while the notion of ASD makes a clear distinction between the two situations.
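The values quoted for ASD on the circle can be explored numerically. The following sketch assumes the definition $\mathrm{ASD}(\theta) = P(\theta \in \operatorname{arc}(W_1, W_2))$ with the shortest arc, and approximates it for a continuous density by a midpoint rule: $\theta$ lies on the arc exactly when one point is at clockwise offset $x$, the other at counterclockwise offset $y$, and $x + y < \pi$. The cardioid density is used only as a convenient symmetric unimodal stand-in for the von Mises class; the function names and the parameter `rho` are hypothetical.

```python
import math

def asd(theta, pdf, n=400):
    """Midpoint-rule approximation of ASD(theta) for a circular density.

    theta is on the shortest arc of (W1, W2) when one point sits at
    clockwise offset x, the other at counterclockwise offset y, and
    x + y < pi; the factor 2 accounts for the two possible roles.
    """
    d = math.pi / n
    offsets = [(k + 0.5) * d for k in range(n)]
    cw = [pdf(theta - x) * d for x in offsets]    # cell masses, clockwise side
    ccw = [pdf(theta + y) * d for y in offsets]   # cell masses, other side
    total = 0.0
    for i in range(n):
        for j in range(n - i):
            # Cell pairs straddling x + y = pi get half weight.
            weight = 0.5 if i + j == n - 1 else 1.0
            total += weight * cw[i] * ccw[j]
    return 2.0 * total

def uniform_pdf(t):
    """Uniform density on the circle."""
    return 1.0 / (2.0 * math.pi)

def cardioid_pdf(t, mu=0.0, rho=0.4):
    """Cardioid density with mode mu (requires |rho| <= 1/2)."""
    return (1.0 + 2.0 * rho * math.cos(t - mu)) / (2.0 * math.pi)

asd_uniform = asd(1.0, uniform_pdf)
asd_mode = asd(0.0, cardioid_pdf)
asd_side = asd(math.pi / 2, cardioid_pdf)
asd_antimode = asd(math.pi, cardioid_pdf)
print(round(asd_uniform, 4), round(asd_mode, 4),
      round(asd_side, 4), round(asd_antimode, 4))
```

Under these assumptions the uniform case returns 1/4 at every point, while the cardioid gives a value strictly between 1/4 and 1/2 at the mode and smaller values as the point moves toward the antimode, in line with Remarks 6.1 and 6.2.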
REMARK 6.3. For symmetric unimodal distributions (e.g., the von Mises class), ADM, ASM and ATM all coincide with the mode. The example given below further shows that ASD, ATD and ADD may help one pick out the "centralmost" point in the presence of multiple median axes and multiple modes.

EXAMPLE 6.1. Let $h(\cdot)$ be a density function on the unit circle defined as follows:

\[
h(\theta) = \begin{cases}
\dfrac{\theta}{\pi^2/2}, & \text{for } 0 \le \theta \le \dfrac{\pi}{2}, \\[2ex]
\dfrac{\pi - \theta}{\pi^2/2}, & \text{for } \dfrac{\pi}{2} < \theta \le \pi, \\[2ex]
\dfrac{1}{2\pi}, & \text{for } \pi < \theta < 2\pi.
\end{cases}
\]
In other words, the distribution is triangular on $[0, \pi]$ and uniform on $(\pi, 2\pi)$. It is easy to check that there are two perpendicular median axes along the two axes. On the other hand, there is a unique maximum point of $\mathrm{ASD}(\cdot)$, namely, the point $\pi/2$. This seems more sensible because among the four median candidates $0$, $\pi/2$, $\pi$ and $3\pi/2$ suggested by median axes, $\pi/2$ stands out as the point with the highest probability concentration around it. The claim of unique maximization at $\pi/2$ can be verified by using Proposition 3.1 and the following fact:

\[
A_\theta - C_\theta > 0 \ \text{ for } \theta \in \left(0, \tfrac{\pi}{2}\right) \cup \left(\tfrac{3\pi}{2}, 2\pi\right)
\quad \text{and} \quad
A_\theta - C_\theta < 0 \ \text{ for } \theta \in \left(\tfrac{\pi}{2}, \tfrac{3\pi}{2}\right).
\]

Thus, ASD and ADD decrease monotonically in the ranges $\pi/2$ to $3\pi/2$ clockwise as well as $\pi/2$ to $3\pi/2$ counterclockwise. Similarly, ATD decreases monotonically on both sides of $\pi/2$ with strict monotonicity between $\pi/2$ and $\pi/2 \pm \pi/4$. Beyond this range, ATD stays constant and assumes its minimum value.
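The behaviour of ATD in Example 6.1 can be traced numerically. The sketch below assumes that the density is $2\theta/\pi^2$ on $[0, \pi/2]$, $2(\pi - \theta)/\pi^2$ on $(\pi/2, \pi]$ and $1/(2\pi)$ on $(\pi, 2\pi)$, and that $\mathrm{ATD}(\theta)$ is the smallest probability of a closed semicircle containing $\theta$. The minimum is approximated over a grid of semicircle centers, and the function names are illustrative.

```python
import math

PI = math.pi

def cdf(t):
    """CDF on [0, 2*pi] of the density assumed for Example 6.1."""
    if t <= PI / 2:
        return t * t / PI ** 2                    # rising side of the triangle
    if t <= PI:
        return 0.5 - (PI - t) ** 2 / PI ** 2      # falling side of the triangle
    return 0.5 + (t - PI) / (2 * PI)              # uniform part

def arc_mass(start, length):
    """Probability of the arc running counterclockwise from start."""
    s = start % (2 * PI)
    e = s + length
    if e <= 2 * PI:
        return cdf(e) - cdf(s)
    return 1.0 - cdf(s) + cdf(e - 2 * PI)         # arc wraps past 2*pi

def atd(theta, n=400):
    """Grid approximation of ATD(theta).

    A closed semicircle centered at c contains theta exactly when c is
    within pi/2 of theta, so the centers range over [theta - pi/2,
    theta + pi/2].
    """
    centers = (theta - PI / 2 + k * PI / n for k in range(n + 1))
    return min(arc_mass(c - PI / 2, PI) for c in centers)

for name, theta in [("pi/2", PI / 2), ("3pi/8", 3 * PI / 8),
                    ("pi/4", PI / 4), ("pi", PI), ("3pi/2", 3 * PI / 2)]:
    print(name, round(atd(theta), 4))
```

Under these assumptions the depth peaks at $\pi/2$, falls strictly until the point is $\pi/4$ away from $\pi/2$, and then stays at a single minimum value over the rest of the circle, which matches the description in the example.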


Now, in this example a new mode can actually be created at $\pi$ or $0$ by altering the density locally, thus creating another mode keeping the maximum point of $\mathrm{ASD}(\cdot)$, $\mathrm{ATD}(\cdot)$, $\mathrm{ADD}(\cdot)$ and the two median axes unaffected.
REMARK 6.4. In comparing the rankings derived from three depths, we note that the ranking based on ADD is not even consistent with the linear ranking when the distribution is on a half-circle (cf. Example 5.1). As for ASD and ATD, even though they both satisfy this consistency property, ATD is unable to distinguish all points lying on the hemisphere with the smallest probability (cf. Proposition 4.6 and Remark 6.2). In conclusion, ASD seems to provide a finer ranking of data points in the order of centrality, which is particularly useful in detecting outliers [see Collett (1980) for a discussion on outliers in circular data]. On the other hand, ATD may be expected to be superior in terms of the robustness of the associated "center." Another advantage of ASD is that, in general, ASD seems easier to compute than ATD, especially for $S^d$, $d > 2$.

REFERENCES

AJNE, B. (1968). A simple test for uniformity of a circular distribution. Biometrika 55 343-354.
BROWN, B. M. (1983). Statistical uses of the spatial median. J. Roy. Statist. Soc. Ser. B 45 25-30.
COLLETT, D. (1980). Outliers in circular data. J. Roy. Statist. Soc. Ser. C 29 50-57.
EFRON, B. (1979). Bootstrap methods: Another look at the jackknife. Ann. Statist. 7 1-26.
FISHER, N. I. (1985). Spherical medians. J. Roy. Statist. Soc. Ser. B 47 342-348.
FISHER, N. I. (1989). Personal communication.
FISHER, N. I., LEWIS, T. and EMBLETON, B. J. J. (1987). Statistical Analysis of Spherical Data. Cambridge Univ. Press.
GOWER, J. C. (1974). The median centre. J. Roy. Statist. Soc. Ser. C 23 466-470.
GROSS, S. and LIU, R. (1989). Classification rules based on concepts of data depth. Unpublished manuscript.
HAMPEL, F. R., RONCHETTI, E. M., ROUSSEEUW, P. J. and STAHEL, W. A. (1986). Robust Statistics: The Approach Based on Influence Functions. Wiley, New York.
HE, X. and SIMPSON, D. G. (1992). Robust direction estimation. Ann. Statist. 20 351-369.
KO, D. and GUTTORP, P. (1988). Robustness of estimators for directional data. Ann. Statist. 16 609-618.
LIU, R. (1988). On a notion of simplicial depth. Proc. Nat. Acad. Sci. U.S.A. 85 1732-1734.
LIU, R. (1990). On a notion of data depth based on random simplices. Ann. Statist. 18 405-414.
MARDIA, K. V. (1972). Statistics of Directional Data. Academic, New York.
SMALL, C. G. (1987). Measures of centrality for multivariate and directional distribution. Canad. J. Statist. 5 31-39.
TUKEY, J. W. (1975). Mathematics and the picturing of data. In Proceedings of the International Congress of Mathematicians, Vancouver, 1974 (R. D. James, ed.) 2 523-531. Canadian Mathematical Congress.
WATSON, G. S. (1983). Statistics on Spheres. Wiley, New York.
WEHRLY, T. E. and SHINE, E. P. (1981). Influence curves of estimators for directional data. Biometrika 68 334-335.

DEPARTMENT OF STATISTICS
HILL CENTER, BUSCH CAMPUS
RUTGERS UNIVERSITY
NEW BRUNSWICK, NEW JERSEY 08903
