
J. R. Statist. Soc. A (2004) 167, Part 4, pp. 627–637

Bayesian binary segmentation procedure for detecting streakiness in sports

Tae Young Yang

Myongji University, Yongin, Korea

[Received May 2002. Revised July 2003]
Summary. Whether an individual player or team enjoys periods of good form, and when these
occur, is a widely observed phenomenon typically called 'streakiness'. It is interesting to assess
which teams and which players in sports are streaky. Such competitors might have a
large number of successes during some periods and few or no successes during other periods.
Thus, their success rate is not constant over time. We provide a Bayesian binary segmenta-
tion procedure for locating changepoints and the associated success rates simultaneously for
these competitors. The procedure is based on a series of nested hypothesis tests each using
the Bayes factor or the Bayesian information criterion. At each stage, we only need to com-
pare a model with one changepoint with a model based on a constant success rate. Thus, the
method circumvents the computational complexity that we would normally face in problems with
an unknown number of changepoints. We apply the procedure to data corresponding to sports
teams and players from basketball, golf and baseball.
Keywords: Bayes factor; Bayesian information criterion; Binary segmentation procedure;
Changepoints; Streaky player

1. Introduction
There has been considerable interest in detecting a streaky player or a streaky team in many
sports including baseball, basketball and golf. The opposite of a streaky competitor is a player
or team with a constant rate of success over time. A streaky player is different, since the asso-
ciated success rate does not stay constant over time. Streaky players might have a large number
of successes during one or more periods, with fewer or no successes during other periods. More
streaky players tend to have more changepoints. We apply a Bayesian binary segmentation proce-
dure that was proposed by Yang and Kuo (2001) to locate the changepoints and the associated
success rates simultaneously in binomial sports data. The procedure is based on a sequence
of nested hypothesis tests each using the Bayes factor or the Bayesian information criterion
(BIC) approximation. At each stage, we only need to compare a single streaky model with one
changepoint with a constant model with no changepoints. Therefore, the procedure is easily
implemented and circumvents the computational complexity that we would normally face in
problems with a variable number of changepoints. We illustrate the procedure by considering
data arising from sports teams and players from the National Basketball Association (NBA),
the Professional Golfers’ Association and Major League baseball (MLB).
The Bayesian binary segmentation procedure is applicable to binomial data based on n con-
secutive independent observations. In the first step, we compare a constant model and a single
changepoint model by using the Bayes factor or the BIC approximation. We assume that the
Address for correspondence: Tae Young Yang, Department of Mathematics, Myongji University, Yongin,
Kyunggi, 449-728 Korea.
E-mail: tyang@mju.ac.kr

 2004 Royal Statistical Society 0964–1998/04/167627


single changepoint c is an integer random variable with range [1, n − 1] and we locate a tentative
changepoint ĉ by using numerical integration or a sampling-based approach. If the test is in
favour of the constant model, we estimate the success rate based on the whole data [1, n], stop
the procedure and conclude that the rate is constant. If not, then we divide the data into two
subsegments: one denoted by [1, ĉ] and the other denoted by [ĉ + 1, n]. Then, we run two Bayes
factor tests, or two tests based on the BIC approximation as before on each of the subsegments.
If the test suggests that there are no changepoints in a subsegment, we immediately estimate
the constant success rate in the subsegment. If the test suggests a streaky model, we locate the
changepoint and continue splitting the data. We continue testing until no more changepoints are
found. Using this procedure we need only to compare a constant model with no changepoints
with a single streaky model with one changepoint. Therefore, the procedure is straightforward to
implement. Moreover, when we determine that there is no changepoint in a subsegment, we no
longer need to consider data from that subsegment. This cuts the sample size down significantly
for locating changepoints in the remaining regions.
Vostrikova (1981) proposed a binary segmentation procedure and proved its consistency for
locating the number of changepoints in a multidimensional random process. Chen and Gupta
(1997) proposed a binary procedure with the BIC to locate multiple-variance changepoints in
a sequence of independent Gaussian random variables with known common mean. Yang and
Kuo (2001) proposed the Bayesian binary segmentation procedure for locating changepoints
and the associated rates of event times taken from a Poisson process.
Albright (1993) performed several statistical tests to detect streakiness in a number of baseball
data sets. Barry and Hartigan (1993) proposed a particular streaky model to describe a baseball
team’s sequence of wins and losses. Albert and Williamson (2001) proposed a Markov switching
model that can be used to model streakiness in baseball and basketball, and a simulation-based
approach was proposed for approximating a Bayesian analysis. There have been other contri-
butions involving streakiness in sports: see, for example, Stern (1997), Stern and Morris (1993),
Tversky and Gilovich (1989) and Larkey et al. (1989).
The outline of this paper is as follows: Section 2 describes the Bayesian binary segmenta-
tion procedure for detecting streakiness in binomial data. Section 3 applies the binary segmen-
tation procedure to several data sets arising in sports. Section 4 offers some concluding
remarks.

2. Methodology for detecting streakiness of binomial data


We assume that the success rate for binomial data at time t changes according to

$$p(t) = \sum_{k=1}^{K+1} I(t \in [c_{k-1}+1,\, c_k])\, p_k,$$

where I(E) is the indicator function of the event E and 0 = c0 < c1 < . . . < cK < cK+1 = n are the
unknown integer-valued changepoints with associated success rates p1, . . . , pK+1. The goal of
the classical changepoint problem is to identify the number of changepoints K, the change-
points c1, . . . , cK and the associated success rates p1, . . . , pK+1. Using the binary segmentation
procedure we cut down on the complexity of the problem by finding one changepoint at a time.

2.1. Testing using the Bayes factor and the Bayesian information criterion approximation
We observe a sequence of independent binomial data D = {x1 , . . . , xn }, where xi denotes the
number of successes in mi trials. We let M0 denote the constant model with no changepoints
(i.e. θ0 = p1 = . . . = pn). Under M0, the likelihood is

$$L_0(\theta_0 \mid D) = \Biggl\{\prod_{j=1}^{n} \binom{m_j}{x_j}\Biggr\}\, \theta_0^{\,\Sigma_{i=1}^{n} x_i}\, (1-\theta_0)^{\,\Sigma_{i=1}^{n}(m_i - x_i)}. \qquad (1)$$

Let M1 denote the single-changepoint model with the changepoint given by the parameter
c. This implies θ1 = p1 = . . . = pc ≠ pc+1 = . . . = pn = θ2, where c = 1, . . . , n − 1. Under M1,
the likelihood is

$$L_1(c, \theta_1, \theta_2 \mid D) = \Biggl\{\prod_{j=1}^{n} \binom{m_j}{x_j}\Biggr\}\, \theta_1^{\,\Sigma_{i=1}^{c} x_i} (1-\theta_1)^{\,\Sigma_{i=1}^{c}(m_i-x_i)}\, \theta_2^{\,\Sigma_{i=c+1}^{n} x_i} (1-\theta_2)^{\,\Sigma_{i=c+1}^{n}(m_i-x_i)}. \qquad (2)$$

We develop two procedures for comparing M1 versus M0 . The first procedure is based on
calculating the Bayes factor. The second procedure is based on the BIC approximation.

2.1.1. Bayes factor


A natural criterion for selecting model M1 is based on the posterior odds ratio, i.e. we select M1
if pr(M1|D)/pr(M0|D) > 1. Note that

$$\frac{\operatorname{pr}(M_1 \mid D)}{\operatorname{pr}(M_0 \mid D)} = \frac{\operatorname{pr}(M_1)}{\operatorname{pr}(M_0)}\, \frac{\operatorname{pr}(D \mid M_1)}{\operatorname{pr}(D \mid M_0)} = \text{prior odds ratio} \times B_{10},$$

where B10 is known as the Bayes factor of M1 versus M0. Assuming a priori that M0 and M1 are
equally likely, our model selection criterion reduces to selecting M1 if B10 > 1.
Given model M1 , we assume that the prior densities on θ1 , θ2 and c are independent with
θi ∼ beta(αi, βi), i.e.

$$\pi(\theta_i) = \frac{\Gamma(\alpha_i + \beta_i)}{\Gamma(\alpha_i)\, \Gamma(\beta_i)}\, \theta_i^{\alpha_i - 1} (1-\theta_i)^{\beta_i - 1},$$

i = 1, 2, and p(c) = 1/(n − 1), c = 1, . . . , n − 1. The conjugate beta prior for θi is chosen so that
we can easily specify the Bayes factor. Beta priors are also sufficiently versatile to incorporate
various shapes for the distributions of the unknown parameters. Therefore, the numerator of
the Bayes factor is given by

$$\operatorname{pr}(D \mid M_1) = \sum_{c=1}^{n-1} \int_0^1 \int_0^1 L_1(c, \theta_1, \theta_2 \mid D)\, \pi(\theta_1, \theta_2, c)\, d\theta_1\, d\theta_2$$
$$= \Biggl\{\prod_{j=1}^{n} \binom{m_j}{x_j}\Biggr\}\, \frac{\Gamma(\alpha_1+\beta_1)}{\Gamma(\alpha_1)\,\Gamma(\beta_1)}\, \frac{\Gamma(\alpha_2+\beta_2)}{\Gamma(\alpha_2)\,\Gamma(\beta_2)}\, \frac{1}{n-1} \sum_{c=1}^{n-1} \frac{\Gamma\Bigl(\sum_{j=1}^{c} x_j + \alpha_1\Bigr)\, \Gamma\Bigl(\sum_{j=1}^{c} (m_j - x_j) + \beta_1\Bigr)}{\Gamma\Bigl(\sum_{j=1}^{c} m_j + \alpha_1 + \beta_1\Bigr)}\, \frac{\Gamma\Bigl(\sum_{j=c+1}^{n} x_j + \alpha_2\Bigr)\, \Gamma\Bigl(\sum_{j=c+1}^{n} (m_j - x_j) + \beta_2\Bigr)}{\Gamma\Bigl(\sum_{j=c+1}^{n} m_j + \alpha_2 + \beta_2\Bigr)}.$$

Similarly, given model M0, we assume that θ0 ∼ beta(α0, β0), and the denominator of the Bayes
factor is given by

$$\operatorname{pr}(D \mid M_0) = \int_0^1 L_0(\theta_0 \mid D)\, \pi(\theta_0)\, d\theta_0 = \Biggl\{\prod_{j=1}^{n} \binom{m_j}{x_j}\Biggr\}\, \frac{\Gamma(\alpha_0+\beta_0)}{\Gamma(\alpha_0)\,\Gamma(\beta_0)}\, \frac{\Gamma\Bigl(\sum_{j=1}^{n} x_j + \alpha_0\Bigr)\, \Gamma\Bigl(\sum_{j=1}^{n} (m_j - x_j) + \beta_0\Bigr)}{\Gamma\Bigl(\sum_{j=1}^{n} m_j + \alpha_0 + \beta_0\Bigr)}.$$
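Both marginal likelihoods involve only gamma functions of segment sums, so the Bayes factor can be computed stably on the log scale. The following sketch is not from the paper; the function names are illustrative, and it assumes SciPy's `gammaln` and `logsumexp` are available (the products of binomial coefficients cancel in the ratio and are omitted):

```python
import numpy as np
from scipy.special import gammaln, logsumexp

def _seg(xs, fs, a, b):
    # log{ Gamma(xs + a) Gamma(fs + b) / Gamma(xs + fs + a + b) }
    return gammaln(xs + a) + gammaln(fs + b) - gammaln(xs + fs + a + b)

def _prior_const(a, b):
    # log{ Gamma(a + b) / (Gamma(a) Gamma(b)) }, the beta prior constant
    return gammaln(a + b) - gammaln(a) - gammaln(b)

def log_bayes_factor(x, m, a0=1.0, b0=1.0, a1=1.0, b1=1.0, a2=1.0, b2=1.0):
    """log B10 of M1 (one changepoint) versus M0 (constant rate);
    x[i] successes in m[i] trials."""
    x = np.asarray(x, float)
    f = np.asarray(m, float) - x
    n = len(x)
    cx, cf = np.cumsum(x), np.cumsum(f)        # sums over [1, c]
    # log weight of each candidate changepoint c = 1, ..., n - 1
    logw = (_seg(cx[:-1], cf[:-1], a1, b1)
            + _seg(cx[-1] - cx[:-1], cf[-1] - cf[:-1], a2, b2))
    log_m1 = (_prior_const(a1, b1) + _prior_const(a2, b2)
              - np.log(n - 1) + logsumexp(logw))
    log_m0 = _prior_const(a0, b0) + _seg(cx[-1], cf[-1], a0, b0)
    return log_m1 - log_m0
```

A clearly streaky Bernoulli sequence gives log B10 > 0 (select M1), whereas a sequence consistent with a single constant rate typically gives log B10 < 0.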
630 T. Y. Yang
If the Bayes factor B10 < 1, we accept M0 and we estimate θ0 using the posterior mean under
M0:

$$\hat\theta_0 = \Bigl(\sum_{j=1}^{n} x_j + \alpha_0\Bigr)\Big/\Bigl(\sum_{j=1}^{n} m_j + \alpha_0 + \beta_0\Bigr). \qquad (3)$$

Otherwise, we accept M1 and estimate c using the posterior mean

$$\hat c = E(c \mid D) = A/B, \qquad (4)$$

where

$$A = \sum_{c=1}^{n-1} c\, \frac{\Gamma\Bigl(\sum_{j=1}^{c} x_j + \alpha_1\Bigr)\, \Gamma\Bigl(\sum_{j=1}^{c} (m_j - x_j) + \beta_1\Bigr)}{\Gamma\Bigl(\sum_{j=1}^{c} m_j + \alpha_1 + \beta_1\Bigr)}\, \frac{\Gamma\Bigl(\sum_{j=c+1}^{n} x_j + \alpha_2\Bigr)\, \Gamma\Bigl(\sum_{j=c+1}^{n} (m_j - x_j) + \beta_2\Bigr)}{\Gamma\Bigl(\sum_{j=c+1}^{n} m_j + \alpha_2 + \beta_2\Bigr)}$$

and

$$B = \sum_{c=1}^{n-1} \frac{\Gamma\Bigl(\sum_{j=1}^{c} x_j + \alpha_1\Bigr)\, \Gamma\Bigl(\sum_{j=1}^{c} (m_j - x_j) + \beta_1\Bigr)}{\Gamma\Bigl(\sum_{j=1}^{c} m_j + \alpha_1 + \beta_1\Bigr)}\, \frac{\Gamma\Bigl(\sum_{j=c+1}^{n} x_j + \alpha_2\Bigr)\, \Gamma\Bigl(\sum_{j=c+1}^{n} (m_j - x_j) + \beta_2\Bigr)}{\Gamma\Bigl(\sum_{j=c+1}^{n} m_j + \alpha_2 + \beta_2\Bigr)}.$$

If ĉ from equation (4) is not an integer, we take the nearest integer to ĉ. Note that, instead of using
the posterior mean of c to estimate the changepoint, we could use the posterior mode.
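The ratio A/B in equation (4) is simply a weighted average of the candidate changepoints, and it can be evaluated with the same segment weights as the Bayes factor. A sketch (the function name is illustrative; defaults are the diffuse beta(1, 1) priors):

```python
import numpy as np
from scipy.special import gammaln

def changepoint_posterior_mean(x, m, a1=1.0, b1=1.0, a2=1.0, b2=1.0):
    """c-hat = E(c|D) = A/B of equation (4), computed on the log scale."""
    x = np.asarray(x, float)
    f = np.asarray(m, float) - x
    cx, cf = np.cumsum(x), np.cumsum(f)

    def seg(xs, fs, a, b):
        # log{ Gamma(xs + a) Gamma(fs + b) / Gamma(xs + fs + a + b) }
        return gammaln(xs + a) + gammaln(fs + b) - gammaln(xs + fs + a + b)

    logw = (seg(cx[:-1], cf[:-1], a1, b1)
            + seg(cx[-1] - cx[:-1], cf[-1] - cf[:-1], a2, b2))
    w = np.exp(logw - logw.max())              # unnormalised posterior of c
    c = np.arange(1, len(x))                   # candidate changepoints
    return float(np.sum(c * w) / np.sum(w))
```

Rounding the result to the nearest integer gives the estimated changepoint ĉ.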

2.1.2. Bayesian information criterion approximation


The BIC procedure is a little simpler. Let us consider the constant model M0. The likelihood
function (1) is maximized by

$$\tilde\theta_0 = \sum_{j=1}^{n} x_j \Big/ \sum_{j=1}^{n} m_j,$$

giving L0(θ̃0|D). For the single-changepoint model M1, the likelihood (2) is maximized along
the contour at c = 1, . . . , n − 1 via

$$\bigl(\tilde\theta_1(c), \tilde\theta_2(c)\bigr) = \Biggl(\sum_{i=1}^{c} x_i \Big/ \sum_{i=1}^{c} m_i,\;\; \sum_{i=c+1}^{n} x_i \Big/ \sum_{i=c+1}^{n} m_i\Biggr).$$

The fully maximized likelihood under the single-changepoint model, L1{c̃, θ̃1(c̃), θ̃2(c̃)|D}, is then
obtained by maximizing L1{c, θ̃1(c), θ̃2(c)|D} over the finite set c = 1, . . . , n − 1.
We choose between models M0 and M1 according to the BIC that was proposed by Schwarz
(1978). We define
$$\mathrm{BIC}_{10} = \log[L_1\{\tilde c, \tilde\theta_1(\tilde c), \tilde\theta_2(\tilde c) \mid D\}] - \log\{L_0(\tilde\theta_0 \mid D)\} - \tfrac{1}{2}(q_1 - q_0) \log(n),$$

where the final term is a penalty function which adjusts for the difference in dimensionality
between the two models. In this application, q1 = 3 and q0 = 1. If BIC10 is negative, the decision is to accept M0. If BIC10 is positive, we reject the constant model and estimate the first
changepoint by c̃. Then we follow the binary segmentation procedure for the next step as given
in Section 2.2.
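The BIC comparison needs only the two maximized log-likelihoods and the penalty (q1 − q0)/2 · log n = log n. A sketch, not from the paper (illustrative names; the binomial-coefficient constants cancel in the difference and are omitted):

```python
import numpy as np

def bic10(x, m):
    """Return (BIC10, c-tilde) for M1 versus M0 with q1 = 3, q0 = 1."""
    x = np.asarray(x, float)
    m = np.asarray(m, float)
    n = len(x)

    def loglik(xs, ms):
        # maximised binomial log-likelihood of one constant-rate segment
        s, t = xs.sum(), ms.sum()
        if s == 0 or s == t:                   # MLE on the boundary
            return 0.0
        p = s / t
        return s * np.log(p) + (t - s) * np.log(1 - p)

    l0 = loglik(x, m)
    l1, c_tilde = -np.inf, None
    for c in range(1, n):                      # maximise over the contour
        l = loglik(x[:c], m[:c]) + loglik(x[c:], m[c:])
        if l > l1:
            l1, c_tilde = l, c
    return l1 - l0 - np.log(n), c_tilde        # penalty: (3 - 1)/2 * log n
```

A positive BIC10 rejects the constant model, and c̃ estimates the changepoint.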
2.2. Bayesian binary segmentation procedure
We note that the order in which subsegments are divided does not affect the subsequent inference.
The overall procedure using the Bayes factor criterion is as follows.

2.2.1. Level 1 analysis


We follow the procedure in Section 2.1.1 using D to test the constant model M0 versus the
single-changepoint model M1. If we decide on the constant model, then we estimate the constant success rate to be p̂(t) = p̂1 I(t ∈ [1, n]) with

$$\hat p_1 = \Bigl(\sum_{j=1}^{n} x_j + \alpha_0\Bigr)\Big/\Bigl(\sum_{j=1}^{n} m_j + \alpha_0 + \beta_0\Bigr)$$
as in equation (3) and stop. If not, we continue to the level 2 analysis.

2.2.2. Level 2 analysis


We estimate the changepoint ĉ by using equation (4) and divide D into two parts, [1, ĉ] and
[ĉ + 1, n]. Then, the simple test of a single-changepoint model is carried out on each of the two
subsegments. For the test of the first or second subsegment, we need to replace D with the data
corresponding to respectively [1, ĉ] or [ĉ + 1, n] and to change n to ĉ or n − ĉ. If both of these tests
select the constant models, then we stop the procedure and estimate the constant success rate
$$\hat p_1 = \Bigl(\sum_{j=1}^{\hat c} x_j + \alpha_0\Bigr)\Big/\Bigl(\sum_{j=1}^{\hat c} m_j + \alpha_0 + \beta_0\Bigr)$$

for the segment [1, ĉ] and

$$\hat p_2 = \Bigl(\sum_{j=\hat c+1}^{n} x_j + \alpha_0\Bigr)\Big/\Bigl(\sum_{j=\hat c+1}^{n} m_j + \alpha_0 + \beta_0\Bigr)$$

for the segment [ĉ + 1, n], i.e. we estimate the success rate to be p̂(t) = p̂1 I(t ∈ [1, ĉ]) + p̂2 I(t ∈
[ĉ + 1, n]). If either of the tests suggests that there is a changepoint, then we would proceed to the
next level.

2.2.3. Continue testing


We would estimate the changepoint as in equation (4) with appropriate changes. Then we would
continue testing until no more splitting is allowed. Any time that a null model is determined, we
would estimate the constant rate in that region to be (α0 + current number of successes)/(α0 +
β0 + current number of trials) and cease further testing in the subregion.
For the BIC, we follow the above procedure with the criterion given in Section 2.1.2. Every
time that model M0 is selected, we estimate the associated constant rate to be (current number
of successes)/(current number of trials).
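The whole procedure of Section 2.2 then reduces to a short recursion. This sketch is not the paper's code: it uses the BIC criterion (the Bayes factor version is analogous) and reports the maximum likelihood rate for each accepted constant segment; the function name and 0-based indexing are illustrative choices:

```python
import numpy as np

def segment(x, m, lo=0):
    """Recursive binary segmentation with the BIC criterion.
    Returns (start, end, rate) triples; indices 0-based, end-exclusive."""
    x = np.asarray(x, float)
    m = np.asarray(m, float)
    n = len(x)

    def loglik(xs, ms):
        # maximised binomial log-likelihood of a constant-rate segment
        s, t = xs.sum(), ms.sum()
        if s == 0 or s == t:
            return 0.0
        p = s / t
        return s * np.log(p) + (t - s) * np.log(1 - p)

    if n < 2:                                  # nothing left to split
        return [(lo, lo + n, float(x.sum() / m.sum()))]
    l0 = loglik(x, m)
    scores = [loglik(x[:c], m[:c]) + loglik(x[c:], m[c:]) for c in range(1, n)]
    c = 1 + int(np.argmax(scores))
    if scores[c - 1] - l0 - np.log(n) <= 0:    # BIC10 <= 0: accept M0, stop
        return [(lo, lo + n, float(x.sum() / m.sum()))]
    # BIC10 > 0: split at c and recurse on both subsegments
    return segment(x[:c], m[:c], lo) + segment(x[c:], m[c:], lo + c)
```

For a win–loss sequence such as Golden State's, `segment(wins, [1]*82)` would return the fitted segments and their win rates.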

3. Numerical examples
For Sections 3.1–3.4, we consider relatively diffuse beta priors θi ∼ beta(1, 1) for the success
rates, i = 0, 1, 2. Together with the discrete uniform prior for the changepoint c, this yields an
analysis which is more focused on the likelihood.

3.1. The assertion that Golden State Warriors were a streaky team during the
National Basketball Association 2000–2001 season
The NBA is divided into four divisions: the Atlantic with seven teams, the Central with eight
teams, the Midwest with seven teams and the Pacific with seven teams.

Table 1. Bayes factor B10 and the BIC approximation BIC10 for
the first iteration of the binary segmentation procedure applied
to the NBA teams during the 2000–2001 regular season

Team and division          Record (win–loss)    B10     BIC10
Philadelphia, Atlantic     56–26                0.68    −0.28
Washington, Atlantic       19–63                0.35    −3.54
Milwaukee, Central         52–30                0.63    −6.49
Chicago, Central           15–67                0.64    −0.89
San Antonio, Midwest       58–24                0.50    −2.62
Vancouver, Midwest         23–59                0.69    −1.27
Los Angeles, Pacific       56–26                0.47    −1.16
Golden State, Pacific      17–65                2.24    0.55

All teams played 82
games during the 2000–2001 NBA regular season, which ran from October 31st, 2000, to April
18th, 2001. The winner of each division is the team which wins the most games in the division.
We apply the binary segmentation procedure for detecting streakiness based on sequences
of wins and losses for the winner of the division and the worst team in each division. Each
data set therefore consists of an 82-game sequence of Bernoulli trials. Table 1 provides the
results from the first iteration of the binary segmentation procedure using the B10 - and BIC10 -
criteria. We observe that only data based on the Golden State Warriors indicates a split in
favour of model M1 . Looking at the data in more detail, Golden State had only three wins
including a 13-game losing streak after the all-star break around the 48th game. The cumu-
lative number of wins for Golden State is plotted against the game number in Fig. 1. The
graph indicates that there may be different patterns between games before and after the all-star
break.
Table 2 presents the step-by-step results of the full binary segmentation procedure using the
Bayes factor criterion B10 for splitting the data corresponding to Golden State. The procedure
begins in step 1 by identifying the first candidate changepoint. The value is the 46th game and
this tentatively divides the full data [1, 82] into two subsegments [1, 46] and [47, 82]. The cal-
culated B10 -value for this split is 2.24, and since this is greater than 1 the split is accepted. In
step 2, the first subsegment [1, 46] is further divided according to the candidate changepoint
given by the 21st game. The corresponding B10 is 0.57 and the split is rejected. In step 3, the
subsegment [47, 82] is divided according to the candidate changepoint given by the 58th game.
This time, B10 is 0.64 and the split is rejected. At the completion of the algorithm, the groupings
according to games are [1, 46] and [47, 82] with the estimated win probabilities 0.31 and 0.10
respectively. From the Bayes factor criterion, we conclude that Golden State was a streaky
team; the performance before early February 2001 was stronger than during the remainder of
the season.
Next, we carry out the full binary segmentation procedure using the BIC. For the complete
data [1, 82], BIC10 is 0.55 (> 0) and the estimated changepoint is the 42nd game. Subdividing
the two subsegments then gives BIC10-values of −2.1 and −2.9 respectively, thus terminating
the algorithm. The associated win rates in the first and second segments are 0.33 and
0.08 respectively. From the BIC, we therefore have results that are similar to the Bayes factor
approach. The overall win rate and the estimated win rates using the Bayes factor and the BIC
approximation are plotted in Fig. 2.
[Figure: cumulative win number plotted against game number]
Fig. 1. Cumulative wins for Golden State Warriors during the 2000–2001 regular season

Table 2. Step-by-step results of the binary
segmentation procedure using the Bayes
factor criterion for splitting the data corres-
ponding to Golden State

Step    Data split in games             B10
0       [1, 82]
1       [1, 46] [47, 82]                2.24†
2       [1, 21] [22, 46] [47, 82]       0.57
3       [1, 46] [47, 58] [59, 82]       0.64

†Final state of the binary segmentation procedure.

3.2. The assertion that Tiger Woods of the Professional Golfers’ Association was
a streaky golfer during September 1996–June 2001
Tiger Woods has been one of the most prolific golfers in golf history, and the first ever to win all
four professional major championships consecutively. Woods turned professional at the Greater
Milwaukee Open in September 1996 and played 112 tournaments, winning 31 championships
in the period September 1996–June 2001. Let xi = 1 and xi = 0 according to respectively whether
Woods won or lost the ith tournament. Then the data are expressed as the following Bernoulli
sequence of xi s with mi = 1:
[Figure: win rate plotted against game number]
Fig. 2. Comparison of the overall win rate of Golden State Warriors during the 2000–2001 season
with the estimated win rate from the Bayes factor and the estimated win rate from the BIC
0000101000100000110000100000000000000000
1000000000001000100000001010101111101100
01000110101110000100000011101100.
It is interesting to observe the pattern of championships over the six years. The sequence appears
to indicate that Woods has performed better recently than at the beginning. We illustrate the Bayesian
binary segmentation procedure for detecting streakiness. For the complete data [1, 112], B10 is
20.4 and the estimated changepoint is the 62nd tournament which is the Masters Tournament
in April 1999. Subsequent splitting on the two intervals yields B10 -values of 0.5 and 0.7, thus
terminating the algorithm. The winning rates are 0.16 and 0.47 respectively.
Next, we fit the data by using the BIC. For the complete data, BIC10 is 3.4 and the estimated
changepoint is the 64th tournament, the GTE Byron Nelson Classic in May 1999. Subsequent
splitting of the two intervals yields BIC10-values of −2.1 and −2.0 respectively, thus terminating
the algorithm. From both of the procedures, we conclude that Woods is a streaky golfer. His win
probability of a tournament before late spring 1999 was around 0.14, and his win probability
afterwards has increased significantly to around 0.47.

3.3. The assertion that Barry Bonds of Major League baseball was a streaky
home run hitter during April–July 2001
Barry Bonds of the San Francisco Giants reached 40 home runs during the first 87 games of
the 2001 MLB season. We are concerned with whether there is a streakiness to his home run
hitting pattern.
The observed sequence of xi home runs in mi batting attempts for i = 1, . . . , 87 during April–
July is
1(3), 0(4), 0(4), 0(4), 0(4), 0(5), 0(5), 1(3), 1(3), 1(4), 1(3), 1(4), 1(4), 0(3), 1(3),
0(1), 0(2), 1(2), 0(4), 1(4), 0(2), 1(4), 0(1), 1(5), 1(3), 1(2), 0(3), 0(1), 0(2), 0(2),
0(3), 1(3), 0(3), 0(4), 0(4), 0(4), 1(3), 1(3), 3(5), 2(2), 1(4), 1(3), 0(4), 1(3), 0(2),
1(3), 0(4), 0(5), 2(3), 1(2), 0(3), 0(3), 1(2), 1(3), 1(4), 0(2), 0(3), 0(3), 1(4), 0(4),
1(3), 2(3), 0(2), 0(3), 1(5), 1(4), 0(2), 1(3), 0(4), 0(2), 0(4), 0(5), 0(1), 0(1), 0(4),
0(3), 0(4), 0(3), 0(4), 0(2), 0(5), 1(5), 0(5), 0(4), 0(4), 0(1), 0(2).
We note that Bonds had only one home run in his last 19 games, which seems to indicate a
different success rate compared with the initial 68 games.
For the complete data [1, 87], B10 and BIC10 are 4.0 and 3.0 respectively and the estimated
changepoints are the 67th and 68th games. We divide the data into the first subsegment, [1, 67]
for B10 and [1, 68] for BIC10, and the second subsegment, [68, 87] and [69, 87] respectively. The
algorithms both terminate, the B10- and BIC10-values being 0.32 and −1.0 for the first subsegment
and 0.15 and −1.8 for the second. For the first subsegment, the estimated home run rate per
at-bat is 0.18 under both criteria. For the second subsegment, the estimated home run rate per
at-bat is 0.03 and 0.02 respectively. From both criteria, we conclude that Bonds had a change
in home run performance.

3.4. The assertion that Javy Lopez of Major League baseball was not a streaky
hitter during the 1998 season
Albert and Williamson (2001) graphed the number of hits and the number of at-bats for each
of the 131 games that Javy Lopez played during the 1998 season. In addition, they graphed the
moving batting average of Lopez against game number using a window of 10 games. The graph
showed that Lopez’s hitting was consistent for the first 40 games, he was a hot hitter during
the next 20 games and then he oscillated between poor hitting and good hitting until the end
of the season. From the graph, Lopez may appear to be a streaky hitter, but was he really? We
apply the Bayesian binary segmentation procedure to the Lopez data. The Bayes factor B10 for
the complete data is 0.25 and BIC10 is −2.72. Using both of the statistics, the constant model
is selected, and the estimated overall hitting rate is 0.29 for the Bayes factor and 0.29 for the
BIC approximation. Both of the procedures found that Lopez was not a streaky hitter during
the 1998 season. Observed streakiness in the graph may be a result of misunderstanding the
patterns that are inherent in random sequences. Our conclusion is consistent with that of Albert
and Williamson (2001).

3.4.1. Simulation
We use the International Mathematical and Statistical Libraries’ routine RNBIN to generate
random hits from a binomial density with the same number of at-bats as in Javy Lopez’s 1998
MLB season. To investigate the performance of the binary segmentation procedure, we set

$$p(t) = 0.35\, I(t \in [1, 44]) + 0.2\, I(t \in [45, 86]) + 0.4\, I(t \in [87, 131]).$$

There are two changepoints, at the 44th game and the 86th game, and three associated hitting
rates: 0.35, 0.20 and 0.40. The simulated data are plotted in Fig. 3, where the game number is
plotted against xi hits with mi at-bats.
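Data of this form are easy to regenerate: the paper uses the IMSL routine RNBIN, but any binomial generator serves. A NumPy sketch (the at-bat counts here are drawn uniformly from 1–5 as a stand-in for the actual Lopez at-bats):

```python
import numpy as np

rng = np.random.default_rng(1)
games = np.arange(1, 132)                      # t = 1, ..., 131
# piecewise success rates p(t) of the simulation study
p = np.where(games <= 44, 0.35, np.where(games <= 86, 0.20, 0.40))
at_bats = rng.integers(1, 6, size=131)         # stand-in for Lopez's m_i
hits = rng.binomial(at_bats, p)                # x_i ~ Binomial(m_i, p(t))
```

The resulting (hits, at_bats) pairs can then be fed straight into either version of the segmentation test.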
We analyse the data by using the Bayes factor approach. For the complete data in [1, 131],
a changepoint is accepted at the 86th game (B10 = 2.4). The first subsegment [1, 86] is then
subdivided according to [1, 47] and [48, 86] (B10 = 26.6). Subdividing the two subsegments then

[Figure: simulated hits and numbers of at-bats plotted against game number, in two panels]
Fig. 3. Simulated hits with the number of at-bats

gives B10-values of 0.46 and 0.46 respectively, thus terminating the algorithm. The estimated
hitting rates are 0.34 and 0.18 respectively. For the second subsegment [87, 131], B10 is 0.35 and
the estimated hitting rate is 0.42. Therefore the Bayes factor has found two changepoints, at the
47th and 86th games, with associated hitting rates 0.34, 0.18 and 0.42. This agrees fairly well
with the underlying process.
Next, we fit the data by using the BIC approximation. In this case, changepoints are obtained
at the 44th and 86th games with associated hitting rates 0.35, 0.16 and 0.42. This also agrees
fairly well with the underlying model.

4. Conclusion
Our simulation study suggests that the Bayesian binary segmentation procedure yields
satisfactory results for identifying changepoints and the associated rates. We applied the proce-
dure to sports teams and individuals from basketball, golf and baseball. The procedure can also
be applied to detect streakiness in many other sports, including hockey and football.
In this paper, we think of streakiness as a change in performance. Injuries and transfers of
key personnel on a team, for instance, may be good reasons for such a change. The procedure
indicates that there is little evidence for streakiness in the Lopez hitting data even though Lopez
may appear to be a streaky hitter. This conclusion is consistent with Albert and Williamson
(2001). The procedure also indicates that the Golden State Warriors of the NBA, Tiger Woods
of the Professional Golfers’ Association and Barry Bonds of MLB all exhibited aspects of
streakiness.

Acknowledgements
I am very grateful for constructive comments from Tim Swartz, and also thank the Joint Editor
and referees for their helpful comments.

References
Albert, J. and Williamson, P. (2001) Using model/data simulations to detect streakiness. Am. Statistn, 55, 41–50.
Albright, C. (1993) A statistical analysis of hitting streaks in baseball. J. Am. Statist. Ass., 88, 1175–1183.
Barry, D. and Hartigan, J. A. (1993) Choice models for predicting divisional winners in major league baseball.
J. Am. Statist. Ass., 88, 766–774.
Chen, J. and Gupta, A. K. (1997) Testing and locating variance change-points with application to stock prices.
J. Am. Statist. Ass., 92, 739–747.
Larkey, P., Smith, R. and Kadane, J. (1989) It’s okay to believe in the ‘Hot Hand’. Chance, 2, 22–30.
Schwarz, G. (1978) Estimating the dimension of a model. Ann. Statist., 6, 461–464.
Stern, H. S. (1997) Judging who’s hot and who’s not. Chance, 10, 40–43.
Stern, H. S. and Morris, C. N. (1993) Comment on a paper by Albright. J. Am. Statist. Ass., 88, 1189–1194.
Tversky, A. and Gilovich, T. (1989) The cold facts about the ‘Hot Hand’ in Basketball. Chance, 2, 16–21.
Vostrikova, L. J. (1981) Detecting ‘disorder’ in multidimensional random processes. Sov. Math. Dokl., 24, 55–59.
Yang, T. Y. and Kuo, L. (2001) Bayesian binary segmentation procedure for a Poisson process with multiple
changepoints. J. Comput. Graph. Statist., 10, 772–785.
