Department of New Media Art, Taipei National University of the Arts, Taipei, Taiwan
Center for Art and Technology, Taipei National University of the Arts, Taipei, Taiwan
c Research Center for Information Technology Innovation, Academia Sinica, Taipei, Taiwan
d Institute of Information Science, Academia Sinica, Taipei, Taiwan
e Department of Computer Science, University of Illinois at Urbana-Champaign, IL
f Electrical Engineering and Computer Science, Massachusetts Institute of Technology, MA
Electronic Engineering, National Taiwan University of Science and Technology, Taipei, Taiwan
b
ABSTRACT
Sports video analysis has attracted great attention in recent
years. In the past decade, numerous sports video indexing
approaches have been proposed at different semantic levels. In
this paper, an individual level sports video indexing (ILSVI)
scheme is proposed. The individual level refers to indexing a
sports video on a per-player basis, i.e., recognizing each player
in a multi-player game. Since a player always wears a jersey
number as his or her identity in a game, it is feasible to recognize
jersey numbers for individual level indexing in sports videos. To
solve the jersey number recognition problem, a principal-axis based
contour descriptor is proposed. Compared to the state-of-the-art
approaches, the proposed descriptor achieves a higher recognition
rate while consuming far less computation. In addition, we developed
an interactive system to realize ILSVI. This system comprises a
player detection sub-system and a jersey number detection sub-system,
and it assists the user in completing the individual level indexing
task. We use basketball game videos as the basis for developing
real-world systems.
Fig. 1. The "who's who" concept of detecting and identifying individual players in a sports video. The location
of a player, Bryant, is marked by the green rectangle, and
the player's related information (extracted from external information sources), such as his statistics over the past two
weeks, is shown in the tables alongside the video.
1 Introduction
a new class of multimedia applications, such as interactive video
experiences with specific players and the automatic generation of
player statistics [7].
To achieve individual level indexing for a multi-player sports
game, Bertini et al. [8] proposed to recognize the face as a
player's identity in soccer videos. However, a face-based method
usually needs the support of close-up shots whose video frames
contain frontal faces. Lu et al. [7] developed a supervised learning
scheme that distinguishes players by tracking a player's blobs based
on conditional random fields (CRF), but it takes more than 200 hours
to manually label sufficient training samples (9800 frames) for a
single game. In addition to faces and blobs, the jersey number is a
good cue for distinguishing the identity of a player: it is worn by
every player in a multi-player sports game, which makes it a strong
candidate for individual level indexing. Recognizing the jersey
number is not an easy task, however, because the player moves all
the time during game play. In [9], Wang and Belongie proposed a
training-based approach with HOG features to recognize text
characters in real scene photos. However, the characters in a scene
photo do not move, and the performance of their work relies on the
training samples used. In [10], Li and Tan proposed a cross ratio
spectrum based scheme for real scene character recognition. They
adopted dynamic time warping to perform matching. Their method
outperforms the ones using SIFT [11] and Shape Context [12] in terms
of recognition performance. However, dynamic time warping is well
known to be very time consuming, which makes it impractical for
real-time applications.
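To make the cost argument concrete, the following is a minimal sketch of textbook dynamic time warping between two one-dimensional sequences. It is a generic illustration, not Li and Tan's implementation; the function name `dtw_distance` and the absolute-difference local cost are assumptions chosen for this example. The point is that a full table of size len(a) x len(b) has to be filled for every pair of sequences that is compared.

```python
def dtw_distance(a, b):
    """Textbook dynamic time warping cost between two 1-D sequences.

    Illustrative only: the local cost |a_i - b_j| is an assumption,
    not the cost used in [10].
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] holds the best alignment cost of a[:i] against b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three admissible predecessors.
            cost[i][j] = local + min(cost[i - 1][j],      # repeat b[j-1]
                                     cost[i][j - 1],      # repeat a[i-1]
                                     cost[i - 1][j - 1])  # advance both
    return cost[n][m]
```

Both nested loops run over the full sequence lengths, so one comparison costs time proportional to the product of the two lengths, and the total grows further with the number of sequence pairs that must be matched per character.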
To solve the above-mentioned jersey number recognition problem, a
principal-axis based contour descriptor is proposed. Compared to
state-of-the-art approaches such as [10], the proposed descriptor
achieves a higher recognition rate while consuming far less
computation. In addition, we developed an interactive system,
comprising a player detection sub-system and a jersey number
detection sub-system, to assist the individual level sports video
indexing task. In this paper, we use basketball game videos as the
basis for developing the related systems. Figure 1 gives a conceptual
illustration of the intended individual-player level indexing system:
once a player is detected, the system provides the related
information of this player (automatically extracted from external
information sources).
The rest of this paper is organized as follows. In Sec. 2,
the framework of the proposed ILSVI system and the kernel
techniques of the system components are presented. Next, the
jersey number recognition based on a principal axis contour
descriptor is presented in Sec. 3. The experimental results are
reported in Sec. 4. Finally, the conclusions and future work
are given in Sec. 5.
[Eq. (1): expression in the index j; not recoverable from the source]
Jersey number recognition plays a key role in player recognition in
our ILSVI system. In this paper, a principal axis based contour
descriptor is proposed for jersey number recognition. Text
recognition in a real-world scene is a challenging problem. The
method proposed by Li and Tan [10] can recognize real scene
characters that have serious perspective deformation on sign boards.
In sports videos, however, the jersey numbers are not only severely
distorted; they also suffer from shadowing and twisting. This severe
deformation makes the jersey number recognition process very
difficult. In addition, the time consumed by jersey number
recognition is another critical issue. Because a player in a sports
video often moves quickly, a jersey number typically remains visible
for only a few frames (as reported in [7]). Thus, a frame-by-frame
jersey number recognition process is indispensable, because it
maximizes the chance of obtaining correct jersey numbers. Since a
frame may contain several players, a jersey number recognition
system with low computation cost is essential.
Once a jersey number is detected, it can be converted into a binary
image, as shown in Fig. 6 (a). The exterior boundary points can then
be traced and marked from the binary image of the jersey number. As
shown in Fig. 6 (b), the white contour is composed of the contour
points of the exterior boundary. To recognize the jersey numbers in
a sports video, we propose to calculate the principal axis first and
then use a contour descriptor to describe a jersey number globally.
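The binarization and exterior-boundary step can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the fixed threshold in `binarize`, the function names, and the flood-fill strategy are all hypothetical choices. The background is flood-filled from outside the image, so pixels bordering an enclosed hole (for example, the inside of a "0") are excluded and only the exterior boundary is returned.

```python
from collections import deque

def binarize(gray, threshold=128):
    """Fixed-threshold binarization (assumption: bright digit, dark jersey)."""
    return [[1 if v >= threshold else 0 for v in row] for row in gray]

def exterior_boundary(binary):
    """Return (x, y) foreground pixels that touch the outside background.

    Points come out in raster order, not as an ordered contour trace.
    """
    h, w = len(binary), len(binary[0])
    H, W = h + 2, w + 2
    # Pad with one ring of background so the fill can start outside the shape.
    padded = ([[0] * W]
              + [[0] + [1 if v else 0 for v in row] + [0] for row in binary]
              + [[0] * W])
    outside = [[False] * W for _ in range(H)]
    outside[0][0] = True
    queue = deque([(0, 0)])
    nbrs = ((1, 0), (-1, 0), (0, 1), (0, -1))
    # 4-connected flood fill over background pixels reachable from the border.
    while queue:
        x, y = queue.popleft()
        for dx, dy in nbrs:
            nx, ny = x + dx, y + dy
            if (0 <= nx < W and 0 <= ny < H
                    and not outside[ny][nx] and not padded[ny][nx]):
                outside[ny][nx] = True
                queue.append((nx, ny))
    points = []
    for y in range(1, H - 1):
        for x in range(1, W - 1):
            if padded[y][x] and any(outside[y + dy][x + dx] for dx, dy in nbrs):
                points.append((x - 1, y - 1))  # undo the padding offset
    return points
```

A descriptor that depends on the order of contour points would additionally need a boundary-following step; the raster-order point set above is sufficient for fitting a principal axis.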
Player Detection
(2)
3.1
Fig. 6. The generation of a contour descriptor: (a) the binarized character, (b) the contour points, (c) the principal axis
(red line), starting point $(x_s, y_s)$ (green circle), mid point
$(x_m, y_m)$ (blue circle), and centroid point $(x_c, y_c)$ (yellow
dot), and (d) the contour descriptor (blue curves).
$y = \beta_0 + \beta_1 x$    (3)
The line given by Eq. (3) is the line that has the minimum total
distance to all contour points. The intercept parameter $\beta_0$
and the slope parameter $\beta_1$ are calculated with the
conventional linear regression algorithm [20]. Because the principal
axis is defined by this minimum-distance property, when a jersey
number rotates, its corresponding principal axis (regression line)
must still satisfy the same property with respect to the rotated
contour points. Fig. 7 (a) shows a digit "2" and Fig. 7 (b)
illustrates a rotated "2". When a digit rotates, its corresponding
principal axis rotates in the same manner. Therefore, the principal
axis calculated from the boundary points of a jersey number can be
used as the basis to represent the jersey number.
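The regression step can be sketched with the closed-form least-squares solution. This is a minimal sketch, assuming ordinary least squares on the (x, y) contour points; the names `principal_axis`, `rotate`, `b0`, and `b1` are illustrative, with `b0` and `b1` standing for the intercept and slope of Eq. (3).

```python
import math

def principal_axis(points):
    """Fit y = b0 + b1 * x to contour points by ordinary least squares."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    if sxx == 0:
        # All points share one x value: the axis is vertical.
        raise ValueError("slope is undefined for a vertical point set")
    b1 = sxy / sxx              # slope
    b0 = mean_y - b1 * mean_x   # intercept: the line passes through the centroid
    return b0, b1

def rotate(points, degrees):
    """Rotate points counter-clockwise about the origin."""
    t = math.radians(degrees)
    c, s = math.cos(t), math.sin(t)
    return [(c * x - s * y, s * x + c * y) for x, y in points]
```

For example, fitting the points `[(x, 0.5 * x) for x in range(-3, 4)]` before and after `rotate(points, 30)` yields axes whose angles `math.atan(b1)` differ by 30 degrees. Note that ordinary least squares measures vertical offsets, so for point sets that are not collinear the fitted angle follows a rotation only approximately.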
3.2 Contour Descriptor
3.2.1 The Properties
4 Experimental Results
4.1 Qualitative Evaluation
number due to its extremely low resolution. Our method and
Li and Tan's method also failed in the cases shown in Figs. 9
(e) and (f). In these cases, the jersey number detection process
failed to detect the complete shape of the numbers and thus
influenced the subsequent results. Some false detections produced
by the compared method [10] are shown in Fig. 10.
4.2 Quantitative Evaluation
Fig. 10. False jersey number recognition results of [10]; widely
differing cross ratio values make the recognition results noisy.
The red dots mark the first and second points at which the green
line segment crosses the detected contour. (a) The inner contour is
lost by foreground detection at low resolution; (b) the size of the
detected inner contour differs markedly from that of the original
template; (c) a false inner contour is detected due to imperfect
foreground detection.
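Table 1. Detection rate, recognition rate (RR), and matching time T (seconds) of our method and of [10].

Video      Detection rate   RR (our method)   RR ([10])   T (our method)   T ([10])
Video-1    66.94%           83.89%            41.99%      0.0294           73.50
Video-2    52.65%           82.88%            47.28%      0.0305           72.66
Video-3    51.53%           86.51%            36.86%      0.0424           75.03
Video-4    52.21%           81.69%            76.06%      0.1905           75.61
Average    55.83%           83.74%            50.55%      0.0732           74.20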
the four testing videos all had serious twist and shadowing
effects. Therefore, the data set we faced was much more complicated
than the data sets processed in [10] and [21]. Nevertheless,
our method still achieved satisfactory results even under these
very difficult conditions.
4.2.1 Complexity Comparison
The complexity of our method and that of Li and Tan's method [10]
are compared as follows. On average over the testing videos, the
matching process of our method took about 0.07 seconds. Because the
frames of testing Video-4 were all high-resolution close-up shots,
they needed more time to calculate the principal axes. In contrast,
the matching process of Li and Tan's method [10] needed about
74 seconds. The detailed comparison results are listed in the last
two columns of Table 1.
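As a worked check of these figures, the sketch below recomputes the average matching times and the resulting speed-up from the per-video times in Table 1. It assumes that the first four entries of each timing column belong to the four testing videos and that the fifth is their average; the variable names are illustrative.

```python
# Per-video matching times in seconds, taken from the last two columns of Table 1.
t_ours = [0.0294, 0.0305, 0.0424, 0.1905]
t_ref = [73.50, 72.66, 75.03, 75.61]  # Li and Tan's method [10]

mean_ours = sum(t_ours) / len(t_ours)  # about 0.0732 s
mean_ref = sum(t_ref) / len(t_ref)     # about 74.20 s

# Ratio of the average matching times.
speedup = mean_ref / mean_ours

# Per-video ratios; the close-up Video-4 shows the smallest gap.
per_video = [r / o for r, o in zip(t_ref, t_ours)]
```

The recomputed means agree with the averages quoted in the text (about 0.07 seconds versus about 74 seconds), which corresponds to a speed-up of roughly three orders of magnitude on average.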