Who's Who in A Sports Video An Individual Level Sports Video Indexing System

2012 IEEE International Conference on Multimedia and Expo
WHO'S WHO IN A SPORTS VIDEO? AN INDIVIDUAL LEVEL SPORTS VIDEO INDEXING SYSTEM
Shih-Wei Sun (a,b), Wen-Huang Cheng (c,d), Yao-Ling Hung (d), Ivy Fan (d),
Chris Liu (e), Jacqueline Hung (f), Chia-Kai Lin (g), and Hong-Yuan Mark Liao (c,d)
(a) Department of New Media Art, Taipei National University of the Arts, Taipei, Taiwan
(b) Center for Art and Technology, Taipei National University of the Arts, Taipei, Taiwan
(c) Research Center for Information Technology Innovation, Academia Sinica, Taipei, Taiwan
(d) Institute of Information Science, Academia Sinica, Taipei, Taiwan
(e) Department of Computer Science, University of Illinois-Urbana Champaign, IL
(f) Electrical Engineering and Computer Science, Massachusetts Institute of Technology, MA
(g) Electronic Engineering, National Taiwan University of Science and Technology, Taipei, Taiwan
ABSTRACT
Sports video analysis has attracted great attention in recent years. In the past decade, numerous
sports video indexing approaches have been proposed at different semantic levels. In this paper, an
individual level sports video indexing (ILSVI) scheme is proposed. The individual level refers to
indexing a sports video on a player basis, i.e., recognizing each player in a multi-player game.
Since the jersey number is always worn by a player as the player's identity in a game, it is
feasible to recognize jersey numbers for individual level indexing in sports videos. To solve the
jersey number recognition problem, a principal-axis based contour descriptor is proposed. Compared
to state-of-the-art approaches, the proposed descriptor achieves a higher recognition rate while
consuming much less computation power. In addition, we developed an interactive system to realize
the individual level sports video indexing (ILSVI). This interactive system includes a player
detection sub-system and a jersey number detection sub-system, and helps complete the individual
level sports video indexing task. We use basketball game videos as the basis to develop real-world
systems.
Fig. 1. The "who's who" concept of detecting and identifying individual players in a sports video,
where the location of a player, Bryant, is depicted by the green rectangle, and the player's related
information (extracted from external information sources), such as his player statistics in the past
two weeks, is shown in the tables alongside the video.
Index Terms: Sports video analysis, individual level indexing, jersey number recognition

978-0-7695-4711-4/12 $26.00 © 2012 IEEE
DOI 10.1109/ICME.2012.59

1 Introduction
Sports video analysis has attracted great attention in recent years. Given the huge amount of
broadcast sports video on the web, effective and efficient sports video indexing tools are
indispensable. In the past decade, numerous sports video indexing approaches have been proposed at
different semantic levels, for example at the shot level [1], the event level [2], the action level
[3, 4], and the tactics level [5, 6]. However, few of the existing approaches deal with sports
videos at the individual-player level, which would enable a new class of multimedia applications,
such as interactive video experiences with specific players and the automatic generation of player
statistics [7].
To achieve individual level indexing for a multi-player sports game, Bertini et al. [8] proposed to
recognize the face as a player's identity in soccer videos. However, a face-based method usually
needs close-up shots whose video frames contain frontal faces. Lu et al. [7] developed a supervised
learning scheme that distinguishes players by tracking a player's blobs based on conditional random
fields (CRF), but it takes more than 200 hours to manually label sufficient training samples (9800
frames) for one single game. In addition to the faces and blobs taken from sports players, the
jersey number is a good choice for distinguishing the identity of a player. The jersey number is
worn by every player in a multi-player sports game, which makes it a strong candidate for individual
level indexing. Recognizing the jersey number is not an easy task, however, since the player moves
all the time during game play. In [9], Wang and Belongie proposed a training based approach with HOG
features to recognize the text characters in real scene photos. However, the characters in a scene
photo do not move, and the performance of their work relies on the training samples used. In [10],
Li and Tan proposed a cross ratio spectrum based scheme for real scene character recognition, and
adopted dynamic time warping to perform matching. Their method is better than the ones using SIFT
[11] and Shape Context [12] in terms of recognition performance. However, dynamic time warping is
well known to be very time consuming, so it is not feasible for real-time applications.
To solve the above mentioned jersey number recognition problem, a principal-axis based contour
descriptor is proposed. Compared to state-of-the-art approaches such as [10], the proposed
descriptor achieves a higher recognition rate while consuming much less computation power. In
addition, we developed an interactive system, which includes a player detection sub-system and a
jersey number detection sub-system, to assist the individual level sports video indexing task. In
this paper, we use basketball game videos as the basis to develop related systems. Figure 1 is a
conceptual illustration of the intended individual-player level indexing system. Once a player is
detected, the system provides all related information of this player (automatically extracted from
external information sources).
The rest of this paper is organized as follows. In Sec. 2,
the framework of the proposed ILSVI system and the kernel
techniques of the system components are presented. Next, the
jersey number recognition based on a principal axis contour
descriptor is presented in Sec. 3. The experimental results are
reported in Sec. 4. Finally, the conclusions and future work
are given in Sec. 5.
Fig. 2. The proposed framework of the individual level sports video indexing (ILSVI) system.
2 Individual Level Sports Video Indexing System
Fig. 3. Jersey color modeling: (a) an original frame for initializing the jersey color and the target player, (b) a standing
player detected by applying HOG [18], and (c) the extracted
jersey color based on the visual saliency used in [16].
Fig. 2 shows the framework of the proposed individual level sports video indexing (ILSVI) system.
Since the players in the same team have jersey numbers with the same color, we first utilize this
color clue to detect players. Typically, the jersey color of a home team is brighter, and the jersey
color of a visitor team is relatively darker. Therefore, the foreground colors can be used to
distinguish a home team from a visitor team. Once the color model is constructed (detailed in
Sec. 2.1), there is no need to change it until the end of a game, which keeps the system efficient.
Once the player area is determined (Sec. 2.2), the proposed jersey number detection system is
applied to detect a player's identity. As shown in the functional block of the jersey number
detection in Fig. 2, the two recognized digits are individually marked by a blue rectangle and a
pink rectangle.
In addition, as shown in the right column of Fig. 2, we provide two different ways for the user to
interact with the proposed system: users can use a mouse or a Kinect [13] to select a specific
player. Since our system is linked with the NBA official website [14], the chosen player's
information can be searched and shown directly on the screen.
2.1 Jersey Color Modeling
In the ILSVI system, a jersey color model is generated at the very beginning as an initialization
process. Given an initial video frame (Fig. 3 (a)), we adopt the Histogram of Oriented Gradients
(HOG) [18] to detect the players in standing postures. Then, the omega shape feature [15] is
utilized to detect both the front-view and the back-view players (Fig. 3 (b)). Compared to players
seen from side views, the proportion of the visible jersey color area of a front-view or back-view
player is relatively larger, which provides a more reliable source for extracting the correct jersey
color. To detect the jersey color area, we adopt the visual saliency detection method proposed by
Zhai and Shah in [16]. Fig. 3 (c) shows the extracted jersey color based on Zhai and Shah's method
[16]. Our jersey color extraction method handles views that existing methods do not. For example,
the jersey color extraction process proposed in [17] can only extract the jersey color of close-up
players, whereas with our extraction process a user can extract the jersey color of median-view or
court-view players (median-view and court-view are defined in [6]).
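To make the saliency-based color extraction of Sec. 2.1 more concrete, the following is a minimal sketch of a luminance-contrast saliency map in the spirit of [16]. It covers only a spatial, histogram-based contrast; whether the system uses exactly this variant is an assumption, and the names `lc_saliency`, `salient_mean_color`, and the cut-off `keep` are hypothetical.

```python
import numpy as np

def lc_saliency(gray):
    """Luminance-contrast saliency: for each pixel, the summed absolute
    intensity difference to all pixels, computed through a 256-bin histogram."""
    gray = np.asarray(gray, dtype=np.uint8)
    hist = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
    levels = np.arange(256, dtype=np.float64)
    # contrast[v] = sum over u of hist[u] * |v - u|
    contrast = np.abs(levels[:, None] - levels[None, :]) @ hist
    sal = contrast[gray]
    peak = sal.max()
    return sal / peak if peak > 0 else sal

def salient_mean_color(rgb, gray, keep=0.5):
    """Mean RGB over pixels whose normalized saliency is at least `keep`
    (`keep` is a hypothetical cut-off, not a value from the paper)."""
    mask = lc_saliency(gray) >= keep
    return rgb[mask].mean(axis=0)
```

Applied to the detected standing player of Fig. 3 (b), the salient pixels would play the role of the jersey color samples in Fig. 3 (c).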
Fig. 4. The player area detection procedure: (a) an original frame, (b) the Gaussian multi-layer
frame difference (GMFD) result, (c) white pixels are pixels with GMFD values smaller than an
adaptive threshold th calculated from the video frame, and (d) the detected player area (yellow
rectangle).
the proposed GMFD scheme (Fig. 5 (b)), a hole filling process based on morphological reconstruction
[19] can generate the area of a player (Fig. 5 (c)). Next, the visual saliency map can be calculated
(Fig. 5 (d)). To remove the elements which do not belong to a jersey number, we propose to apply
orientation filtering as the post-processing step. According to the relation of connected
components, in this example, the text areas of "1" and "5" are determined as the possible jersey
number blobs. The orientation vector of each blob can be calculated as $v_k$, where $k$ is the index
of the blobs, as shown in Fig. 5 (e). The blobs with vector values less than a threshold $v_{th}$
are rejected from the candidate list, i.e.,

$$ b_c = \{\, b_k \ \text{with}\ |v_k| \ge v_{th} \,\}, \tag{2} $$

where $b_c$ is the blob candidate. In addition, only the blobs with large area are considered as
jersey numbers. Therefore, a sorting process helps find the most probable blob candidates for jersey
number detection. Finally, the jersey number blobs are detected, as shown by the areas underlined in
green and red in Fig. 5 (f).
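The orientation filtering step can be sketched as follows. The paper does not state how the orientation vector of a blob is computed, so this sketch assumes it is the angle of the dominant PCA axis of the blob's pixel coordinates, measured against the horizontal; the default threshold `v_th=45.0` and the `top_k` cut are hypothetical (Fig. 5 (e) only shows that a blob at 8 degrees is rejected while blobs at 70 and 67 degrees are kept).

```python
import numpy as np

def blob_orientation_deg(mask):
    """Angle in [0, 90] degrees between the blob's dominant axis and the horizontal.
    Assumption: the orientation comes from PCA of the pixel coordinates."""
    ys, xs = np.nonzero(mask)
    coords = np.stack([xs, ys]).astype(np.float64)
    coords -= coords.mean(axis=1, keepdims=True)
    cov = coords @ coords.T / coords.shape[1]
    _, eigvecs = np.linalg.eigh(cov)          # eigenvalues in ascending order
    dx, dy = eigvecs[:, -1]                   # direction of largest variance
    return float(np.degrees(np.arctan2(abs(dy), abs(dx))))

def filter_blobs(blobs, v_th=45.0, top_k=2):
    """Keep blobs whose orientation is at least v_th (cf. Eq. (2)),
    then return the top_k largest of them by pixel area."""
    kept = [b for b in blobs if blob_orientation_deg(b) >= v_th]
    kept.sort(key=lambda b: int(b.sum()), reverse=True)
    return kept[:top_k]
```

With a mostly upright digit stroke and a horizontal stripe as input masks, only the upright blob survives the filter, which mirrors the rejection of the blue 8-degree vector in Fig. 5.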
Fig. 5. Number detection in a sports video: (a) an original player area including upper and lower
body, (b) player detection result based on the initialized jersey color model, (c) filled holes in
(b), (d) visual saliency calculated from (c) based on [16], (e) orientation vectors calculated from
the blobs (blue vector: 8°, green vector: 70°, red vector: 67°), and (f) the blobs (underlined with
corresponding colors) with larger orientation degrees (final result of the jersey number detection).
2.2 Player Detection
With the extracted jersey color shown in Fig. 3 (c), we calculate the mean $\mu_j$ and the standard
deviation $\sigma_j$ of each of the three color channels (i.e., RGB). Using the values of $\mu_j$
and $\sigma_j$, one can decide a range of tolerance for a jersey color. That is, instead of using a
single value, we use a value range to detect the jersey color areas as potential regions of players.
This treatment helps tolerate the color variations caused by shadowing and lighting changes. Given a
pixel $(x, y)$ in a video frame, we propose a Gaussian multi-layer frame difference (GMFD) scheme to
compute the difference among the color models in channel $j$, from layer $\mu_j - \sigma_j$ to layer
$\mu_j + \sigma_j$, as follows:

$$ d_j(x, y) = \int_{\mu_j - \sigma_j}^{\mu_j + \sigma_j} g_N(i)\,\bigl(I_j(x, y) - i\bigr)\, di, \tag{1} $$

where $I_j(x, y)$ is the intensity of the pixel in channel $j$ and $g_N$ represents a Gaussian
filter which enhances the influence of the color intensity near the mean value $\mu_j$. The overall
GMFD is obtained by summing over all color channels. Finally, the pixels whose GMFD values are
smaller than an adaptive threshold, th (calculated from the image), are marked as jersey color
pixels. An example of the player area detection is shown in Fig. 4.
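A discretized version of Eq. (1) might look like the sketch below. Several details are assumptions rather than the paper's specification: the integral is approximated with `n_layers` evenly spaced layers, $g_N$ is taken as a Gaussian centered at $\mu_j$ with standard deviation $\sigma_j$ and normalized over the layers, the per-channel differences are summed by absolute value, and the adaptive threshold defaults to the mean of the GMFD map.

```python
import numpy as np

def gmfd(frame, jersey_pixels, n_layers=11):
    """Discretized Eq. (1), summed over the three color channels.
    frame: (H, W, 3) array; jersey_pixels: (M, 3) samples of the jersey color."""
    frame = np.asarray(frame, dtype=np.float64)
    samples = np.asarray(jersey_pixels, dtype=np.float64)
    mu = samples.mean(axis=0)
    sigma = samples.std(axis=0) + 1e-6            # guard against zero-width layers
    total = np.zeros(frame.shape[:2])
    for j in range(3):
        layers = np.linspace(mu[j] - sigma[j], mu[j] + sigma[j], n_layers)
        weights = np.exp(-0.5 * ((layers - mu[j]) / sigma[j]) ** 2)
        weights /= weights.sum()                  # assumed normalization of g_N
        d_j = sum(w * (frame[..., j] - i) for w, i in zip(weights, layers))
        total += np.abs(d_j)                      # assumption: combine by magnitude
    return total

def jersey_mask(frame, jersey_pixels, threshold=None):
    """Mark pixels whose GMFD value is below the threshold as jersey color pixels."""
    d = gmfd(frame, jersey_pixels)
    if threshold is None:
        threshold = d.mean()                      # hypothetical adaptive threshold
    return d < threshold
```

Because the layers are symmetric around $\mu_j$, a pixel whose color equals the mean jersey color yields a GMFD value near zero and is therefore kept, while pixels far from the jersey color are rejected.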
3 Principal Axis Based Contour Descriptor for Jersey Number Recognition
Jersey number recognition plays the key role in player recognition in our ILSVI system. In this
paper, a principal axis based contour descriptor is proposed for jersey number recognition. Text
recognition in a real-world scene is a challenging problem. The method proposed by Li and Tan [10]
can recognize real scene characters that have serious perspective deformation on sign boards.
However, in sports videos, the jersey numbers are not only severely distorted but also suffer from
shadowing and twist problems. The severe deformation makes the jersey number recognition process
very difficult. In addition, the time consumed by the jersey number recognition process is another
critical issue. Because a player in a sports video often moves quickly, a jersey number may be
viewable for only a few frames (the results are reported in [7]). Thus, a frame-by-frame jersey
number recognition process is indispensable because it maximizes the chance of getting correct
jersey numbers. Since each frame can contain multiple players, a jersey number recognition system
with low computation cost is needed.
Once a jersey number is detected, one can convert it into a binary image, as shown in Fig. 6 (a).
The exterior boundary points can be traced and marked from the binary image of a jersey number. As
shown in Fig. 6 (b), the white contour is composed of the contour points of the exterior boundary.
To recognize the jersey numbers in a sports video, we propose to calculate the principal axis first
and then use a contour descriptor to globally describe a jersey number.
2.3 Jersey Number Detection Using Orientation Filtering
After a player area is detected (yellow rectangle shown in Fig. 2), the visual saliency method [16]
is again used to detect the jersey numbers. A jersey number detection example is shown in Fig. 5.
Once the binary map of a jersey color is detected based on
3.1 Principal Axis Determination
To better illustrate our concept, the contour image in Fig. 6 (b) is rotated 90° counterclockwise,
as shown in Fig. 6 (c). To represent the contour of a jersey number, we find the regression line,
i.e., the principal axis, as follows:

$$ y = \beta_0 + \beta_1 x. \tag{3} $$

The line in Eq. (3) is the line that has the minimum total distance to all contour points. The
intercept parameter $\beta_0$ and the slope parameter $\beta_1$ are calculated based on the
conventional linear regression algorithm [20]. Because the principal axis is defined by this
minimum-distance property, when a jersey number rotates, its corresponding principal axis
(regression line) has to rotate with it to maintain the property. Fig. 7 (a) shows a digit "2" and
Fig. 7 (b) illustrates a rotated "2": when the digit rotates, its principal axis rotates in the same
manner. Therefore, the principal axis calculated from the boundary points of a jersey number can be
used as the basis to represent the jersey number.
Fig. 6. The generation of a contour descriptor: (a) the binarized character, (b) the contour points,
(c) the principal axis (red line), starting point $(x^s, y^s)$ (green circle), mid point
$(x^m, y^m)$ (blue circle), and centroid point $(x^c, y^c)$ (yellow dot), and (d) the contour
descriptor (blue curves).
Fig. 7. The principal axis (red lines) and contour descriptor (blue curves) comparison: (a) the
digit "2", (b) the rotated digit "2". On the right-hand side of (a) and (b), the horizontal axis is
the contour point order, and the vertical axis is the transformed x coordinate (the upper
sub-figure) and y coordinate (the bottom sub-figure), respectively.
3.2 Contour Descriptor
In this section, we describe the contour descriptor. According to the principal axis determination
process, the red line in Fig. 6 (c) is the principal axis of the jersey number shown in Fig. 6 (a).
The intersection points of the principal axis (red line) and the contour (white curve) are
$(x^s, y^s)$ (starting point) and $(x^m, y^m)$ (mid point), shown by the green circle and the blue
circle in Fig. 6 (c), respectively. The centroid of the two points,
$(x^c, y^c) = \left( \frac{x^s + x^m}{2}, \frac{y^s + y^m}{2} \right)$, can then be calculated
(yellow point). The positions of the $N$ contour points $(x_k, y_k)$ are transformed to another
Cartesian coordinate system that uses the centroid $(x^c, y^c)$ as the origin, and all contour
points are normalized by a factor which is the distance between $(x^s, y^s)$ and $(x^c, y^c)$. The
transformed points $(x_k^n, y_k^n)$ can be expressed as follows:

$$ (x_k^n, y_k^n) = \left( \frac{x_k - x^c}{\|(x^s - x^c,\ y^s - y^c)\|},\ \frac{y_k - y^c}{\|(x^s - x^c,\ y^s - y^c)\|} \right), \tag{4} $$

where $\|(x^s - x^c,\ y^s - y^c)\|$ is the normalization factor, and $k$ is the index of the contour
points.
Because the starting point $(x^s, y^s)$ is determined, the corresponding normalized position
$(x_1^n, y_1^n)$ is also determined. Suppose there are $N$ points in total; the normalized position
of the mid point $(x^m, y^m)$ is then $(x_{N/2}^n, y_{N/2}^n)$. For instance, if $N = 300$, the
normalized position of the starting point is $(x_1^n, y_1^n)$, and the normalized position of the
mid point is $(x_{150}^n, y_{150}^n)$. An example is shown in Figs. 6 (c) and (d). In this example,
$N = 300$; the starting point $(x_1^n, y_1^n)$ is marked by the green circle in Fig. 6 (c), and the
mid point $(x_{150}^n, y_{150}^n)$ is marked by the blue circle. The contour points are traced
sequentially clockwise from $(x_1^n, y_1^n)$ (starting point) to $(x_{150}^n, y_{150}^n)$ (mid
point), in the order shown by the purple dashed curve in Fig. 6 (c). In this figure, the horizontal
and vertical axes represent the transformed normalized Cartesian coordinates in the x and y
directions, respectively. Note that the origin is at
$(x^c, y^c) = \left( \frac{x_1^n + x_{150}^n}{2}, \frac{y_1^n + y_{150}^n}{2} \right)$ (the yellow
point). Next, the points are plotted sequentially from $x_1^n$ to $x_{150}^n$. The magnitude keeps
increasing from $-1$ to $+1$, as shown by the blue curve between the point marked by the green
circle and that marked by the blue circle (Fig. 6 (d)). For the remaining contour points, from
$x_{150}^n$ to $x_{300}^n$, the derivation is the same; they correspond to the bottom purple dashed
curve in Fig. 6 (c). Similarly, the y components of the contour points are plotted in the bottom
part of Fig. 6 (d). As a result, the blue curves shown in Fig. 6 (d) are used as the contour
descriptor of a jersey number.
3.2.1 The Properties
From the contour descriptors shown on the right side of Figs. 7 (a) and (b), the contours of the
original version and the rotated version are similar if they are the same digit. However, if the two
digits are different, as in Fig. 6 (d) and Fig. 7 (a), their contours are totally different. This
comparison indicates that the proposed principal axis calculation and contour descriptor are
rotation invariant. In addition, because the descriptor is calculated with a normalization factor
(the distance between $(x^s, y^s)$ and $(x^c, y^c)$), the proposed contour descriptor is also
scaling invariant. Since a detected jersey number is located by the jersey number detection process,
the proposed contour descriptor has the translation invariant property as well. In summary, the
proposed contour descriptor is rotation, scaling, and translation (RST) invariant.
Fig. 8. Twist and shadowing problems: (a) original "8", (b) twisted "8", and (c) shadow on "8". At
the bottom of (a)-(c), the horizontal axis is the contour point order, and the vertical axis is the
transformed x coordinate (the upper sub-figure) and y coordinate (the bottom sub-figure),
respectively.
In general, a human subject won't appear on a basketball court in an upside down pose. Therefore,
the ambiguity between the digits "6" and "9" won't be a problem if we assume that an upside down
pose of a human subject is unacceptable on the basketball court. The recognition process is based on
comparing the difference between a descriptor and the number templates (0-9); the label with the
minimum difference determines the final recognition result. The examples shown in Fig. 8 reveal that
the twist and shadowing problems only influence the local distribution (the curves in the marked
green circles), while the overall distribution derived by the contour descriptor is still similar to
the original one. In these cases, the twist and shadowing problems can be considered as noise in the
proposed contour descriptor.
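The pipeline of Sec. 3 (principal axis, normalized contour curves, minimum-difference template matching) can be sketched as below. This is an illustration under stated assumptions, not the authors' implementation: the axis is fitted with ordinary least squares via `np.polyfit`, the start and mid points are approximated by the contour points just before the trace crosses the axis, both halves of the contour are resampled to `half` points so that the mid point sits at index N/2, and the coordinates are additionally rotated so that the start-to-mid direction becomes the x axis (Eq. (4) itself only translates and scales). The `ellipse` helper is a hypothetical stand-in for a traced digit contour.

```python
import numpy as np

def principal_axis(contour):
    """Fit y = b0 + b1 * x to the contour points by least squares (cf. Eq. (3))."""
    b1, b0 = np.polyfit(contour[:, 0], contour[:, 1], 1)
    return b0, b1

def axis_crossings(contour, b0, b1):
    """Indices of the contour points just before the trace crosses the axis;
    returns the two crossings lying farthest apart along the axis."""
    x, y = contour[:, 0], contour[:, 1]
    above = y > b0 + b1 * x
    idx = np.nonzero(above != np.roll(above, -1))[0]
    along = x[idx] + b1 * y[idx]            # position along the direction (1, b1)
    return int(idx[np.argmin(along)]), int(idx[np.argmax(along)])

def resample(points, n):
    """Resample a polyline to n points spaced uniformly in arc length."""
    seg = np.linalg.norm(np.diff(points, axis=0), axis=1)
    t = np.concatenate([[0.0], np.cumsum(seg)])
    u = np.linspace(0.0, t[-1], n)
    return np.column_stack([np.interp(u, t, points[:, 0]),
                            np.interp(u, t, points[:, 1])])

def contour_descriptor(contour, half=32):
    """Concatenated x- and y-curves of the normalized contour (start, mid, start)."""
    contour = np.asarray(contour, dtype=np.float64)
    b0, b1 = principal_axis(contour)
    s, m = axis_crossings(contour, b0, b1)
    loop = np.roll(contour, -s, axis=0)     # the trace now begins at the start point
    m = (m - s) % len(loop)
    start, mid = loop[0], loop[m]
    centre = (start + mid) / 2.0
    # Assumption: resample both halves so the mid point sits at index N/2.
    pts = np.vstack([resample(loop[:m + 1], half),
                     resample(np.vstack([loop[m:], loop[:1]]), half)])
    pts = (pts - centre) / np.linalg.norm(start - centre)     # Eq. (4)
    # Assumption: rotate so that the start-to-mid direction becomes the +x axis.
    dx, dy = (mid - start) / np.linalg.norm(mid - start)
    pts = pts @ np.array([[dx, dy], [-dy, dx]]).T
    return np.concatenate([pts[:, 0], pts[:, 1]])

def recognize(descriptor, templates):
    """Label of the template with the smallest summed absolute difference."""
    return min(templates, key=lambda k: np.abs(templates[k] - descriptor).sum())

def ellipse(a, b, angle_deg, n=200):
    """Hypothetical stand-in for a traced digit contour."""
    t = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
    th = np.radians(angle_deg)
    x, y = a * np.cos(t), b * np.sin(t)
    return np.column_stack([x * np.cos(th) - y * np.sin(th),
                            x * np.sin(th) + y * np.cos(th)])

templates = {"A": contour_descriptor(ellipse(10, 4, 20)),
             "B": contour_descriptor(ellipse(10, 8, 20))}
query = 2.5 * ellipse(10, 4, 20) + np.array([30.0, -7.0])   # scaled, shifted copy of A
label = recognize(contour_descriptor(query), templates)
```

In this toy setup the query is a scaled and translated copy of template "A", so its descriptor coincides with that template up to rounding. A least-squares line fitted on y against x does not rotate exactly with the shape, so rotation invariance in this sketch is only approximate.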
Fig. 9. The jersey number recognition results: (a) No. 8 of the white team, (b) No. 17 of the purple
team, (c) No. 32 of the white team in another game, (d) No. 9 of the white team, (e) a false
recognition result caused by motion blur, which led to false jersey number detection, and (f) a
false recognition result caused by false jersey number detection (two numbers detected as one blob).
4 Experimental Results
Based on the above discussion, the effective recognition of jersey numbers is the key technology for
realizing the proposed ILSVI system. Therefore, our current experiments focus on the evaluation of
our algorithms for jersey number recognition. The recognition performance is compared with a
state-of-the-art approach, i.e., Li and Tan's method [10]. Four testing videos (numbered Video-1 to
Video-4), belonging to three different basketball games in full HD resolution (1920 × 1080), were
collected from the NBA official web site [14]. The ground truth of all the recognizable jersey
numbers in the four testing videos was manually labeled for our evaluation. Video-1 to Video-3 were
captured in the court-view, and Video-4 in the close-up view. The total number of labeled
recognizable jersey blobs is 4,493, including 1,201 in Video-1, 699 in Video-2, 2,457 in Video-3,
and 136 in Video-4. In addition, a demo video illustrating the concept of the proposed ILSVI system
is available at http://www.youtube.com/watch?v=c_hAvprjRH0
4.1 Qualitative Evaluation
Sample results of our jersey number recognition are given in Fig. 9. The examples shown in
Figs. 9 (a)-(d) demonstrate that the proposed method could correctly recognize the jersey numbers
even if the size of a player was relatively small (so that the jersey numbers were in a very low
resolution as well). In Fig. 9 (d), our method correctly recognized the jersey number as "9" using
the proposed contour features, whereas the method proposed by Li and Tan [10] could not correctly
recognize the number due to its extremely low resolution. Both our method and Li and Tan's method
failed in the cases shown in Figs. 9 (e) and (f). In these cases, the jersey number detection
process failed to detect the complete shape of the numbers, which affected the subsequent results.
Some false detections of the compared method [10] are shown in Fig. 10.
4.2 Quantitative Evaluation
The second column of Table 1 shows the detection rate of our jersey number detection. This rate is
defined as the ratio of the number of correctly detected jersey number blobs to the total number of
ground truth blobs. The detection rate was higher for the close-up view (e.g. Video-4) than the
court-view (e.g. Video-2), as expected.
We ran the jersey number recognition process on the correctly detected jersey number blobs. The
average recognition rate of the proposed approach was 83.74% over all the testing videos, whereas
the average recognition rate of Li and Tan's method [10] was 50.55%. In other words, our method
improves on Li and Tan's method by about 33 percentage points in jersey number recognition accuracy.
On Video-4, Li and Tan's method achieved a recognition rate comparable to that of our method (76.06%
versus 81.69%, Table 1). This is because the constituent shots of Video-4 were all close-up shots
with high resolution. For the remaining three testing videos, the resolution of the constituent
shots was not high enough, and the performance difference between the two compared methods became
more significant. The jersey numbers recorded in
tour descriptor. In the comparison results to the state-of-the-art

approaches, the proposed method provided a higher recognition rate with less computation. In
addition, an interactive system was developed to realize the individual level sports video indexing
(ILSVI). The proposed interactive system is composed of a player detection sub-system, a jersey
number detection sub-system, and a user interface to complete the individual level sports video
indexing task. We collected basketball game videos to verify that the proposed ILSVI system can deal
with real-world videos.
Fig. 10. False jersey number recognition results of [10]; widely differing cross ratio values make
the recognition results noisy. The red dots represent the first and the second cross points of the
green line segment with the detected contour. (a) The inner contour is lost because of foreground
detection at low resolution, (b) the size of the detected inner contour differs considerably from
that of the original template, and (c) a false inner contour is detected due to imperfect foreground
detection.
6 References
[1] L.Y. Duan et al., "A unified framework for semantic shot classification in sports video," IEEE TMM, 2005.
[2] Babaguchi et al., "Event based indexing of broadcasted sports video by intermodal collaboration," TMM, 2002.
[3] H. Li et al., "Automatic detection and analysis of player action in moving background sports video sequences," IEEE TCSVT, 2010.
[4] B. Yao and F.F. Li, "Modeling mutual context of object and human pose in human-object interaction activities," IEEE CVPR, 2010.
[5] G. Zhu et al., "Event tactic analysis based on broadcast sports video," IEEE TMM, 2009.
[6] M.C. Hu et al., "Robust camera calibration and player tracking in broadcast basketball video," TMM, 2011.
[7] W.L. Lu et al., "Identifying players in broadcast sports videos using conditional random fields," CVPR, 2011.
[8] M. Bertini et al., "Automatic detection of player's identity in soccer videos using faces and text cues," MM, 2006.
[9] K. Wang and S. Belongie, "Word spotting in the wild," ECCV, 2010.
[10] L. Li and C.L. Tan, "Recognizing planar symbols with severe perspective deformation," IEEE TPAMI, 2010.
[11] D. Lowe, "Distinctive image features from scale-invariant keypoints," IJCV, 2004.
[12] S. Belongie et al., "Shape matching and object recognition using shape contexts," IEEE TPAMI, 2002.
[13] http://www.xbox.com/en-US/Kinect/Home/.
[14] http://www.nba.com/.
[15] M. Li et al., "Rapid and robust human detection and tracking based on omega-shape features," ICIP, 2009.
[16] Y. Zhai and M. Shah, "Visual attention detection in video sequences using spatiotemporal cues," ACM MM, 2006.
[17] Wang et al., "Automatic extraction of semantic colors in sports video," IEEE ICASSP, 2004.
[18] N. Dalal and B. Triggs, "Histograms of oriented gradients for human detection," IEEE CVPR, 2005.
[19] P. Soille, "Morphological image analysis: Principles and applications," Springer-Verlag, 1999.
[20] W. Dumouchel and F. O'Brien, "Integrating a robust option into a multiple regression computing environment," CSS, 1989.
[21] L. Gorelick et al., "Shape representation and classification using the Poisson equation," IEEE TPAMI, 2006.
Table 1. Comparison results on 4,493 jersey number blobs. RR represents the recognition success rate
on the correctly detected jersey blobs; T represents the average execution time of one digit
recognition, in seconds.

          Detection rate   RR (our method)   RR ([10])   T (our method)   T ([10])
Video-1   66.94%           83.89%            41.99%      0.0294           73.50
Video-2   52.65%           82.88%            47.28%      0.0305           72.66
Video-3   51.53%           86.51%            36.86%      0.0424           75.03
Video-4   52.21%           81.69%            76.06%      0.1905           75.61
Avg.      55.83%           83.74%            50.55%      0.0732           74.20
the four testing videos all had serious twist and shadowing effects. Therefore, the data set we
faced was much more complicated than the data sets processed in [10] and [21]. However, our method
still achieved satisfactory results even under very difficult conditions.
4.2.1 Complexity Comparison
The complexity of our method and that of Li and Tan's method [10] are compared as follows. On
average over the testing videos, the matching process of our method took about 0.07 seconds per
digit. Because the shots in Video-4 were all close-up shots with high resolution, they needed more
time to calculate the principal axes. In contrast, the matching process of Li and Tan's method [10]
needed about 74 seconds. The detailed comparison results are listed in the last two columns of
Table 1.
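As a quick check of the figures quoted above, the averages, the accuracy gap, and the ratio of execution times can be recomputed from the per-video values in Table 1. The variable names below are illustrative only; the numbers are copied from the table.

```python
# Per-video values copied from Table 1 (Video-1 to Video-4).
rr_ours = [83.89, 82.88, 86.51, 81.69]      # recognition rate, our method (%)
rr_ref = [41.99, 47.28, 36.86, 76.06]       # recognition rate, method of [10] (%)
t_ours = [0.0294, 0.0305, 0.0424, 0.1905]   # seconds per digit, our method
t_ref = [73.50, 72.66, 75.03, 75.61]        # seconds per digit, method of [10]

def mean(values):
    return sum(values) / len(values)

gain_points = mean(rr_ours) - mean(rr_ref)  # gap in percentage points
speedup = mean(t_ref) / mean(t_ours)        # ratio of average execution times

print(f"average RR: {mean(rr_ours):.2f}% vs {mean(rr_ref):.2f}%")
print(f"gap: {gain_points:.2f} percentage points, speed ratio: {speedup:.0f}x")
```

The unweighted means reproduce the "Avg." row of Table 1, and the ratio of the average execution times comes out at roughly three orders of magnitude.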
5 Conclusions and Future Work
In this paper, we proposed an individual level sports video indexing (ILSVI) scheme. The proposed
system can automatically recognize each player in a multi-player game to achieve individual level
indexing. We utilized the jersey number worn by a player to represent the player's identity in a
game for individual level indexing in sports videos. To achieve jersey
number recognition, we proposed a principal-axis based con-