…norm to prevent over-fitting.

Our sparse combination learning has two goals. The first goal – effective representation – is to find K basis combinations that enjoy a small reconstruction error. It is coarsely expressed as

    min_{S, γ, β} Σ_{j=1}^{n} Σ_{i=1}^{K} γ_j^i ‖x_j − S_i β_j^i‖_2^2        (2)

…maximum commonness. This process continues until all training data are represented and bounded. The size of combination K reflects how informative the training data are. Specifically, in the i-th pass, given the leftover training data X^c ⊆ X that cannot be represented by the previous combinations {S_1, …, S_{i−1}}, we compute S_i to bound most data in X^c. Our objective function becomes

    min_{S_i, γ, β} Σ_{j∈Ω_c} γ_j^i (‖x_j − S_i β_j^i‖_2^2 − λ)
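As an illustration of this pass-wise scheme, the following sketch alternates a least-squares β-step with a hard γ-step — minimizing γ(err − λ) over γ ∈ {0, 1} sets γ = 1 exactly when the residual is below λ — and then refits S_i on the samples it bounds. This is a simplified stand-in for the paper's actual updates in Eqs. (6), (7), and (9); the random initialization replaces the K-means initialization of Algorithm 1, and all function names are ours.

```python
import numpy as np

def train_pass(Xc, s, lam, iters=10, seed=0):
    """One greedy pass: fit one basis combination S_i to the leftover data Xc (p x n).
    gamma is binary: minimizing gamma * (err - lam) gives gamma = 1 iff err < lam."""
    rng = np.random.default_rng(seed)
    p, n = Xc.shape
    # initialize S_i from s random data columns (the paper uses K-means centers instead)
    S = Xc[:, rng.choice(n, size=s, replace=False)].copy()
    for _ in range(iters):
        # beta-step: least-squares codes for all leftover samples
        B = np.linalg.lstsq(S, Xc, rcond=None)[0]          # s x n
        err = ((Xc - S @ B) ** 2).sum(axis=0)              # squared residual per sample
        gamma = err < lam                                  # gamma-step (hard assignment)
        if gamma.any():
            # S-step: refit the basis on the samples this combination bounds
            S = np.linalg.lstsq(B[:, gamma].T, Xc[:, gamma].T, rcond=None)[0].T
    B = np.linalg.lstsq(S, Xc, rcond=None)[0]
    gamma = ((Xc - S @ B) ** 2).sum(axis=0) < lam
    return S, gamma

def train(X, s, lam, max_passes=50):
    """Greedy outer loop: keep extracting combinations until no data is left."""
    combos, Xc = [], X
    for _ in range(max_passes):
        if Xc.shape[1] == 0:
            break
        S, gamma = train_pass(Xc, min(s, Xc.shape[1]), lam)
        combos.append(S)
        Xc = Xc[:, ~gamma]   # drop the samples that are now represented
    return combos
```

Because each pass only keeps the samples it can bound within λ, later passes see progressively harder leftovers, which is what keeps the learned combinations from overlapping.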
Algorithm 1 Training for Sparse Combination Learning
  Input: X, current training features X^c = X
  initialize S = ∅ and i = 1
  repeat
    repeat
      Optimize {S_i, β} with Eqs. (6) and (7)
      Optimize {γ} using Eq. (9)
    until Eq. (5) converges
    Add S_i to set S
    Remove represented features x_j with γ_j^i ≠ 0 from X^c
    i = i + 1
  until X^c = ∅
  Output: S

Algorithm 2 Testing with Sparse Combinations
  Input: x, auxiliary matrices {R_1, …, R_K} and threshold T
  for k = 1 → K do
    if ‖R_k x‖_2^2 < T then
      return normal event;
    end if
  end for
  return abnormal event;

Algorithm Summary and Analysis  In each pass, we learn one S_i. We repeat this process to obtain a few combinations until the training data set X^c is empty. This scheme reduces information overlap between combinations. We summarize our training algorithm in Algorithm 1. The initial dictionary S_i in each pass is calculated by clustering the training data X^c via K-means with s centers.

Our algorithm is controlled by λ, the upper bound of reconstruction errors. Reducing it could lead to a larger K. Our approach is expressive because all normal training event patterns are represented with controllable reconstruction errors under condition (3). We train on 20K samples in R^100 in 20 minutes on a PC with 8GB RAM and an Intel 3.4GHz CPU.

2.3. Testing

With the learned sparse combinations S = {S_1, …, S_K}, in the testing phase with new data x we check whether there exists a combination in S that fits the reconstruction error upper bound. This can be quickly achieved by checking the least-squares error for each S_i:

    min_{β^i} ‖x − S_i β^i‖_2^2,  ∀ i = 1, …, K        (10)

This is a standard quadratic function with the optimal solution

    β̂^i = (S_i^T S_i)^{−1} S_i^T x.        (11)

The reconstruction error in S_i is

    ‖x − S_i β̂^i‖_2^2 = ‖(S_i (S_i^T S_i)^{−1} S_i^T − I_p) x‖_2^2,        (12)

where I_p is a p × p identity matrix. To further simplify computation, we define an auxiliary matrix R_i for each S_i:

    R_i = S_i (S_i^T S_i)^{−1} S_i^T − I_p.        (13)

The reconstruction error for S_i is accordingly ‖R_i x‖_2^2. If it is small, x is regarded as a normal event pattern. The final testing scheme is summarized in Algorithm 2.

It is noted that the first few dominating combinations represent the largest number of normal event features, which enables us to determine positive data quickly. In our experiments, the average combination checking ratio – the number of combinations checked divided by the total number K – is 0.325. Our method can also be easily accelerated via parallel processing to achieve O(1) complexity, although this is generally not necessary.

2.4. Relation to Subspace Clustering

Our approach can be regarded as an enhancement of subspace clustering [7], with the major difference lying in the working scheme. The relationship between subspace clustering and our method is similar to that between K-means and hierarchical clustering [18]. Specifically, the subspace clustering method of [7] takes the number of clusters k as known or fixed beforehand, like K-means. In video abnormality detection applications, however, it is difficult to know the optimal number of bases a priori. Our approach uses the allowed representation error to build combinations, where the error upper bound is explicitly enforced with a clear statistical meaning. There is no need to define the cluster size in this method. Our extensive experiments show that this strategy is both reliable and efficient.

3. Experiments

We empirically demonstrate that our model is suitable for representing general surveillance videos. We apply our method to different datasets. Quantitative comparisons are provided.

3.1. System Setting

In our method, the size of S_i ∈ R^{p×s} controls the sparsity level. We experimentally set s = 0.1 × p, where p is the data dimension. λ in Eq. (4) is the error upper bound, set to 0.04 in our experiments.

Given the input video, we resize each frame to 3 scales of 20×20, 30×40, and 120×160 pixels respectively, and uniformly partition each layer into a set of non-overlapping 10×10 patches, leading to 208 sub-regions per frame in total, as shown in Fig. 2. Corresponding sub-regions in 5 …
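The 208 sub-regions per frame follow directly from the three scales; a quick arithmetic check (the helper name is ours):

```python
def patch_grid(height, width, patch=10):
    """Count the non-overlapping patch x patch blocks tiling a frame."""
    return (height // patch) * (width // patch)

# the three scales from the system setting: 20x20, 30x40, and 120x160 pixels
scales = [(20, 20), (30, 40), (120, 160)]
counts = [patch_grid(h, w) for h, w in scales]   # [4, 12, 192]
total = sum(counts)                              # 4 + 12 + 192 = 208
```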
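To make the testing scheme of Section 2.3 concrete, here is a minimal NumPy sketch of the auxiliary matrices of Eq. (13) and the early-exit scan of Algorithm 2; the function names are ours, and the loop mirrors the checking-ratio behaviour noted above, returning as soon as one combination fits.

```python
import numpy as np

def build_aux_matrices(combos):
    """Precompute R_i = S_i (S_i^T S_i)^{-1} S_i^T - I_p for every combination (Eq. (13))."""
    p = combos[0].shape[0]
    return [S @ np.linalg.inv(S.T @ S) @ S.T - np.eye(p) for S in combos]

def is_normal(x, Rs, T):
    """Algorithm 2: scan the combinations, exiting early on the first good fit."""
    for R in Rs:
        if np.sum((R @ x) ** 2) < T:   # reconstruction error ||R_i x||_2^2
            return True                # normal event
    return False                       # abnormal event
```

Because every R_i is precomputed offline, classifying one feature costs at most K matrix-vector products, which is what makes the testing stage fast.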
Figure 3. A few frames from the surveillance videos used for verification.

Figure 5. Spatial distribution of combination numbers to represent normal structures in the Avenue data.

                Run   Loiter   Throw   False Alarm
Ground Truth     4      5        5        N/A
Ours             4      4        4        1

Table 1. Detection results in the Avenue dataset.
Figure 6. Detection results in a video sequence. The bottom plot is the response. A peak appears when an abnormal event – paper throwing – happens. The x-axis indexes frames and the y-axis denotes response strength.
Figure 7. Two abnormal events and their corresponding abnormal patches under two different scales in the Avenue dataset.

Figure 8. Subway dataset (Exit-Gate): three abnormal events and their corresponding detection maps in two different scales in the Subway-Exit video.

               WD   LT   MISC   Total   FA
Ground Truth    9    3    7      19      0
[22]            9    3    7      19      2
[10]            9    3    7      19      3
[5]             9    -    -      -       2
subspace        6    3    5      14      4
Ours            9    3    7      19      2

Table 2. Comparison with other sparsity-based methods [22, 5] on the Exit-Gate Subway dataset. WD: wrong direction; LT: loitering; FA: false alarm. "-" means the results are not provided. Subspace: results by replacing our combination learning with subspace clustering [7].

           WD   NP   LT   II   MISC   Total   FA
GT         26   13   14    4    9      66      0
[22]       25    9   14    4    8      60      5
[10]       24    8   13    4    8      57      6
[5]        21    6    -    -    -      -       4
subspace   21    6    9    3    7      46      7
Ours       25    7   13    4    8      57      4

Table 3. Comparison using the Subway-Entrance video with several previous methods. GT: ground truth; WD: wrong direction; NP: no payment; LT: loitering; II: irregular interactions; MISC: miscellaneous; FA: false alarm. "-" means results are not provided. Subspace: replacing our combination learning with subspace clustering [7].
…low false alarm.

Running Time Comparison  We compare our system with other sparse dictionary learning based methods [22, 5] in terms of running time on the Subway dataset in Table 4. The speed of methods [22, 5] is reported in their respective papers. The difference in detection speed is much larger than that of the working environments.

3.5. UCSD Ped1 Dataset

The UCSD Ped1 dataset [13] provides 34 short clips for training, and another 36 clips for testing. All testing clips have frame-level ground truth labels, and 10 clips have pixel-level ground truth labels. There are 200 frames in each clip.

Our configuration is similar to that of [13]. That is, performance is evaluated at the frame and pixel levels. We show the results via ROC curves, Equal Error Rate (EER), and Equal Detected Rate (EDR).

ROC Curve Comparison  According to [13], in frame-level detection a frame counts as a successful detection if it contains at least one abnormal pixel. In our experiment, if a frame contains one or more abnormal patches, we label it as an abnormal event. For frame-level evaluation, we alter the frame abnormality threshold to produce the ROC curve shown in Fig. 9. Our method has a reasonably high detection rate when the false positive value is low, which is vital for practical detection system development.

Figure 9. Frame-level comparison of the ROC curves on the UCSD Ped1 dataset. Method abbreviations: MPPCA+SF [13], SF [13], MDT [13], Sparse [5], Saligrama [16], Antic [2], Subspace: replacing our combination learning with subspace clustering [7].

In pixel-level evaluation, a pixel is labeled abnormal if and only if the regions it belongs to in all scales are abnormal. We alter the threshold for all pixels. Following [13], if more than 40% of the truly anomalous pixels are detected, the corresponding frame is considered correctly detected. We show the ROC curve in Fig. 10. Besides all methods compared in [13], we also include the performance of subspace clustering [7]. Our method achieves satisfactory performance.

Figure 10. Pixel-level comparison of the ROC curves on the UCSD Ped1 dataset. Method abbreviations: MPPCA+SF [13], SF [13], MDT [13], Sparse [5], Saligrama [16], Antic [2], Subspace: replacing our combination learning with subspace clustering [7].

EER and EDR  Different parameters could affect detection and error rates. Following [13], we obtain these rates at the point where the false positive number equals the number of misses. They are called the equal error rate (EER) and the equal detected rate (EDR). We also compute the area under the ROC curve (AUC). We report EER, EDR, and AUC in the pixel-level comparison (Table 6) and EER and AUC in the frame-level comparison (Table 7). These results indicate that our results are of high quality in both measures.

We compare the running time in Table 5. The detection time per frame and working platforms of [13, 5, 2] are obtained from the original papers.

        Second/Frame   Platform      CPU       Memory
[13]    25             -             3.0 GHz   2.0GB
[5]     3.8            -             2.6 GHz   2.0GB
[2]     5 ∼ 10         MATLAB        -         -
Ours    0.00697        MATLAB 2012   3.4 GHz   8.0GB

Table 5. Running time comparison on the UCSD Ped1 dataset.

3.6. Separate Cost Analysis

Our testing includes two main steps: feature extraction (3D cube gradient computation and PCA) and combination testing using Algorithm 2. Other minor procedures are frame resizing, matrix reshaping, etc. We list the average running time spent in each step to process one frame on the three datasets in Table 8.

4. Conclusion

We have presented an abnormal event detection method via sparse combination learning. This approach directly learns sparse combinations, which increase the testing speed hundreds of times without compromising effective-
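As a concrete reading of these metrics, the sketch below computes the AUC by the trapezoid rule and the EER as the point where the false positive rate equals the miss rate 1 − TPR, interpolating between sampled ROC points. The function names and the interpolation scheme are ours, not taken from [13].

```python
import numpy as np

def auc(fpr, tpr):
    """Area under a sampled ROC curve via the trapezoid rule."""
    f, t = np.asarray(fpr, float), np.asarray(tpr, float)
    o = np.argsort(f)
    f, t = f[o], t[o]
    return float(np.sum((f[1:] - f[:-1]) * (t[1:] + t[:-1]) / 2.0))

def eer(fpr, tpr):
    """Equal error rate: the FPR at which FPR equals the miss rate 1 - TPR."""
    f, t = np.asarray(fpr, float), np.asarray(tpr, float)
    o = np.argsort(f)
    f, t = f[o], t[o]
    d = f - (1.0 - t)                  # sign change marks the crossing
    i = int(np.argmax(d >= 0))
    if d[i] == 0 or i == 0:
        return float(f[i])
    w = d[i - 1] / (d[i - 1] - d[i])   # linear interpolation weight
    return float(f[i - 1] + w * (f[i] - f[i - 1]))
```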
       SF [13]   MPPCA [13]   SF-MPPCA [13]   MDT [13]   Sparse [5]   Adam [1]   Antic [2]   Subspace [7]   Ours
EDR    21 %      18 %         18 %            45 %       46 %         24 %       68 %        39.3 %         59.1 %
AUC    19.7 %    20.5 %       21.3 %          44.1 %     13.3 %       46.1 %     76 %        43.2 %         63.8 %

Table 6. Comparison of pixel-level EDR and AUC on the UCSD Ped1 dataset.

       SF-MPPCA [13]   SF [13]   MDT [13]   Sparse [5]   Saligrama [16]   Antic [2]   Subspace [7]   Ours
EER    40 %            31 %      25 %       19 %         16 %             18 %        29.6 %         15 %
AUC    59 %            67.5 %    81.8 %     86 %         92.7 %           91 %        68.4 %         91.8 %

Table 7. Comparison of frame-level EER and AUC on the UCSD Ped1 dataset.

             Feature extraction (ms)   Combination testing (ms)   Others (ms)   All (ms)   FPS
Avenue       4.513                     1.792                      0.770         7.075      141.34
UCSD Ped1    4.496                     1.724                      0.743         6.965      143.57
Subway       4.634                     1.409                      0.625         6.412      155.97

Table 8. Average running time of processing one frame in each step on the three datasets. "ms" is short for millisecond.
ness. Our method achieves state-of-the-art results on several datasets. It is related to, but differs largely from, traditional subspace clustering. Our future work will be to extend the sparse combination learning framework to other video applications.

Acknowledgments

This research has been supported by the General Research Fund (No. 412911) from the Research Grants Council of Hong Kong.

References

[1] A. Adam, E. Rivlin, I. Shimshoni, and D. Reinitz. Robust real-time unusual event detection using multiple fixed-location monitors. IEEE TPAMI, 30(3):555–560, 2008.
[2] B. Antic and B. Ommer. Video parsing for abnormality detection. In ICCV, pages 2415–2422, 2011.
[3] Y. Benezeth, P.-M. Jodoin, V. Saligrama, and C. Rosenberger. Abnormal events detection based on spatio-temporal co-occurences. In CVPR, 2009.
[4] D. Bertsekas. Nonlinear Programming. Athena Scientific, Belmont, MA, 1999.
[5] Y. Cong, J. Yuan, and J. Liu. Sparse reconstruction costs for abnormal event detection. In CVPR, pages 3449–3456, 2011.
[6] X. Cui, Q. Liu, M. Gao, and D. Metaxas. Abnormal detection using interaction energy potentials. In CVPR, pages 3161–3167, 2011.
[7] E. Elhamifar and R. Vidal. Sparse subspace clustering. In CVPR, 2009.
[8] F. Jiang, J. Yuan, S. A. Tsaftaris, and A. K. Katsaggelos. Anomalous video event detection using spatiotemporal context. Computer Vision and Image Understanding, 115(3):323–333, 2011.
[9] K. Jouseok and L. Kyoungmu. A unified framework for event summarization and rare event detection. In CVPR, 2012.
[10] J. Kim and K. Grauman. Observe locally, infer globally: a space-time MRF for detecting abnormal activities with incremental updates. In CVPR, pages 2921–2928, 2009.
[11] L. Kratz and K. Nishino. Anomaly detection in extremely crowded scenes using spatio-temporal motion pattern models. In CVPR, pages 1446–1453, 2009.
[12] C. Lu, J. Shi, and J. Jia. Online robust dictionary learning. In CVPR, 2013.
[13] V. Mahadevan, W. Li, V. Bhalodia, and N. Vasconcelos. Anomaly detection in crowded scenes. In CVPR, 2010.
[14] J. Mairal, F. Bach, J. Ponce, and G. Sapiro. Online learning for matrix factorization and sparse coding. Journal of Machine Learning Research, 11:19–60, 2010.
[15] R. Mehran, A. Oyama, and M. Shah. Abnormal crowd behavior detection using social force model. In CVPR, 2009.
[16] V. Saligrama and Z. Chen. Video anomaly detection based on local statistical aggregates. In CVPR, pages 2112–2119, 2012.
[17] J. Shi, X. Ren, G. Dai, J. Wang, and Z. Zhang. A non-convex relaxation approach to sparse dictionary learning. In CVPR, pages 1809–1816, 2011.
[18] T. Hastie, R. Tibshirani, and J. H. Friedman. The Elements of Statistical Learning. Springer, New York, 2001.
[19] X. Wang, X. Ma, and E. Grimson. Unsupervised activity perception by hierarchical Bayesian models. In CVPR, pages 1–8, 2007.
[20] S. Wu, B. E. Moore, and M. Shah. Chaotic invariants of Lagrangian particle trajectories for anomaly detection in crowded scenes. In CVPR, 2010.
[21] D. Zhang, D. Gatica-Perez, S. Bengio, and I. McCowan. Semi-supervised adapted HMMs for unusual event detection. In CVPR, 2005.
[22] B. Zhao, L. Fei-Fei, and E. Xing. Online detection of unusual events in videos via dynamic sparse coding. In CVPR, 2011.
[23] H. Zhong, J. Shi, and M. Visontai. Detecting unusual activity in video. In CVPR, 2004.