Sie sind auf Seite 1von 8

Pattern Recognition 45 (2012) 36033610

Contents lists available at SciVerse ScienceDirect

Pattern Recognition
journal homepage: www.elsevier.com/locate/pr

A cascade fusion scheme for gait and cumulative foot pressure


image recognition
Shuai Zheng a, Kaiqi Huang a,n, Tieniu Tan a, Dacheng Tao b
a
b

National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing, China
Centre for Quantum Computation & Intelligent Systems, Faculty of Engineering and Information Technology, University of Technology, Sydney, NSW 2007, Australia

a r t i c l e i n f o

abstract

Article history:
Received 19 November 2010
Received in revised form
28 February 2012
Accepted 10 March 2012
Available online 5 April 2012

Cumulative foot pressure images represent the 2D ground reaction force during one gait cycle.
Biomedical and forensic studies show that humans can be distinguished by unique limb movement
patterns and ground reaction force. Considering continuous gait pose images and corresponding
cumulative foot pressure images, this paper presents a cascade fusion scheme to represent the potential
connections between them and proposes a two-modality fusion based recognition system. The
proposed scheme contains two stages: (1) given cumulative foot pressure images, canonical correlation
analysis is employed to retrieve corresponding gait pose image candidates in gallery dataset;
(2) pedestrian recognition is achieved via small samples matching between retrieved gait pose images
and unlabeled ones. The proposed fusion recognition system is not only insensitive to slight changes of
environment and the individual users, but also can be extended to multiple biometrics retrieval.
Experimental results are conducted on the CASIA gaitfootprint dataset, which contains cumulative
foot pressure images and its corresponding gait pose image sequence from 88 subjects. Evaluation
results suggest the effectiveness of the proposed scheme compared to other related approaches.
& 2012 Elsevier Ltd. All rights reserved.

Keywords:
Gait
Foot pressure image
Human recognition

1. Introduction
Recent biomedical and forensic studies [1] reveal that humans
can be distinguished by unique walking patterns (e.g., limb
movement pattern and ground reaction force). Unique limb
movement patterns seem to help to recognize individuals at
distance (e.g., gait recognition [2]), while ground reaction force
and its variants such as footprints are also separately utilized to
identify criminals by human experts [3]. Recent walking pattern
based recognition systems are not practical for many reasons due
to the single camera sensor. For example, viewpoint problems in
gait recognition under constrains could be avoided if we use
Kinect [4]. The cumulative foot pressure image is a cumulative
image-type record of ground reaction force change during one
gait cycle [5]. It provides richer information to identify different
walking patterns, compared to the existing 1D ground reaction
force or simple 2D footprint pictures. The cumulative foot
pressure image has been applied in biomedical assistant, forensic
investigation, sports assistant training and custom shoes [5]. In
order to address the problems in existing walking pattern

Corresponding author. Tel.: 86 10 82610278; fax: 86 10 62551993.


E-mail addresses: szheng@nlpr.ia.ac.cn, kylezheng04@gmail.com (S. Zheng),
kqhuang@nlpr.ia.ac.cn (K. Huang), tnt@nlpr.ia.ac.cn (T. Tan),
dacheng.tao@uts.edu.au (D. Tao).
0031-3203/$ - see front matter & 2012 Elsevier Ltd. All rights reserved.
http://dx.doi.org/10.1016/j.patcog.2012.03.008

recognition systems, we propose to develop a computational


correlation model for cumulative foot pressure images and gait.
As far as we know, there are a few previous works proposed to
develop the computational correlation model for cumulative foot
pressure image and gait, although a lot of works have been
proposed to study gait recognition [68] or the cumulative foot
pressure images [5]. Multimodal biometric system [1] have been
proposed to combine evidences from different sources. These
sources might simultaneously come from various sensors [9],
different classication algorithms [10], multiple instances for the
evidence or directly from diverse biometric traits [11]. Naive
feature combinations may not always improve the performance,
since some components in different sources may not be complementary. Zhang et al. found that the performance of human
recognition using multiple sources could be improved by reducing the redundant classes in the gallery dataset [12]. Specically,
in order to evaluate the computational correlation model we
obtain, we also develop a human recognition system using gait
pose images and corresponding cumulative foot pressure images
without these limitations using a cascade fusion scheme.
The proposed study is necessary since it not only provides a
computational correlation model but also a solution for entrance
control applications. For example, in jailhouse security system or
suspect identication, cumulative foot pressure images and gait
pose images of the same individual can be captured at different
times. Hence, cumulative foot pressure images and gait pose

3604

S. Zheng et al. / Pattern Recognition 45 (2012) 36033610

GPI matching

Identification
Results

Camera
Gait Pose
Images
(GPIs)

GPIs
Higher
Similarity Score

Foot pressure measurement plate

Retrieval Result
Corresponding GPIs retrieval using CCA

Labeled GPIs Dataset

Foot pressure measurement device

Cumulative Foot
Pressure Images
(CFPIs)
Fig. 1. Overview of pedestrian recognition using gait and cumulative foot pressure images.

images can be used to identify escaped criminal or investigate


suspects noninvasively. Furthermore, the proposed cascade twomodality fusion scheme can also be employed to develop a crossmultiple-biometrics information retrieval system [13], which
could allow users to retrieve the gait pose images from the person
of interest given the corresponding cumulative foot pressure
images.
Fig. 1 gives an overview of the propose scheme. When a person
walks over the foot pressure measurement plate, cumulative foot
pressure images are captured. Simultaneously or not, corresponding gait pose image sequences are captured via an off-the-shelf
camera. After preprocessing and feature extraction, canonical
correlation analysis (CCA) is employed to nd corresponding
images in the gait pose gallery dataset, given the cumulative foot
pressure images. Then, pedestrian recognition is achieved via
matching the unlabeled gait pose images with labeled retrieved
ones. Briey, the scheme consists of three parts:

 Feature representation for gait pose images and cumulative


foot pressure images.

 Corresponding labeled gait pose image retrieval using CCA.


 Pedestrian recognition via gait pose image matching.
The remainder of this paper is organized as follows. Section 2
presents related work about gait, cumulative foot pressure and
multiple evidence fusion schemes for pedestrian recognition.
Section 3 illustrates the proposed cascade fusion scheme. In
Section 4, feature representation for gait pose images and cumulative foot pressure images are presented. Section 5 describes the
fusion based recognition system. Section 6 introduces the dataset.
Section 7 reports the experiments. We draw conclusion in Section 8.

2. Related works
Gait recognition is potentially useful for personal identication
[14]. It is quite attractive for identication purposes since its
advantages are that it is completely unobtrusive, and does not
involve any subject cooperation or contact. The state-of-the-art
approaches in gait recognition can be divided into model-based
and model-free approaches.

Model-based approaches tend to recover the underlying


mathematical construction of gait with a structure motion model.
The mean shapes of gait silhouettes are modeled by Wang et al.
via employing procrustes analysis [15]. Bouchrika and Nixon
extract crucial feature descriptions from human joints by developing a motion-based model using elliptic Fourier descriptors
[16]. However, the performance of the approaches suffers from
poor localization of the torso and difcult extraction of underlying models from gait sequences.
The other kind of approach is model-free one. One kind of
model-free approach preserves temporal information in recognition and training states. Hidden Markov models (HMMs) are
utilized to achieve gait recognition [17]. Principal component
analysis (PCA) [18,19] is employed to extract statistical spatial
temporal feature descriptors of gait [20]. In this kind of approach,
large-scale training samples are required for probabilistic temporal modeling approaches to obtain a good performance. Hence,
the disadvantage for the approach is the high computational
complexity of sequence matching during recognition and the
high storage requirement of the dataset. Another kind of modelfree approach converts a sequence of images into a single
template. Gait recognition by averaging all the silhouettes is
presented by Liu et al. [21]. Han proposed a gait energy image
(GEI) to construct real and synthetic gait templates [22]. The
recognition performance may degrade since the temporal information in gait sequences are discarded. Wang et al. developed a
spatial-temporal walking template called chrono-gait image (CGI)
to encode the temporal information via color mapping to improve
the recognition rates [23]. The main drawback of these
approaches is that they easily suffer from slight changes of
environment such as illumination variation in probe and gallery
data collections or crowded scenarios. Besides, traditional
motion-based gait representation is not practical and stable in
more wide application scenarios such as internet videos or image
sequences from IP camera.
Recent works on action recognition started to introduce some
feature descriptors like histograms of oriented gradients (HOG) to
represent several action key poses [24]. Such kinds of feature
descriptors have also helped to achieve state-of-the-art performance in object detection and object recognition [25]. In these
tasks, they are proved to overcome environmental challenges and
be able to represent objects without background priority or

S. Zheng et al. / Pattern Recognition 45 (2012) 36033610

motion-based segmentation. Inspired by their success, we propose to use combination feature descriptors obtained from
several continuous gait pose images to represent individual gait.
Footprints are an important identication evidence for forensic
investigation purposes. Although it has been applied since ancient
times, only recently Kennedy rst proved the uniqueness of bare
footprints and their use as a possible means of identication [26].
Previous works can be divided into two types, one is ground
reaction force and another is a still footprint image. Ground
reaction force is a 1D data signal record of walking. Moustakidis
proposed a subject recognition system based on ground reaction
force measurement [27]. Cattin developed a general PCA fusion
based biometric system using both gait and ground reaction
force [7]. However, this method is neither robust to slight noise
nor able to distinguish different walking manners between
different pedestrians. Besides, the strictly controlled data collection environment limits its potential applications. The second
method is utilizing still footprint image to achieve pedestrian
recognition. Nakajima proposed person recognition using normalized pairs of raw barefoot prints [28]. Uhl developed a footprintbased biometrics verication system using scanned barefoot
images [29]. The method failed in recognizing the same individual
when users wore different shoes. Cumulative foot pressure
images contain cumulative spatial and temporal force information during one gait cycle, which may help to handle the
difculties in recognizing individuals wearing different shoes.
Previous work shows that feature descriptors based on hierarchical models are invariant to different shoes [30]. Inspired by the
success of hierarchical models on translation-invariant object
recognition datasets, we propose a Gaussian mixture model
(GMM) based on cumulative foot pressure images.
Existing multimodal biometric systems are proposed by combining evidence from different sources [7,31]. Different combination approaches can be divided into three types: the feature level,
matching score level and decision level. Cattin utilized Bayesian
decision theory to fuse ground reaction force and gait [7]. This
loses much correlation information. Zhang et al. proposed to fuse
evidence at the feature level [31]. However, this is not practical
since the multiple modalities may be incompatible and direct
concatenating feature vectors may suffer from the curse of
dimensionality problem. He et al. investigated the performance
of various score level fusion approaches in multimodal biometric
systems [32]. But these approaches do not address the issue of
fusing evidence obtained at different times. Motivated by multimodal document cross retrieval systems [13], we propose a

Original Image Gradients

Probability of
boundary (pb)

HOG

3605

cascade fusion scheme to fuse gait and cumulative foot pressure


images both at the feature level and score matching level.

3. Feature representation
In this section, we attempt to achieve gait representation from
still image sequences and a cumulative foot pressure image
representation which is translation invariant.
3.1. Gait representation
A very basic assumption in all gait recognition research is that
all walking sequences from the same person follow a similar
walking pattern, where the walking pattern involves the moving
range of the limbs. However, there are many walking cycles
repeated in one walking image sequence. To estimate the walking
period and separate a single walking cycle from one walking
image sequence, we compute the movement of the limbs as the
frame changes. We rstly employ Felzenszwalbs detector [33] to
extract the bounding box for the pedestrian in each frame. Then
we compute the width change of bounding box as the frame
changes. This is similar to the method of Sarkar et al. [21].
We compute the gait representation as the concatenated probability of boundary based histogram of oriented gradients (pbHOG)
descriptors based on the normalized cropped walking poses for one
walking cycle. We extract the HOG descriptors based on the probability of boundary (Pb) operator [34] responses. As Fig. 2 illustrates,
the Pb operator suppresses small noise in the image. Hence, pbHOG
captures more salient walking pose details.
3.2. Translation-invariant representation for cumulative foot
pressure image
Our goal is to learn a feature representation model for cumulative foot pressure images which preserves translation-invariance.
Recent work [35] has been proposed to address the shoe-invariant
cumulative foot pressure image recognition. In contrast, we
address the problem of slight changes in barefoot pressure images.
Current low-level descriptors are proved to be invariant to minor
translation and effective in many object recognition applications.
However, they are not practical for representing cumulative foot
pressure image since they lose a lot of structure information (in
terms of object parts) which is crucial for shoes-invariant cumulative foot pressure image recognition. Previous works [12] prove

Original gait pose images

pbHOG
pbHOG for gait pose images

Gait pose image


representations

Fig. 2. Comparison of different gait representations and overview of concatenated pbHOG gait pose image representation.

3606

S. Zheng et al. / Pattern Recognition 45 (2012) 36033610

Normalize
CBPIs

Dense SIFT feature


extraction

GMM parameters
representation results

EM algorithm based
GMM modeling

Concatenated CBPI
descriptor vectors
COP curve
extraction

GMM parameters
representation results

EM algorithm based
GMM modeling

Center of Pressure
(COP) curve
Fig. 3. Diagram of GMM based representation for cumulative foot pressure images.

that hierarchical model will bring advantages of translation-invariant power. Hence, we propose to model cumulative foot pressure
image with hierarchical Gaussian mixture models, as Fig. 3
illustrates.
Suppose z denotes a p-dimensional feature vector (SIFT
descriptor) or COP (center of pressure curve) [27,5] from cumulative foot pressure image I. We model z by GMM model:
pz9y

M
X

wIk Nz; mIk , SIk ,

k1

where M denotes the total number of Gaussian components and


wIk , mIk , SIk are the weight, mean and covariance matrix of the kth
Gaussian component, respectively. For computational efciency,
the covariance matrices SIk are restricted to be a diagonal matrix.
The number of model parameter y wIk , mIk , SIk increases with
respect to N. N is the number of images. py is the distribution of
the parameters. Following [36], the prior mean vector, prior
weights, and covariance matrix are estimated by tting a global
GMM. The other parameters are learned by maximum a posteriori
(MAP) loss:
maxln pz9y ln py:
Center of pressure histograms in the cumulative foot pressure
image and dense SIFT feature descriptors are separately extracted
and modeled with GMMs. We achieved a CFPI representation via
concatenating these descriptor vectors. We represent the cumulative foot pressure image with the parameters of the GMM.
Hence, following the suggestion in [36], we represent cumulative
foot pressure image as
q
q
1=2
1=2
x wI1 S1 mI1 , . . . , wIM SM mIM :

object function:
wT C w

r ru u
,
r max p
T
T

wr C rr wr wu C uu wu

wr ,wu

where Crr and Cuu are within-sets covariance matrices and


C ru C Tur are between-sets covariance matrices. A closed form
solution can be computed by solving a generalized eigen-value
problem. Large problems can be solved efciently using predictive
low-rank decomposition with partial GramSchmidt orthogonalization [38].
Based on the learned projection matrices, we achieve the low
embed feature vectors X x1 ,x2 , . . . ,xn wTr r 1 , . . . ,wTr r n and
Y y1 ,y2 , . . . ,yn wTu u1 , . . . ,wTu un .
4.2. Correlated scores computation via CCA regression
In order to evaluate the correlation scores between the data
from two evidences, CCA regression is adopted again. Given probe
feature vectors X : x1 , . . . ,xn and gallery feature vectors
^ x,W
^ y is learned via CCA regresY : y1 , . . . ,yn, a mapping pair W
sion:
T

^ XY T W
^ y
EW
x
arg max q
:
T
^ y
^ x ,W
T
W
^ T YY T W
^ XX W
^ x EW
^ y
EW
x

Based on the mapping pair, we can compute projections as


follows:
T

^ x,
x0 W
x
T

^ y:
y0 W
y
Correlated scores sx0i ,y0i are computed between each gallery
and probe via CCA regression:
x0 y0i
:
Jx0 JJy0i J

4. Cascade fusion scheme

sx0 ,y0i

In this section, we present a cascade fusion scheme using


canonical correlation analysis and its variant regression algorithm.

Based on the scores, gallery samples that have high correlated


scores are retrieved as candidates for further pedestrian recognition.

4.1. Feature selection using canonical correlation analysis

5. Cascade fusion scheme for two-modality based pedestrian


recognition

In order to identify shared correlated structure among variables from two evidence sources, Canonical correlation analysis
(CCA) [37] is employed as a feature selection method rstly. The
algorithm estimates two basis vectors so that, after linear projection, the correlation between the two classes is mutually maximized. Given two sets of samples vectors S r 1 ,u1 ,r n ,un , and
their projection matrices wr and wu, CCA mutually maximizes the

Based on the proposed cascade fusion scheme, we achieve


pedestrian recognition using gait and cumulative foot pressure
images. As Fig. 4 illustrates, there are two inputs of the system:
cumulative foot pressure images and corresponding continuous
gait pose images. The two evidence sources can be simultaneous
or not, but need to be collected from the same individual during

S. Zheng et al. / Pattern Recognition 45 (2012) 36033610

3607

Labeled gait pose images (GPIs)

CCA-based
dimensionality reduction
Correlated Projection
matrix

Correlated
GPI
embedding

Unlabeled
Cumulative foot pressure
images (CFPIs)

Cosine distance
Measurement
Between embedding
and each labeled GPIs

Higher
Similarity Score

Retrieval Result

Fig. 4. Diagram of cascade fusion scheme for pedestrian recognition using gait and cumulative foot pressure images.

labeled dataset construction. Following Section 3, the concatenated HOG descriptors are proposed to represent a given gait
while a super-vector using GMM parameters is employed to
preserve the characteristics of given cumulative foot pressure
images. In the cascade fusion scheme, cumulative foot pressure
images are rstly employed to prune irrelevant parts of the search
space in the labeled gait dataset and predict the correlation
similarity scores between the given cumulative foot pressure
images and labeled gait image sequences. Then, the pedestrian
recognition process is achieved by matching the given gait image
sequences with a small number of labeled ones, using a trained
linear SVM classier.

6. Dataset
As far as we know, this dataset is the rst publicly accessible
dataset containing both cumulative foot pressure images and
corresponding gait together. This dataset consists of 3496 gait
pose images and 2658 cumulative foot pressure images. The
distributions of subjects in some basic attributes are presented
in Fig. 5. The data were collected from 88 pedestrians, 20 female
and 68 male, in an indoor environment. Experimental factors like
illumination, background and clothes are not under strict control,
as Fig. 6 illustrates.

7. Experiments
The purpose of the proposed system is to develop a computational evaluation framework for studying the relations between
gait and cumulative foot pressure images. Although the applications of the study presented in this paper are not limited to
recognition, we evaluate the performance of the proposed
approach following recognition evaluation criteria. To evaluate
the effectiveness of proposed cascade framework in recognition,

Fig. 5. The distribution of age and body measurement index (BMI) in GPFP
dataset. Most of subjects in GPFP dataset are youth, while most of them are
healthy or little fat.

we choose three other comparison approaches including a single


source approach (only gait feature descriptor), naive baseline
fusion approach, and CCA based fusion approach [10].
In experiments, we randomly divided the dataset into two
subsets: a training set of 440 groups of data containing gait pose
images and correlated cumulative foot pressure images and the
testing set containing the other 434 groups of data.

3608

S. Zheng et al. / Pattern Recognition 45 (2012) 36033610

Fig. 6. The sample of dataset containing gait pose and cumulative foot pressure images.

Best precision at recall level 16.67%.

Best precision at recall level 10.67%.


50

HOG
pbHOG

40

Precision %

Precision %

50

30
20

HOG
pbHOG

40
30
20
10

10

0
PCA

PLS

PCA+CCA

PCA

PLS

PCA+CCA

Fig. 7. Precision of GPIs pruning using CFPIs images as queries, when recall value is 10.67%. The experimental results show that PCA CCA feature selection method with
pbHOG GPIs representation outperforms others.

We evaluate the recognition performance using accuracy. To get


the best results, we compare the different potential features and
feature selection approaches considering the performance of retrieval, which often uses precision and recall as evaluation criteria.
Suppose the number of retrieved gait pose images is Rg, and the
number of the retrieved gait pose images which is correlated with
given cumulative foot pressure is Rcg. The precision is dened as
precision Rcg =Rg :
Recall is the fraction of the gait pose images which are correlated to
the correct retrieved ones. Assume that the number of correlated gait
pose images is Rc. Then, the recall is dened as
recall Rcg =Rc :
7.1. Feature and feature selection
To choose the features and feature selection approach, we
perform a retrieval evaluation. During the training stage, we learn
the feature subspace projection matrix. During testing, we utilize
the learned projection matrix to reduce the dimensionality of the

pbHOG and GMM feature vectors. Then our proposed CCA-based


cascade scheme is employed to assign correlation scores between
gaits pose images in dataset and cumulative foot pressure image
queries.
As Fig. 7 illustrates, the pbHOG feature descriptor and PCA
CCA feature selection are the best combination. Considering the
comparison of different feature descriptors, pbHOG performs best
since the probability of boundary largely reduces noise response
in still images which helps to reduce the within class divergence.
Considering different feature selection approaches, PCA and CCA
combination performs best because of the natural advantages of
PCA and CCA. PCA captures the principle components and ensures
that the feature vectors from the two evidence sources are not illposed, while CCA maximizes the correlations between cumulative
foot pressure images and gait pose for the same person.
The precisionrecall curve is presented only at the recall level
at 10.67% because it is difcult to visualize the precisionrecall
curve perfectly since we only have 88 subjects. Hence, rather than
giving the whole precisionrecall curve, we give comparison
results of precision values under two different recalls. Recall is
the fraction of relevant instances that are retrieved. The precision

S. Zheng et al. / Pattern Recognition 45 (2012) 36033610

drops signicantly when the recall increases. The corresponding


precision value is too small when recall is larger than 10.67%. In
this experiment, we nd that the system achieves best performance when we pick up the recall at 10.67%.
7.2. Fusion scheme for pedestrian recognition
To evaluate the performance of the proposed fusion approach,
we perform a recognition performance evaluation. During the
training stage, feature selection projection matrices and linear
SVM classier are trained on the gallery dataset. During the
testing stage, there are two inputs: one is cumulative foot
pressure images, another is gait pose images. Given probe data
in the form of cumulative foot pressure images and gait pose
images, the proposed approach will feed back the corresponding
label according to the matching process between the probe and
gallery data. Specically, the process in our proposed cascade
scheme for pedestrian recognition is presented as follows: First,
unlabeled cumulative foot pressure images are used as a probe
data to retrieve correlated labeled gait pose images in gallery
dataset. Then, the correlation score helps to prune much irrelevant labeled gait pose images and obtain a small number of
labeled gallery dataset. Second, unlabeled gait pose images are
sent to the classier via matching with the small number of
labeled gallery dataset.
In our experiments, the penalty parameter C in the linear SVMs
are all set as 10. In the cascade scheme, the threshold is set to 10,
which means we choose only the top 10 correlated scores of
labeled gait pose images for the gait matching process. We choose
this threshold because the recognition performance of proposed
system achieves the best results under this setting.
In Fig. 8, we choose the result of PCA feature selection based
gait recognition to compare the fusion scheme with human
recognition using only single source.
In the other approaches illustrated in Fig. 8, we compare
different fusion schemes while xing the feature representation
and feature selection approach. For the feature representation and
feature selection approach, we use pbHOG for gait representation
and the GMM representation for cumulative foot pressure images
while we use PCACCA as feature selection. Naive fusion means
combining the dimension reduced feature vectors from gait and

Accuracy %

Comparison of Different Pedestrian Identification Methods


100
90
80
70
60
50
40
30
20
10
0

99
86.18
67.97
39.17

3609

cumulative foot pressure image directly together. The benchmark


CCA based fusion method is the same as illustrated in [10].
Cascade fusion is the method developed in this paper.
Fig. 8 shows the performance result of different pedestrian
recognition schemes. The correlated-model-based cascade twomodality fusion scheme outperforms the other simple concatenated ones. This is not a surprising result. The correlated model
helps to reduce the number of labeled GPIs in the gallery dataset
and prune the irrelevant GPIs. Hence, it casts the original difcult
multi-class matching problem into a small-class or even binary
class matching problem. Further, the proposed cascade method
outperforms the concatenated-based fusion scheme. This is
because the proposed method utilizes the correlation between
the two modal data in correlation common space, while the
concatenated-based ones ignore this information.

8. Conclusions and future work


This paper reveals a new problem on how to reveal the
walking behavior pattern and demonstrate a possible way to
exploit the correlation among foot pressure, footprint and walking motion patterns for recognition. Specically, we presented a
work that allows to use different walking patterns to recognize
pedestrian without camera setting. The gait pose and cumulative
foot pressure are two types of correlated person walking behavior
patterns. In order to study the potential connections between the
two walking behavior patterns, we establish a standard database,
develop a pedestrian recognition system with the cascade fusion
scheme. The cascade scheme is proved to be effective to extract
the correlated parts between the two modalities, according to the
evaluation on the pedestrian recognition performance.
The proposed cascade fusion scheme could be used in designing a cross-biometric searching system which allows to retrieve
pedestrian or any biometric pattern with other query biometric
patterns. There are also some applications of interest without
camera settings, especially for criminal investigation and other
related specic applications. For example, it will be more convenient to prune the suspect criminal or others via comparing
among footprint, cumulative foot pressure images and gait pose
images.
The drawback of this paper is to assume that all subjects are
not wearing shoes, which is not very convenient for normal use.
Wearing different shoes would generate different footprint
shapes which might cause the recognition failure. In this case,
future work should be conducted to address the deformable
invariant cumulative foot pressure image recognition. Further,
the future work will also focus on studying the correlation
between footprint and cumulative foot pressure images, so that
the computational model can help forensic investigation, which
often collects a lot of footprint data from a crime scene.

Acknowledgment
Cascade fusion with PCA-CCA feature selection
Benchmark CCA based fusion with PCA-CCA as feature selection
Nave (combination) fusion with PCA-CCA as feature selection
Single gait recognition with PCA feature selection

Fig. 8. Comparison on different recognition systems: single gait recognition,


different fusion schemes. Single gait recognition with PCA feature selection
represents the human recognition with single evidence rather than two fusion
evidences. The gait feature presentation is pbHOG. The fusion schemes presented
here adopt all same input feature vector. The x input are pbHOG gait representation and GMM cumulative foot pressure image representation. PCACCA combination is adopted as supervised feature selection method. Naive fusion scheme
combines the two feature vectors together. Benchmark CCA based one use the
method in [10]. Cascade fusion one is the proposed one.

This work is supported by National Basic Research Program of


China (2012CB316302) and National Science Foundation of China
(Nos. 60875021 and 61175007). It also in part supported by ARC
discovery project with Project No. DP-120103730. We also
appreciate the help and suggestions of Dr. Tianyi Zhou, Prof.
Liming Shi, Prof. Liang Wang, and Dr. Ran He.
References
[1] L. Hong, A.K. Jain, S. Pankanti, Can multiplebiometrics improve performance,
in: IEEE Workshop on Automatic Identication Advanced Technologies, 1999,
pp. 18.

3610

S. Zheng et al. / Pattern Recognition 45 (2012) 36033610

[2] D. Tao, X. Li, X. Wu, S.J. Maybank, General tensor discriminant analysis and
gabor features for gait recognition, IEEE Transactions on Pattern Analysis and
Machine Intelligence 29 (10) (2007) 17001715.
[3] R.K. Lasrsen, E.B. Simonsen, N. Lynnerup, Gait analysis in forensic
medicineart. no. 64910m, Videometrics IX (2007) 6491.
[4] S. Sivapalan, D. Chen, S. Denman, S. Sridharan, and C. Fookes, Gait energy
volumes and frontal gait recognition using depth images, in: IEEE International Joint Conference on Biometrics, 2011, pp. 501506.
[5] RS. Scan.Lab, Footscan: An Floor Pressure Sensing Devices Product Introduction /http://www.rsscan.co.uk/systems.phpS.
[6] A. Bertani, A. Cappello, M.G. Benedetti, L. Simoncini, F. Catani, Flat foot
functional evaluation using pattern recognition of ground reaction data,
Clinical Biomechanics 14 (7) (1999) 484493.
[7] P.C. Cattin, D. Zlatnik, R. Borer, Biometric authentication system using human
gait, in: Mechatronics and Machine Vision in Practice, 2001, pp. 14.
[8] P.J. Phillips, S. Sarkar, I. Robledo, P. Grother, K.W. Bowyer, The gait identication challenge problem: data sets and baseline algorithm, in: International
Conference on Pattern Recognition, 2002, pp. 385388.
[9] K.I. Chang, K.W. Bowyer, P.J. Flynn, An evaluation of multimodal 2d 3d face
biometrics, IEEE Transactions on Pattern Analysis Machine Intelligence 27
(2005) 619624.
[10] C. Shan, S. Gong, P.W. McOwan, Fusing gait and face cues for human gender
recognition, Neurocomputing 71 (2008) 19311938.
[11] X. Geng, K. Smith-Miles, L. Wang, Q. Wu, Context-aware fusion: a case study
on fusion of gait and face for human identication in video, Pattern
Recognition 43 (2010) 36603673.
[12] X. Zhang, Z. Sun, T. Tan, Hierarchical fusion of face and iris for personal
identication, in: IEEE International Conference on Pattern Recognition,
2010, pp. 217220.
[13] NSTC, The National Biometrics Challenge, National Science and Technology
Council Subcommittee on Biometrics, vol. 1, 2006, pp. 120.
[14] W.M. Hu, T.N. Tan, L. Wang, S. Maybank, A survey on visual surveillance of
object motion and behaviors, IEEE Transactions on Systems, Man, and
Cybernetics, Part C: Applications and Reviews 34 (3) (2004) 334352.
[15] L. Wang, H. Ning, W. Hu, T. Tan, Gait recognition based on procrustes shape
analysis, in: International Conference on Image Process, 2002, pp. 433436.
[16] I. Bouchrika, M. Nixon, Model-based feature extraction for gait analysis and
recognition, in: International Conference on Computer Vision/Computer
Graphics Collaboration Techniques, 2007, pp. 150160.
[17] C. Chen, J. Liang, H. Zhao, H. Hu, Gait recognition using hidden Markov model,
in: Advanced in Natural Computation, 2004, pp. 399407.
[18] N. Guan, D. Tao, Z. Luo, B. Yuan, Nenmf: an optimal gradient method for
solving non-negative matrix factorization and its variants, IEEE Transactions
on Signal Processing, http://dx.doi.org/10.1109/TSP.2012.2190406, in press.
[19] X. Tian, D. Tao, Y. Rui, Sparse transfer learning for interactive video search
reranking, ACM Transactions on Multimedia Computing, Communications
and Applications, http://dx.doi.org/10.1145/0000000.0000000, in press.

[20] L. Wang, T. Tan, H. Ning, W. Hu, Silhouette analysis-based gait recognition for
human identication, IEEE Transactions on Pattern Analysis and Machine
Intelligence 25 (2003) 15051518.
[21] Z. Liu, S. Sarkar, Simplest representation yet for gait recognition: averaged
silhouette, in: International Conference Pattern Recognition, 2004, pp. 211214.
[22] J. Han, B. Bhanu, Individual recognition using gait energy image, IEEE
Transactions on Pattern Analysis and Machine Learning 28 (2006) 316322.
[23] C. Wang, J. Zhang, J. Pu, X. Yuan, L. Wang, Chrono-gait image: a novel
temporal template for gait recognition, in: European Conference on Computer Vision, 2010, pp. 257270.
[24] N. Ikizler, R. Cinbis, S. Sclaroff, Learning actions from the web, in: IEEE
International Conference on Computer Vision, Kyoto Japan, 2009, pp. 9951002.
[25] N. Dalal, B. Triggs, Histograms of oriented gradients for human detection, in:
IEEE Computer Vision and Pattern Recognition, vol. 1, 2005, pp. 886893.
[26] R.B. Kennedy, Uniqueness of bare feet and its use as a possible means of
identication, Forensic Science International 82 (1) (1996) 8187.
[27] S.P. Moustakidis, J.B. Theocharis, G. Giakas, Subject recognition based on
ground reaction force measurements of gait signals, IEEE Transactions on
Systems, Man, and Cybernetics, Part B: Cybernetics 38 (6) (2008) 14761485.
[28] K. Nakajima, Y. Mizukami, K. Tanaka, T. Tamura, Footprint-based personal
recognition, IEEE Transactions on Biomedical Engineering 47 (11) (2000)
15341537.
[29] A. Uhl, P. Wild, Footprint-based biometric verication, Journal of Electron
Imaging 17 (1) (2008).
[30] H. Lee, R. Grosse, R. Ranganath, A.Y. Ng, Convolutional deep belief networks
for scalable unsupervised learning of hierarchical representation, in: International Conference on Machine Learning, 2009, pp. 7785.
[31] T. Zhang, X. Li, D. Tao, J. Yang, Multimodal biometrics using geometry
preserving projections, Pattern Recognition 41 (2008) 805813.
[32] M. He, S. Horng, P. Fan, Performance evaluation of score level fusion in
multimodal biometric systems, Pattern Recognition 43 (2010) 17891800.
[33] P.F. Felzenszwalb, R.B. Girshick, D. McAllester, D. Ramanan, Object detection
with discriminatively trained part-based models, IEEE Transactions on
Pattern Analysis and Machine Intelligence 32 (2010) 16271645.
[34] D.R. Martin, C.C. Fowlkes, J. Malik, Learning to detect natural image
boundaries using local brightness, color, and texture cues, IEEE Transactions
on Pattern Analysis and Machine Intelligence 26 (2004) 530549.
[35] S. Zheng, K. Huang, T. Tan, Evaluation framework on translation-invariant
representation for cumulative foot pressure image, in: the 18th IEEE International Conference on Image Processing (ICIP), 2011, pp. 201204.
[36] X. Zhou, N. Cui, Z. Li, F. Liang, T.S. Huang, Hierarchical gaussianization for
image classication, in: IEEE International Conference on Computer Vision,
Kyoto, Japan, 2009, pp. 19711979.
[37] D.R. Hardoon, S. Szedmak, J.S. Taylor, Canonical correlation analysis: an
overview with application to learning methods, Neural Computation 16
(2004) 26392664.
[38] J. Shawe-Taylor, N. Cristianini, Kernel Methods for Pattern Analysis, Cambridge University Press, 2004.

Shuai Zheng received his B.Eng. from Beijing Institute of Technology in 2008. He is now a master student from National Laboratory of Pattern Recognition, at the Institute
of Automation, Chinese Academy of Sciences. His current research interests include computer vision and pattern recognition, especially on image representation, image
classication, object detection and gait recognition. He is a student member of the IEEE and the IEEE Computer Society.

Kaiqi Huang received his B.Sc. and M.Sc. from Nanjing University of Science Technology, China, and obtained his Ph.D. degree from Southeast University. He has worked at
the National Laboratory of Pattern Recognition (NLPR), the Institute of Automation, and the Chinese Academy of Science (CASIA) in China and he has been an associate
professor at NLPR since 2005. He is a Senior Member of the IEEE. He is an executive team member of the IEEE SMC Cognitive Computing Committee as well as Associate
Editor of the International Journal of Image and Graphics (IJIG) and Electronic Letters on Computer Vision and Image Analysis (ELCVIA) as well as Guest Editor of a Signal
Processing special issue on Security.
His current research interests include visual surveillance, digital image processing, pattern recognition and biological based vision. He has published over 80 papers in
important international journals and conferences such as IEEE TIPAMI, T-IP, T-SMC-B, TCSVT, Pattern Recognition (PR), Computer Vision and Image Understanding (CVIU),
ECCV, CVPR, ICIP, ICPR. He received the Best Student Paper Awards from ACPR10 and was a prize winner in the detection task in both PASCAL VOC10 and PASCAL VOC11.
He received the honorable mention prize for the classication task in PASCAL VOC11.

Tieniu Tan received the B.Sc. degree in Electronic Engineering from Xian Jiao tong University, China, in 1984 and the M.Sc., DIC, and Ph.D. degrees in Electronic
Engineering from Imperial College of Science, Technology and Medicine, London, UK, in 1986, 1986, and 1989, respectively. He joined the Computational Vision Group,
Department of Computer Science, The University of Reading, England, in October 1989, where he worked as research fellow, senior research fellow, and lecturer. In January
1998, he returned to China to join the National Laboratory of Pattern Recognition, the Institute of Automation of the Chinese Academy of Sciences, Beijing. He is currently a
professor and director of the National Laboratory of Pattern Recognition as well as president of the Institute of Automation. He has published widely on image processing,
computer vision and pattern recognition. His current research interests include speech and image processing, machine and computer vision, pattern recognition,
multimedia, and robotics. Dr. Tan serves as referee for many major national and international journals and conferences. He is an associate editor of Pattern Recognition and
of the IEEE Transactions on Pattern Analysis and Machine Intelligence and the Asia editor of Image and Vision Computing. He was an elected member of the Executive
Committee of the British Machine Vision Association and Society for Pattern Recognition (19961997) and is a founding co-chair of the IEEE International Workshop on
Visual Surveillance.

Dacheng Tao is currently a professor with the Centre for Quantum Computation and Information Systems and the Faculty of Engineering and Information Technology in
the University of Technology, Sydney. He mainly applies Statistics and Mathematics for data analysis problems in data mining, computer vision, machine learning,
multimedia, and video surveillance. He has authored and co-authored more than 100 scientic articles at top venues including IEEE T-PAMI, T-KDE, T-IP, and NIPS, with
best paper awards.