
IET Image Processing

Online Learning on Hierarchical Mixture of Experts for Real-Time Tracking

Manuscript Type: Research Paper

Authors:
Gu, Song; University of Electronic Science and Technology of China, Communication and Information Engineering
Ma, Zheng; University of Electronic Science and Technology of China, Communication and Information Engineering
Li, Yinghong; Naval Aviation, Flight Theory

IET Review Copy Only

Online Learning on Hierarchical Mixture of Experts for Real-Time Tracking

Template tracking has been extensively studied in computer vision and has a wide range of applications. It is difficult to model the relationship between the observed data set and the warping function with a single model because of unobserved heterogeneity in the data, and this heterogeneity inevitably degrades tracking performance. This paper proposes a method based on a Hierarchical Mixture of Experts (HME) to perform robust, real-time tracking from a stationary camera. Online learning of the HME is discussed as well, which improves robustness during tracking. Moreover, the performance and stability of the HME are demonstrated and evaluated on a set of challenging image sequences.
1. Introduction
Object tracking has many applications in computer vision such as surveillance, vision-based control
and visual reconstruction. Moving objects can be effectively tracked in real-time from stationary
cameras using frame differencing or adaptive background subtraction with simple data association
techniques. In our opinion, object-tracking approaches can be regarded as either classification solutions or regression solutions. In classification solutions, the goal is to find a feasible classifier that distinguishes object from background. [1] presents an online feature-selection mechanism that evaluates features in a feature pool. The selection criterion is to minimize the variance within classes and maximize the variance between classes, which is similar to the Fisher linear discriminant. [2] presents another feature-selection mechanism, called online AdaBoost feature selection. The above approaches mainly focus on the selection of positive samples and almost ignore the selection of negative samples. [3] presents an effective method to choose positive and negative examples, and [4] addresses the same problem when the tracker location is not precise. In both approaches, classification criteria are established by mixture models that distinguish object from background effectively under most common conditions. In regression solutions, the goal is to relate the differences in a frame sequence to a motion model. For example, in template tracking, the main task is to find the relationship between the difference of two consecutive images and the parameters of the template warping function, and then to follow a template through an image sequence by estimating those parameters. [5] and [6] are simple linear regression solutions; [7] is a nonlinear regression solution. All three approaches suffer errors caused by the unobserved heterogeneity of the ground-truth model. Although the latter performs an efficient second-order approximation of the image error and achieves a high convergence rate, it is time consuming because of the complexity of the model.
In this paper, we propose mixtures of linear models to find the relationship between the difference of two consecutive images and the warping function. Related work includes [8] and [9]; both provide robust, real-time tracking by mixture models. The difference is that [8] and [9] use Gaussian mixture models, whereas we use mixtures of linear models. The advantage of mixture models is that they do not require the distribution to be specified a priori but allow the parameters to be approximated in a data-driven way. However, offline learning of mixture models restricts their application to real-time tracking. In this paper, we simplify the method proposed in [10] and, inspired by [11] and [12], propose an online learning strategy as well. The update equation for the inverse covariance matrix updates the inverse matrix directly, so there is never a need to invert matrices during online updating. The performance of online learning is evaluated in the experiments as well.
The rest of this paper is organized as follows: Sec.2 introduces the terminology of offline HME learning. Sec.3 presents an online algorithm. Sec.4 discusses our implementation of the tracking strategy. In Sec.5, robustness and parameter selection are compared by experiments. The paper's observations are wrapped up in the conclusions.
2. Offline Hierarchical Mixture of Experts
The algorithm discussed in this paper is a supervised learning algorithm, and we explicitly address the case of regression. The HME algorithm proposed in this paper is similar to that of [10]. We assume that the input vectors are elements of $\mathbb{R}^l$ and the output vectors are elements of $\mathbb{R}^p$. The data are assumed to form a countable set of paired observations $\{x_n, y_n\}$, $n = 1, 2, \ldots, N$, where $N$ is the number of paired data; $x_n$ and $y_n$ are column vectors of size $l \times 1$ and $p \times 1$ respectively. We propose to solve a nonlinear supervised-learning problem by dividing the input space into a set of regions and fitting independently the data that fall in these regions by hyperplanes. Obviously, this method performs better than fitting a single surface to all the data. As proposed in [10], the regions have soft boundaries, meaning that data points may lie simultaneously in multiple regions. The boundaries between regions are themselves simple parameterized surfaces that are adjusted by the learning algorithm. In each region, we fit the data by a hyperplane whose parameter is $A_m$, and we adopt the softmax function as the boundary. Suppose $M$ is the number of hyperplanes, which we call experts. Given an input vector $x$, the prediction of $y$ is formulated as

$$y = \sum_{m=1}^{M} \pi_m(x)\, A_m x \qquad (1)$$

where the mixing coefficients $\pi_m(x)$, known as gating functions, are functions of the input vector $x$ formulated by the softmax function

$$\pi_i(x) = \frac{e^{v_i^T x}}{\sum_{m=1}^{M} e^{v_m^T x}}, \quad i = 1, \ldots, M \qquad (2)$$

where $v_i$ is the parameter of each gating function. Obviously, $\sum_{m=1}^{M} \pi_m(x) = 1$, and

$$\frac{\partial \pi_i(x)}{\partial v_k} = \pi_i(x)\big(\delta_{ik} - \pi_k(x)\big)\, x, \qquad \delta_{ik} = \begin{cases} 1 & i = k \\ 0 & i \neq k \end{cases}$$
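The softmax gating and mixture prediction defined above can be sketched as follows. This is a minimal Python/NumPy illustration (the paper's own implementation is in MATLAB), with hypothetical toy values for the expert matrices $A_m$ and gating vectors $v_m$:

```python
import numpy as np

def gating(V, x):
    """Softmax gating pi_i(x) = exp(v_i^T x) / sum_m exp(v_m^T x).
    V is M x l (one gating vector v_m per row), x is an l-vector."""
    a = V @ x
    a = a - a.max()          # subtract max for numerical stability
    e = np.exp(a)
    return e / e.sum()

def predict(A, V, x):
    """Mixture-of-experts prediction y = sum_m pi_m(x) A_m x.
    A is M x p x l, one hyperplane per expert."""
    pi = gating(V, x)
    return sum(pi[m] * (A[m] @ x) for m in range(len(pi)))

# toy dimensions: M = 2 experts, input size l = 3, output size p = 2
rng = np.random.default_rng(0)
A = rng.standard_normal((2, 2, 3))   # hypothetical expert parameters
V = rng.standard_normal((2, 3))      # hypothetical gating parameters
x = rng.standard_normal(3)
pi = gating(V, x)                    # mixing coefficients, sum to 1
y = predict(A, V, x)                 # p-dimensional prediction
```

As the derivative formula above suggests, the gating output behaves like class probabilities: the coefficients are positive and sum to one for any input.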
We can view a mixture of experts as a belief network, which is illustrated in Fig.1. The input vector is connected to all expert and gating vertices. Each expert is represented by an individual component density, and the gating vertex supplies the mixing coefficients of the components. Adopting Gaussian component densities, we obtain

$$p(y \mid \theta) = \sum_{m=1}^{M} \pi_m(x)\, \mathcal{N}(y \mid A_m x, \Lambda^{-1})$$

where $\theta$ denotes the set of all adaptive parameters in the model, namely $A_m$, $v_m$ for $m = 1, 2, \ldots, M$ and $\Lambda^{-1}$ (the covariance matrix). Note that $\pi_m(x)$ is a function not only of the input $x$ but also of the parameter $v_m$. Given a data set $\{x_n, y_n\}$, $n = 1, 2, \ldots, N$, the log-likelihood function for this model takes the form

$$\ln p(Y \mid \theta) = \sum_{n=1}^{N} \ln\Big( \sum_{m=1}^{M} \pi_m(x_n)\, \mathcal{N}(y_n \mid A_m x_n, \Lambda^{-1}) \Big).$$
In order to maximize this likelihood function, we can introduce a set $Z = \{z_{nm}\}$ of binary latent variables, where $z_{nm} \in \{0, 1\}$ and, for each data pair $n$, all of the elements $m = 1, \ldots, M$ are zero except for a single value of 1. This latent variable can be interpreted as the label that specifies the expert in the probability model. If the $z_{nm}$ were known, the maximum-likelihood problem would decouple into a separate set of regression problems for the experts and a separate set of classification problems for the gating. Although the latent variables are not known, we can specify a probability model that links them to the observable data. For each observation $\{x_n, y_n\}$, the probability model can be written as

$$p(y_n, z_n \mid \theta) = \prod_{m=1}^{M} \big\{ \pi_m\, \mathcal{N}(y_n \mid A_m x_n, \Lambda^{-1}) \big\}^{z_{nm}}.$$

Noting that $z_{nm}$ is an indicator variable, the complete-data log-likelihood function then takes the form

$$\ln p(Y, Z \mid \theta) = \sum_{n=1}^{N} \sum_{m=1}^{M} z_{nm} \ln\big\{ \pi_m\, \mathcal{N}(y_n \mid A_m x_n, \Lambda^{-1}) \big\}.$$

The EM algorithm is a general estimation solution in the presence of latent variables; the related theory has been reported in [13]. The main conclusions are given in the rest of this section. For a sample of $N$ individuals, the EM algorithm is given by:
E-step: Given the current parameter estimates, we replace the latent variables by the estimated posterior probabilities

$$\gamma_{nm} = E[z_{nm}] = p(z_{nm} = 1 \mid x_n, y_n, \theta) = \frac{p(z_{nm}=1)\, p(y_n \mid z_{nm}=1)}{\sum_{i=1}^{M} p(z_{ni}=1)\, p(y_n \mid z_{ni}=1)} = \frac{\pi_m(x_n)\, \mathcal{N}(y_n \mid A_m x_n, \Lambda^{-1})}{\sum_{i=1}^{M} \pi_i(x_n)\, \mathcal{N}(y_n \mid A_i x_n, \Lambda^{-1})} \qquad (3)$$

Here $\pi_m(x_n)$ is called the prior probability. Noting that $p(z_{nm}=1)$ indicates the probability that an expert is valid for the regression, we have

$$p(z_{nm}=1) = \pi_m(x_n), \qquad p(y_n \mid z_{nm}=1) = \mathcal{N}(y_n \mid A_m x_n, \Lambda^{-1}).$$

M-step: Given the estimates of the posterior probabilities $\gamma_{nm}$, the solutions for the parameters $A_m$, $v_m$ and $\Lambda^{-1}$ are as follows:

$$A_m = \arg\max_{A_m} \sum_{n=1}^{N} \sum_{m=1}^{M} \gamma_{nm} \ln\big\{ \pi_m(x_n)\, \mathcal{N}(y_n \mid A_m x_n, \Lambda^{-1}) \big\} \qquad (4)$$

$$v_m = \arg\max_{v_m} \sum_{n=1}^{N} \sum_{m=1}^{M} \gamma_{nm} \ln \pi_m(v_m, x_n) \qquad (5)$$

Setting the derivative of (4) with respect to $A_m$ to zero, we obtain

$$\sum_{n=1}^{N} \gamma_{nm} (y_n - A_m x_n)\, x_n^T = 0 \qquad (6)$$

Defining the matrices $H = [x_1, x_2, \ldots, x_N]$ and $Y = [y_1, y_2, \ldots, y_N]$, we can rewrite (6) in matrix notation as $(Y - A_m H) R_m H^T = 0$, where $R_m = \mathrm{diag}(\gamma_{nm})$ is a diagonal matrix of size $N \times N$. Solving for $A_m$, we obtain

$$A_m = Y R_m H^T (H R_m H^T)^{-1} \qquad (7)$$

Setting the derivative of (4) with respect to $\Lambda$ to zero, we obtain

$$\Lambda^{-1} = \frac{1}{N} \sum_{n=1}^{N} \sum_{m=1}^{M} \gamma_{nm} (y_n - A_m x_n)(y_n - A_m x_n)^T \qquad (8)$$

Setting the derivative of (5) with respect to $v_m$, and noting that $\gamma_{nm}$ is fixed while taking derivatives, we obtain

$$\sum_{n=1}^{N} \big(\gamma_{nm} - \pi_m(v_m, x_n)\big)\, x_n = 0 \qquad (9)$$

Equations (6) and (9) have similar forms, and it should be noticed that both equations can be solved independently for each expert. Equation (6) can be viewed as a weighted least-squares problem with observations $\{x_n, y_n\}$ and observation weights $\{\gamma_{nm}\}$, where $y_n$ is the target for each $x_n$. Equation (9) can be viewed as a least-squares problem with observations $\{\pi_m(v_m, x_n), \gamma_{nm}\}$. In particular, in Equation (9) we can compute $v_m$ by inverting the softmax function, so the target for $v_m$ is $\ln \gamma_{nm} - \ln \sum_{i=1}^{M} e^{v_i^T x_n}$. However, since $\ln \sum_{i=1}^{M} e^{v_i^T x_n}$ is common to all $v_m$, it can be omitted, and $\ln \gamma_{nm}$ can be viewed as the target for each $x_n$. Constructing the matrix $\xi_m = [\ln \gamma_{1m}, \ln \gamma_{2m}, \ldots, \ln \gamma_{Nm}]$ for each expert from Equation (3), we obtain

$$v_m = \xi_m H^T (H H^T)^{-1} \qquad (10)$$


Fig.1 Hierarchical Mixture of Experts
The iterative EM procedure needs initial values. That is, the difference between a mixture model and a single model is that the former needs initial parameter values while the latter does not. Although the performance of HME is theoretically better than single-linear-model solutions such as [5] and [6], especially for nonlinear regression, the expert parameters and the gating parameters influence each other, and incorrect initial values for either result in poor regression performance. From the simple numerical application proposed in [5], it is observed that the relationship between input and output is linear around the origin, and the degree of nonlinearity increases away from the origin. In our approach, we straightforwardly divide the input space into $M$ subspaces according to the magnitude of $x$; $v_m$ can be initialized correspondingly as well. Each parameter is computed independently in each subspace as the initial value of the iteration. In addition, this trick avoids the uncertainty of selecting random initial parameters. Algorithm 1 formalizes the steps of the offline HME learning approach.
Algorithm 1 Offline HME Learning
Function Learning(in H, in Y, in M, out $A_m$, out $v_m$, out $\Lambda^{-1}$)
  Compute the norm of each column of matrix H, and divide the columns into M subspaces according to their norms
  Initialize matrix $A_m$ independently in each subspace using the least-squares algorithm
  Initialize $\pi_m(x_n)$ for each data point according to the divided subspaces
  Compute the covariance matrix of the input data set
  for iteration = 1, 2, ... do
    Compute the posterior probabilities $\gamma_{nm}$ for each data pair and each expert by Eq.(3)
    Update the parameters of the gating functions $v_m$ by Eq.(10)
    Update the parameters of the experts $A_m$ by Eq.(7)
    Update the parameter $\Lambda^{-1}$ by Eq.(8)
  end for
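One EM iteration of the offline learning above can be sketched as follows. This is a Python/NumPy sketch rather than the MATLAB implementation used in the paper, and all data, dimensions and initial parameters below are synthetic; it computes the posteriors of Eq.(3) and the weighted least-squares updates of Eqs.(7) and (10):

```python
import numpy as np

def em_iteration(H, Y, A, V, cov):
    """One E/M iteration for a mixture of linear experts.
    H: l x N inputs, Y: p x N targets, A: M x p x l expert matrices,
    V: M x l gating parameters, cov: p x p covariance matrix."""
    M, N = A.shape[0], H.shape[1]
    # gating priors pi_m(x_n): softmax over the M gating activations
    G = V @ H
    G = np.exp(G - G.max(axis=0))
    Pi = G / G.sum(axis=0)                       # M x N
    # E-step: posterior responsibilities gamma_nm (Eq. 3)
    icov = np.linalg.inv(cov)
    lik = np.empty((M, N))
    for m in range(M):
        R = Y - A[m] @ H                          # residuals y_n - A_m x_n
        lik[m] = np.exp(-0.5 * np.einsum('in,ij,jn->n', R, icov, R))
    gamma = Pi * lik
    gamma /= gamma.sum(axis=0)
    # M-step: weighted least squares per expert (Eq. 7)
    A_new = np.empty_like(A)
    for m in range(M):
        Rm = np.diag(gamma[m])                    # responsibility weights
        A_new[m] = Y @ Rm @ H.T @ np.linalg.inv(H @ Rm @ H.T)
    # gating update (Eq. 10) with targets ln(gamma_nm)
    Xi = np.log(np.clip(gamma, 1e-12, None))
    V_new = Xi @ H.T @ np.linalg.inv(H @ H.T)
    return A_new, V_new, gamma
```

In practice this loop runs until the likelihood converges; the normalization constant of the Gaussian is dropped here because it cancels in the responsibilities when the covariance is shared across experts.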

3. An online algorithm
In a real-time tracking solution, the model can be updated successively as each new frame arrives; this is called an online algorithm. Recalling the derivation of Eq.(7), the problem of estimating $A_m$ is a weighted least-squares problem, which can be converted into an online algorithm as proposed in [12]: we update the parameters with each successive data point by a recursive procedure.
The online update rule for the parameters of the experts is given by the following recursive equations:

$$A_m(t) = A_m(t-1) + \big[y(t) - A_m(t-1)\, x(t)\big]\, L_{m1}^T(t) \qquad (11)$$

$$L_{m1}(t) = \frac{P_{m1}(t-1)\, x(t)}{\lambda/\gamma_m(t) + x^T(t)\, P_{m1}(t-1)\, x(t)} \qquad (12)$$

$$P_{m1}(t) = \frac{1}{\lambda}\left[ P_{m1}(t-1) - \frac{P_{m1}(t-1)\, x(t)\, x^T(t)\, P_{m1}(t-1)}{\lambda/\gamma_m(t) + x^T(t)\, P_{m1}(t-1)\, x(t)} \right] \qquad (13)$$

Here $A_m(t)$ is the updated parameter and $A_m(t-1)$ the previous one; $x(t)$ and $y(t)$ are the new data pair; $\gamma_m(t)$ is the posterior probability computed by Eq.(3); and $\lambda$ is the forgetting factor, which gives exponentially less weight to older samples, where $0 < \lambda \le 1$. In our approach, $\lambda = 0.998$.

Recalling the derivation of Eq.(10), the problem of estimating $v_m$ is similarly a least-squares problem. The online update rule for the parameters of the gating functions is given by the following recursive equations:

$$v_m(t) = v_m(t-1) + \big[\ln \gamma_m(t) - v_m(t-1)\, x(t)\big]\, L_{m2}^T(t)$$

$$L_{m2}(t) = \frac{P_{m2}(t-1)\, x(t)}{\lambda + x^T(t)\, P_{m2}(t-1)\, x(t)}$$

$$P_{m2}(t) = \frac{1}{\lambda}\left[ P_{m2}(t-1) - \frac{P_{m2}(t-1)\, x(t)\, x^T(t)\, P_{m2}(t-1)}{\lambda + x^T(t)\, P_{m2}(t-1)\, x(t)} \right]$$

For a real-time tracking solution, we usually adopt the offline learning algorithm to initialize the expert parameters and gating functions, and the online learning algorithm to update them. When updating, both $P_{m1}(t)$ and $P_{m2}(t)$ should be initialized by the following equations:

$$P_{m1}(0) = (H R_m H^T)^{-1}, \qquad P_{m2}(0) = (H H^T)^{-1}$$
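The recursive expert update above can be sketched as a single function. This is a hedged Python/NumPy illustration (the paper's implementation is MATLAB); the function name `rls_update` and the toy dimensions are our own, but the recursion follows the weighted recursive least-squares form with forgetting factor described in the text, and the inverse matrix $P$ is updated directly so that no matrix inversion is ever needed online:

```python
import numpy as np

def rls_update(A, P, x, y, gamma, lam=0.998):
    """Weighted recursive least-squares update for one expert.
    A: p x l expert parameter, P: l x l inverse 'covariance' matrix,
    x: new input (l,), y: new target (p,), gamma: posterior weight of
    this expert for the new sample, lam: forgetting factor (0 < lam <= 1)."""
    denom = lam / gamma + x @ P @ x
    L = (P @ x) / denom                       # gain vector
    A_new = A + np.outer(y - A @ x, L)        # correct by prediction error
    P_new = (P - np.outer(P @ x, x @ P) / denom) / lam
    return A_new, P_new
```

With `gamma = 1` and `lam = 1` this reduces to plain recursive least squares; repeated updates on noise-free data drive the parameter estimate toward the true linear map.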


In our approach, $\Lambda^{-1}$ is only estimated by the offline algorithm. It should also be noted that each expert and gating-function parameter is updated independently in the online algorithm. When a new datum $x(t)$ arrives, its weight in each subspace is first determined by the gating functions with the previous parameters; then only the parameters of the related experts and gating functions are updated, and the other parameters need no update. In the implementation we sometimes find that the posterior probabilities with respect to a new datum are all zero even though its prior probabilities indicate clearly that it belongs to one subspace; in that case we replace the posterior probabilities with the prior probabilities.
In our opinion, regression by a single model, such as the hyperplane regression proposed in [5] and [6], retrieves only local information about the entire model. Because of the nonlinearity of the ground-truth model, the hyperplane parameters are updated constantly by the online algorithm. By contrast, regression by a mixture model, as in our approach, retrieves global information about the entire model: only the parameters in the related subspaces are updated at a time, and the others are not. The precondition, however, is that the gating-function parameters can divide the data space into rough subspaces in the offline learning stage; this condition can be achieved when collecting samples in the offline learning stage.
4. Template tracking by HME
In this section, we present our notation and the template tracking application of HME. We adopt notation almost identical to that of [5], [13] and [14] in order to make the reading easier.
4.1 Template model
Without loss of generality, we select a rectangular region in the first frame of a video sequence, which defines the region of interest (the target region) that we want to track. The location of the target region in an image is defined by $R$, which stores the positions of the four corner points. Let $I(R, t)$ be the vector of brightness values of $p$ sample points within the target region (rather than all of its pixels) at time $t$; note that $R$ differs at each time. In this paper, $I(R, t)$ is a $p \times 1$ vector. Then $I(R, t_0)$ contains the brightness values of the target region in the first frame, which is defined as the template, with $t_0$ the initial time. The relative motion between the object and the camera induces changes in the position of the template in the image. The transformation can be modeled by a parametric motion $f(R, \mu_t)$, where $\mu_t$ denotes the set of motion parameters at time $t$. In our implementation, a homography is used as the transformation, so $\mu_t$ is an $8 \times 1$ vector. We define $I(f(R, \mu_t), t)$ as the brightness values of the transformed target region under the relative motion, and $I(f(R, \mu_{t_0}), t_0)$ as the brightness values of the initial transformed target region, i.e. the template. With these assumptions, tracking the object at time $t$ means computing $\mu_t$ such that

$$I(f(R, \mu_{t_0}), t_0) = I(f(R, \mu_t), t).$$

4.2 HME Approximation
Given a target region in the first frame, the corresponding transformation parameters and brightness values are stored in $\mu_{t_0}$ and $I(f(R, \mu_{t_0}), t_0)$, respectively. From [5], we can simplify the relation as

$$\delta\mu = A\, \delta i \qquad (14)$$

where $\delta\mu = \mu_t - \mu_{t-1}$ are the template parameter updates and $\delta i = I(f(R, \mu_{t-1}), t_0) - I(f(R, \mu_{t-1}), t)$ is the image difference.
The key to tracking the object correctly is to find a suitable hyperplane approximation matrix $A$. However, from the simple numerical application in [5], we know that the relation is not completely linear; solving this problem with a single linear model in isolation results in a larger error, especially under large motion. In addition, it is difficult to find an appropriate nonlinear model for the regression. It is often found, though, that improved performance can be obtained by combining multiple models. We therefore consider $M$ linear model components, each governed by its own predictor $A_m$. The warping parameter updates become

$$\delta\mu = \sum_{m=1}^{M} \pi_m(\delta i)\, A_m\, \delta i \qquad (15)$$

where $\delta i$ can be viewed as the input vector and $\delta\mu$ as the output vector.


The offline learning process uses $N$ random transformations of the template, where $N > p$. These transformations are small disturbances $\delta\mu_i = \mu_i - \mu_{t_0}$, $i = 1, 2, \ldots, N$, to the initial transformation parameters. As a consequence, each introduces a change in the image brightness values $\delta i_i = I(f(R, \mu_i), t_0) - I(f(R, \mu_{t_0}), t_0)$, $i = 1, 2, \ldots, N$. We then form

$$Y = [\delta\mu_1, \delta\mu_2, \ldots, \delta\mu_N] = [y_1, y_2, \ldots, y_N]$$

$$H = [\delta i_1, \delta i_2, \ldots, \delta i_N] = [x_1, x_2, \ldots, x_N]$$

where $H$ is a $p \times N$ matrix and $Y$ is an $8 \times N$ matrix. The parameters of each expert and gating function are then estimated as the initial values for the online algorithm.
parameters of each expert and gating function will be estimated as the initial value of online algorithm.
The online learning process is a way of adding new training samples to the learned parameters during
tracking. The number of update samples corresponds to the number of random transformations applied
to the template. Note that this is the same template as used for the initial learning. The delta of object
motion parameter is computed by Eq.(15) during tracking process. [5] has proposed in detail the
approach of updating the transformation parameter

t .

5. Experiments
In this section, we evaluate our proposed approach for efficient HME approximation learning and tracking by comparing it to the original approach proposed in [5]. The comparisons evaluate accuracy, covering tracking robustness with respect to different types of motion as well as to noise.


For the implementation of [5], we programmed the binaries ourselves based on [15]; the homography warping algorithm is from [15], which is publicly available. To improve invariance to illumination changes, normalization is used, which differs slightly from the original Hyperplane Approximation algorithm proposed in [5]. All of the algorithms, including our proposed one, are implemented in MATLAB 7.10.0.
The evaluations are conducted on a notebook with an Intel(R) Core(TM)2 Duo CPU T6500 @ 2.10 GHz and 1.86 GB of RAM.
5.1 Offline algorithm
In this section, we analyze the influence of our offline approach on the robustness of tracking with respect to different movements. In addition, the number of components in the mixture model is analyzed experimentally. We measure accuracy by finding the correct location of the template after applying random transforms to several test images. The images used in the evaluation are taken from [15]. For measuring robustness, we corrupt the image with noise sampled from a Gaussian distribution before applying the random transform. The comparison is measured by the sum of the distances of the four corner points to the benchmark, which is called the error distance in this paper. It is formulated as

$$\sum_{i=1}^{4} \left\| x_t^i - x_g^i \right\|$$

where $x_t^i$ is one of the four tracked corner-point coordinates and $x_g^i$ is the corresponding benchmark point. Fig.2 shows the sample image and the target region we want to track. The target region size is $150 \times 150$ pixels. Subsampling with step 10 is applied to the target region, so the length of the input vector $\delta i$ is 225.
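The error distance defined above is straightforward to compute; a minimal Python sketch with hypothetical corner coordinates:

```python
import numpy as np

def error_distance(tracked, benchmark):
    """Sum of Euclidean distances between the four tracked corner
    points and the four benchmark corners (each array is 4 x 2)."""
    return np.linalg.norm(tracked - benchmark, axis=1).sum()

# hypothetical corners of a 150x150 target region, tracked 1 px off in x
bench = np.array([[0., 0.], [150., 0.], [150., 150.], [0., 150.]])
track = bench + np.array([1., 0.])
print(error_distance(track, bench))   # -> 4.0
```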

Fig.2 Image and target region

A comparison among the different methods on the same learning data set is presented in Fig.3. The estimated parameters are learned from random transformations of each corner of the target region, uniformly distributed between -5 and 5 pixels. The size of the learning data set is 2000. When collecting the learning data sets, Gaussian noise is added to the sample image as well. During tracking, iteration of the predictors is used in order to track precisely, as described in [5]. The size of the evaluation data set is 1000, the number of iterations is 5, and the number of mixture components is 4.


Fig.3 Comparison of Offline learning

Fig.3 shows that our approach is more stable than [5]. Regression by a single model is easily disturbed by nonlinearity and noise. The mixture model divides the nonlinear space into several linear subspaces, and each component independently represents one subspace. In this sense, the mixture model retrieves the global information more accurately than a single model, especially under nonlinear conditions. This is the most important motivation of our approach.
As described in Algorithm 1, iteration is used in learning, and since the ground-truth model is nonlinear, the number of mixture components is also a key parameter; both essentially affect the performance of the regression. We conducted an experiment to verify the offline learning performance, measured by the likelihood over the whole learning data set. According to the proposed input-space division, the number of learning data in one subspace must not be less than the length of $\delta i$. In this experiment, we illustrate the convergence rate of the likelihood with respect to different numbers of iterations and components.

Fig.4 Convergence rate of the likelihood with respect to different numbers of mixture components

The likelihood increases with the number of components. However, more components require more learning data and computing time; in this paper, the number of components is 4. From Fig.4, the likelihood almost converges after the iteration number reaches 4, regardless of the component number; in this paper, the number of iterations is 5.
5.2 Online algorithm
In the online algorithm, the experts and gating functions are updated by new sample data, which are obtained from new random transformations applied to the region of interest. Both the forgetting factor $\lambda$ and the number of update samples are key parameters: a larger $\lambda$ means that the algorithm forgets older values more slowly, and a smaller $\lambda$ the reverse. In this experiment, we add new training samples during tracking to predictors learned by the offline algorithm. The tracking performance is measured by the mean error distance with respect to different forgetting factors and numbers of update samples. The test images are obtained from successive movements of the sample image shown in Fig.5; there are 100 frames in total, and the quadrilateral in the figure is the benchmark of the tracked region. From Fig.6, $\lambda = 0.998$ results in the smallest error distance in this experiment. Fig.7 shows the corresponding improvement in tracking performance with an increasing number of update samples.

Fig.5 Successive movements of the sample image (left: frame 50, right: frame 100)

Fig.6 Number of update samples vs. error distance

Fig.7 The influence of the forgetting factor in object tracking

Fig.8 depicts the tracking performance for four different types of motion: view angle, scale, rotation and translation. Noise is added in every motion. From Fig.8, especially for scale, the poor performance of the algorithm proposed in [5] probably reflects significant nonlinearities in the test data set. In the other experiments (view angle, rotation and translation), the performance of our approach and of [5] are comparable. On the whole, however, our approach performs better across the four motions, since it accounts for both linear and nonlinear characteristics.

Fig.8 Comparison of [5] and our fast learning approach for the four specified types of motion

[5] and our approach are also compared on the successive movements of the sample image above; Fig.9 shows that our approach improves the tracking performance.

Fig.9 Comparison of [5] and our online algorithm on successive movements

6. Conclusion
To the best of our knowledge, our approach is the first discussion of homography-based tracking by HME. In this paper, we have shown an original improvement of the tracking algorithm. The key idea is to regress the ground-truth model by a mixture of hyperplanes instead of a single one. Single-model regression retrieves only local information about the entire model at a time, whereas a mixture model can retrieve global information. Mixture models do not require the distribution to be specified a priori, and the result can eliminate the error caused by unobserved heterogeneity. Our experiments illustrate that mixture-model regression achieves better tracking performance than single-model regression. Moreover, the online algorithm updates the mixture model during tracking, which meets practical needs. Furthermore, since our approach needs many iterations, the selection of initial values in the learning and tracking stages is analyzed, which reduces the randomness of the computed results.
In this paper, the number of experts is fixed in advance. If we generalize further and allow the expert number to adapt to the input data, the approach should improve on the original. For infinite mixture models, the identification problem discussed in [16] is also a key issue. Both will be the subject of further research.
References
[1] Robert T. Collins, Yanxi Liu and Marius Leordeanu, "Online Selection of Discriminative Tracking Features", IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 27, No. 10, pp. 1631-1643, October 2005
[2] Helmut Grabner and Horst Bischof, "On-line Boosting and Vision", IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Vol. 1, pp. 260-267, June 2006
[3] Boris Babenko, Ming-Hsuan Yang and Serge Belongie, "Visual Tracking with Online Multiple Instance Learning", IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2009), pp. 983-990, June 2009
[4] Zdenek Kalal, Jiri Matas and Krystian Mikolajczyk, "Online learning of robust object detectors during unstable tracking", IEEE 12th International Conference on Computer Vision Workshops (ICCV Workshops), pp. 1417-1424, September 2009
[5] Frederic Jurie and Michel Dhome, "Hyperplane Approximation for Template Matching", IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 24, No. 7, July 2002
[6] Stefan Holzer, Marc Pollefeys, Slobodan Ilic, David Tan and Nassir Navab, "Online Learning of Linear Predictors for Real-Time Tracking", 12th European Conference on Computer Vision, Part I, pp. 470-483, October 2012
[7] S. Benhimane and E. Malis, "Homography-based 2D visual tracking and servoing", The International Journal of Robotics Research, Vol. 26, No. 7, pp. 661-676, July 2007
[8] Yogesh Raja, Stephen J. McKenna and Shaogang Gong, "Segmentation and tracking using colour mixture models", Third Asian Conference on Computer Vision, Vol. I, pp. 607-614, January 1998
[9] Stephen J. McKenna, Yogesh Raja and Shaogang Gong, "Tracking colour objects using adaptive mixture models", Image and Vision Computing, Vol. 17, Issues 3-4, pp. 225-231, March 1999
[10] M. I. Jordan and R. A. Jacobs, "Hierarchical mixtures of experts and the EM algorithm", Neural Computation, Vol. 6, No. 2, pp. 181-214, 1994
[11] B. D. Ripley, Pattern Recognition and Neural Networks, pp. 283-286, Cambridge University Press, 1996
[12] L. Ljung and T. Söderström, Theory and Practice of Recursive Identification, pp. 17-21, The MIT Press, Cambridge, Massachusetts, 1983
[13] Christopher M. Bishop, Pattern Recognition and Machine Learning, pp. 667-670, Springer Science+Business Media, LLC, 2006
[14] G. D. Hager and P. N. Belhumeur, "Efficient Region Tracking with Parametric Models of Geometry and Illumination", IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 20, No. 10, pp. 1025-1039, October 1998
[15] Simon Baker and Iain Matthews, "Lucas-Kanade 20 Years On: A Unifying Framework", International Journal of Computer Vision, Vol. 56, Issue 3, pp. 221-255, February 2004
[16] Bettina Grün and Friedrich Leisch, "Finite Mixtures of Generalized Linear Regression Models", Recent Advances in Linear Models and Related Areas, pp. 205-230, 2008
