I. INTRODUCTION
Source separation consists in recovering a set of unobservable signals (sources) from a set of observed mixtures. In its simplest form, an $n \times 1$ vector of observations $x$ (typically, the output of $n$ sensors) is modeled as

$$x = As \qquad (1)$$

where $A$ is an invertible $n \times n$ matrix, the source vector $s$ has statistically independent components, and $r$ is the pdf of $s$. Based on these assumptions and on $T$ realizations of $x$, the aim is to estimate matrix $A$ or, equivalently, to find a "separating matrix" $B$ such that

$$y = Bx$$

is an estimate of the source signals.
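To make the model concrete, here is a minimal numerical sketch of (1) and of separation by a matrix $B$. The Laplacian source distribution, the random $3 \times 3$ mixing matrix, and the use of NumPy are illustrative assumptions, not part of the letter; a blind algorithm would have to find $B$ without knowing $A$.

```python
import numpy as np

rng = np.random.default_rng(0)
n, T = 3, 10_000

# T realizations of an n x 1 source vector s with statistically
# independent components (Laplacian: an arbitrary non-Gaussian choice).
s = rng.laplace(size=(n, T))

# Model (1): x = A s, with an invertible mixing matrix A.
A = rng.normal(size=(n, n))
x = A @ s

# A "separating matrix" B should make y = B x an estimate of s.
# With A known, B = A^{-1} recovers the sources exactly; blind
# methods must approach this (up to scale/permutation) from x alone.
B = np.linalg.inv(A)
y = B @ x
print(np.allclose(y, s))  # True
```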
A guiding principle for source separation is to optimize a function called a contrast function, which is a function of the distribution of $y$ [4]. Based on the infomax principle, an apparently new contrast function has recently been derived by Bell and Sejnowski [1], attracting a lot of interest. In this letter, we exhibit the contrast function associated with the well-established maximum likelihood (ML) principle. By this device, we show that, regarding source separation, the infomax principle boils down to ML.

II. INFOMAX

Plain maximization of the entropy of $y = Bx$ would be inappropriate because this entropy diverges to infinity for an arbitrarily large separating system $B$. Thus, the infomax principle is implemented by maximizing with respect to $B$ the entropy $H(z)$ of

$$z = g(y), \qquad z_i = g_i(y_i), \quad 1 \le i \le n,$$

where $H(\cdot)$ is the (Shannon) differential entropy, defined at (11) below. The scalar functions $g_i$ are taken to be "squashing functions," mapping the real line to the interval $[0,1]$ and monotonically increasing. Thus, if $g_i$ is differentiable, it is the cumulative distribution function (cdf) of some pdf $\tilde r_i$ on the real line. Denote $\tilde s$ an $n \times 1$ random vector with pdf

$$\tilde r(s) = \prod_{i=1}^{n} \tilde r_i(s_i). \qquad (4)$$
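The cdf remark is what ties infomax to a source model: if $g_i$ is the cdf of the pdf $\tilde r_i$, then $g_i$ maps a variable drawn from $\tilde r_i$ to a variable uniform on $[0,1]$ (the probability integral transform). Below is a quick numerical check of this fact, with the logistic function as an illustrative squashing function; none of these particular choices come from the letter.

```python
import numpy as np

rng = np.random.default_rng(1)

# Logistic squashing function: increasing, maps R onto (0, 1).
def g(u):
    return 1.0 / (1.0 + np.exp(-u))

# g is the cdf of the logistic pdf r~(u) = g'(u) = g(u)(1 - g(u)),
# so z = g(s~) is uniform on (0, 1) when s~ is drawn from that pdf.
s_tilde = rng.logistic(size=100_000)
z = g(s_tilde)

# Empirical moments match the uniform law: mean 1/2, variance 1/12.
print(z.mean(), z.var())
```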
III. MAXIMUM LIKELIHOOD

We first recall how the ML principle is associated with a contrast function. This is then specialized to the source separation model.

Consider a sample $x_1, \ldots, x_T$ of independent realizations of a random variable distributed according to a common density $r$. Let $\{p\}$ be a parametric model for the density of $x$. The likelihood that the sample is drawn with a particular distribution $p$ is the product $\prod_{t=1}^{T} p(x_t)$. Taking the logarithm and dividing by the number of observations results in the normalized log-likelihood

$$L_T(p) = \frac{1}{T} \sum_{t=1}^{T} \log p(x_t).$$
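As a concrete instance of $L_T(p)$, the sketch below evaluates the normalized log-likelihood of a sample under a well-specified and a misspecified candidate density; the Gaussian setup is an assumption made for illustration only.

```python
import numpy as np

rng = np.random.default_rng(2)
T = 50_000

# Sample x_1, ..., x_T drawn from the true density r = N(0, 1).
x = rng.normal(size=T)

# Log-density of a zero-mean Gaussian with standard deviation sigma.
def log_p(x, sigma):
    return -0.5 * (x / sigma) ** 2 - np.log(sigma * np.sqrt(2 * np.pi))

# Normalized log-likelihood L_T(p) = (1/T) sum_t log p(x_t).
L_true = np.mean(log_p(x, sigma=1.0))   # model matches r
L_wrong = np.mean(log_p(x, sigma=2.0))  # misspecified model
print(L_true > L_wrong)  # True: L_T favors the density closer to r
```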
By the law of large numbers, $L_T(p)$ converges to $\int r(x) \log p(x)\,dx = -H(r) - K(r|p)$, where the differential entropy $H$ and the Kullback-Leibler (KL) divergence $K$ are defined as

$$H(r) = -\int r(x) \log r(x)\,dx \qquad (11)$$

$$K(r|p) = \int r(x) \log \frac{r(x)}{p(x)}\,dx \qquad (12)$$

whenever the integrals exist. Since $H(r)$ does not depend on the model, maximizing the likelihood amounts, asymptotically, to minimizing the divergence $K(r|p)$ between the true density and the model density.
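A Monte Carlo check of this decomposition, reusing the illustrative Gaussian setup from the previous sketch: the sample value of $L_T(p)$ should approach $-H(r) - K(r|p)$, both available in closed form when $r = N(0,1)$ and $p = N(0, \sigma^2)$.

```python
import numpy as np

rng = np.random.default_rng(3)
T, sigma = 200_000, 2.0

# True density r = N(0, 1); model density p = N(0, sigma^2).
x = rng.normal(size=T)
L_T = np.mean(-0.5 * (x / sigma) ** 2 - np.log(sigma * np.sqrt(2 * np.pi)))

# Closed forms for this Gaussian pair:
H_r = 0.5 * np.log(2 * np.pi * np.e)             # entropy (11) of r
K_rp = np.log(sigma) + 1 / (2 * sigma**2) - 0.5  # divergence (12)

print(L_T, -H_r - K_rp)  # two nearly equal numbers
```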
A convenient abuse of notation is $K(x|y) \triangleq K(r_x|r_y)$ if $r_x$ (resp. $r_y$) is the density of a random vector $x$ (resp. $y$). The following properties of the KL divergence are instrumental.

• $K(r|p) \ge 0$, with equality iff $r$ and $p$ agree $r$-almost everywhere.
• The KL divergence is invariant under an invertible transformation $f$ of the sample space, as follows:

$$K(f(x)\,|\,f(y)) = K(x|y). \qquad (13)$$

• The differential entropy of a distribution with support $[0,1]^n$ also is the KL divergence between this distribution and the uniform distribution $u$ on $[0,1]^n$, up to sign: $H(z) = -K(z|u)$.

These properties link the two principles. Since $g$ maps $\tilde s$ to a vector uniformly distributed on $[0,1]^n$, the last property and the invariance (13) give $H(g(y)) = -K(g(y)\,|\,g(\tilde s)) = -K(y|\tilde s)$; the infomax contrast is thus the asymptotic ML contrast obtained by modeling the distribution of $y$ by $\tilde r$.

With the correct source model, i.e., when $s$ is distributed as $\tilde s$, the contrast reduces to $-K(y|s)$. This is maximized at $y = s$ because $K(s|s) = 0$, which is the lowest possible value, the KL divergence being nonnegative.

What happens with a wrong source model or, equivalently, when the squashing functions are not the cdf's of the source distributions? We sketch an answer, based on the following functions:

$$\psi_i \triangleq -\frac{\tilde r_i'}{\tilde r_i}, \qquad 1 \le i \le n.$$

The stationary points of the likelihood/infomax contrast cancel its gradient. Differentiating, it is easily found that these are the matrices $B$ such that $y = Bx$ verifies

$$E[\psi_i(y_i)\, y_j] = \delta_{ij}, \qquad 1 \le i, j \le n. \qquad (14)$$
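Condition (14) is easy to probe numerically. The sketch below checks that, at the separating point $y = s$ with a correctly matched source model, $E[\psi_i(y_i) y_j] \approx \delta_{ij}$; the unit-scale Laplacian sources and their score $\psi(u) = \mathrm{sign}(u)$ are illustrative assumptions, not the letter's choices.

```python
import numpy as np

rng = np.random.default_rng(4)
n, T = 3, 500_000

# Unit-scale Laplacian sources: pdf r(u) = 0.5 exp(-|u|), whose
# score function is psi(u) = -r'(u) / r(u) = sign(u).
s = rng.laplace(size=(n, T))
psi = np.sign

# At the separating point y = s, condition (14) asks that the
# matrix E[psi(y) y^T] equal the identity. Estimate it by averaging:
y = s
M = (psi(y) @ y.T) / T
print(np.round(M, 2))  # approximately the n x n identity matrix
```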
ACKNOWLEDGMENT

This work was completed while the author was visiting S. I. Amari at the RIKEN Institute, Japan.

REFERENCES

[1] A. J. Bell and T. J. Sejnowski, "An information-maximization approach to blind separation and blind deconvolution," Neural Comput., vol. 7, no. 6, pp. 1004–1034, 1995.
[2] J.-F. Cardoso, "The equivariant approach to source separation," in Proc. NOLTA, 1995, pp. 55–60. Available: ftp://sig.enst.fr/pub/jfc/Papers/nolta95.ps.gz
[3] T. M. Cover and J. A. Thomas, Elements of Information Theory (Wiley Series in Telecommunications). New York: Wiley, 1991.
[4] P. Comon, "Independent component analysis, a new concept?," Signal Processing, vol. 36, no. 3, pp. 287–314, 1994.
[5] D.-T. Pham, P. Garat, and C. Jutten, "Separation of a mixture of independent sources through a maximum likelihood approach," in Proc. EUSIPCO, 1992, pp. 771–774.