

Infomax and Maximum Likelihood for Blind Source Separation

Jean-François Cardoso, Member, IEEE

Abstract—Algorithms for the blind separation of sources can be derived from several different principles. This letter shows that the recently proposed infomax principle is equivalent to maximum likelihood.
Fig. 1. Mixing, unmixing, and nonlinear transformation.

I. INTRODUCTION

Source separation consists in recovering a set of unobservable signals (sources) from a set of observed mixtures. In its simplest form, an n x 1 vector x of observations (typically, the output of n sensors) is modeled as

$$ x = A s \qquad (1) $$

where the "mixing matrix" A is invertible and the n x 1 vector s has independent components: its probability density function (pdf) factors as

$$ q(s) = \prod_{i=1}^{n} q_i(s_i) \qquad (2) $$

where q_i is the pdf of s_i.¹ Based on these assumptions and on realizations of x, the aim is to estimate the matrix A or, equivalently, to find a "separating matrix" B such that

$$ y = B x $$

is an estimate of the source signals.

A guiding principle for source separation is to optimize a function called a contrast function, which is a function of the distribution of y [4]. Based on the infomax principle, an apparently new contrast function has recently been derived by Bell and Sejnowski [1], attracting a lot of interest. In this letter, we exhibit the contrast function associated with the well-established maximum likelihood (ML) principle. By this device, we show that, regarding source separation, the infomax principle boils down to ML.

¹Throughout this letter, pdf's are with respect to the Lebesgue measure. All distributions are assumed continuous with respect to it.

Manuscript received July 9, 1996. The associate editor coordinating the review of this manuscript and approving it for publication was Prof. T. S. Durrani. The author is with the Centre National de la Recherche Scientifique and École Nationale Supérieure des Télécommunications, 75634 Paris, France (e-mail: cardoso@sig.enst.fr). Publisher Item Identifier S 1070-9908(97)02520-0.
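As a concrete illustration of model (1)-(2) and of the role of the separating matrix B, here is a minimal numeric sketch. The particular source laws, the 2 x 2 mixing matrix, and the sample size are arbitrary choices for illustration, not taken from the letter.

```python
# Hypothetical toy instance of model (1)-(2): two independent sources,
# an invertible 2x2 mixing matrix, and unmixing with B = A^{-1}.
import numpy as np

rng = np.random.default_rng(0)
T = 10_000

# Independent, zero-mean sources (illustrative choices: uniform and Laplacian).
s = np.vstack([rng.uniform(-1.0, 1.0, T),       # s_1 ~ uniform
               rng.laplace(0.0, 1.0, T)])       # s_2 ~ Laplace

A = np.array([[1.0, 0.6],                       # invertible mixing matrix
              [0.4, 1.0]])
x = A @ s                                       # observations, model (1)

B = np.linalg.inv(A)                            # ideal separating matrix
y = B @ x                                       # y = Bx recovers s exactly here

print(np.allclose(y, s))                        # True: perfect separation
print(np.round(np.corrcoef(x), 3))              # mixtures are correlated ...
print(np.round(np.corrcoef(s), 3))              # ... while the sources are not
```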

II. INFOMAX

According to Bell and Sejnowski, application of the infomax principle [1] to source separation consists in maximizing an output entropy (see Fig. 1). Plain maximization would be inappropriate, because the entropy of y = Bx diverges to infinity for an arbitrarily large separating system B. Thus, the infomax principle is implemented by maximizing with respect to B the entropy of z = g(Bx), where g is a componentwise nonlinear function: z_i = g_i(y_i). The infomax contrast function is therefore

$$ \phi^{\mathrm{IM}}(B) = H[\,g(Bx)\,] \qquad (3) $$

where H denotes the (Shannon) differential entropy (11). The scalar functions g_1, ..., g_n are taken to be "squashing functions," mapping the real line to the interval [0, 1] and monotonically increasing. Thus, if g_i is differentiable, it is the cumulative distribution function (cdf) of some pdf r_i on the real line. Denote by s̃ an n x 1 random vector with pdf

$$ r(\tilde{s}) = \prod_{i=1}^{n} r_i(\tilde{s}_i). \qquad (4) $$

Then g_i(s̃_i) is distributed uniformly on [0, 1], since g_i is the cdf of s̃_i. Thus g(s̃) is distributed uniformly on [0, 1]^n, and the infomax contrast is rewritten as

$$ \phi^{\mathrm{IM}}(B) = H[\,g(Bx)\,] = -K[\,g(Bx) \mid g(\tilde{s})\,] = -K[\,Bx \mid \tilde{s}\,] \qquad (5) $$

where K denotes the Kullback-Leibler (KL) divergence (12). The first equality is the combination of (3) and (14); the second results from (13). It shows that "infomaximization" is identical to minimization of the KL divergence between the distribution of the output vector y = Bx and the distribution (4) of s̃.
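The probability integral transform step behind (5) can be checked numerically. The sketch below uses purely illustrative choices not taken from the letter (logistic model sources, hence logistic-sigmoid squashing functions, and a 2 x 2 mixture): when each g_i is the cdf of s̃_i, the components of g(s̃) are independent and uniform on [0, 1], so g(s̃) is uniform on the unit cube, the maximum-entropy distribution on [0, 1]^n, whereas applying g to unseparated outputs does not even give uniform marginals.

```python
# Marginal check of the argument behind (5): g(s~) is uniform on [0,1]^n
# when the g_i are the cdf's of the model source densities r_i.
import numpy as np
from scipy.stats import kstest

rng = np.random.default_rng(1)
T = 20_000

def g(u):                                        # logistic squashing function =
    return 1.0 / (1.0 + np.exp(-u))              # cdf of the standard logistic pdf

s_model = rng.logistic(0.0, 1.0, size=(2, T))    # s~ drawn from the model pdf (4)
A = np.array([[1.0, 0.8], [0.3, 1.0]])
x = A @ s_model

# Correct separator: g(A^{-1} x) = g(s~) has independent uniform components,
# hence is uniform on [0,1]^2.  No separation (B = I): not uniform.
z_good = g(np.linalg.inv(A) @ x)
z_bad = g(x)

for name, z in [("B = inv(A)", z_good), ("B = I     ", z_bad)]:
    pvals = [kstest(z[i], "uniform").pvalue for i in range(2)]
    print(name, "KS p-values per component:", np.round(pvals, 3))
```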

III. MAXIMUM LIKELIHOOD

We first recall how the ML principle is associated with a contrast function. This is then specialized to the source separation model.

Consider a sample x_1, ..., x_T of T independent realizations of a random variable x distributed according to a common density p. Let {p_θ | θ ∈ Θ} be a parametric model for the density of x. The likelihood that the sample is drawn with a particular distribution p_θ is the product ∏_{t=1}^{T} p_θ(x_t). Taking the logarithm and dividing by the number of observations results in the normalized log-likelihood

$$ L_T(\theta) = \frac{1}{T} \sum_{t=1}^{T} \log p_\theta(x_t). $$

Since this is the sample average of log p_θ(x), it converges in probability, by the law of large numbers, to its expectation

$$ \mathrm{E}\,\log p_\theta(x) = \int p(x) \log p_\theta(x)\, dx = -K[\,p \mid p_\theta\,] - H[p]. \qquad (6) $$

Note that this conclusion is reached without assuming an exact model, i.e., it is not assumed that p = p_θ for some θ ∈ Θ.

This result applies to source separation as follows. The distribution p is the true distribution of the data, i.e., the distribution of the vector x = As; the parameter of the parametric model is the unknown mixing matrix, θ = A; the parameter set Θ is the set of all invertible n x n matrices; and the pdf's in the model are the distributions of the vector A s̃, where the source vector is assumed to be distributed as s̃, i.e., according to distribution (4). In this setting, (6) becomes

$$ \mathrm{E}\,\log p_A(x) = -K[\,x \mid A\tilde{s}\,] - H[x]. $$

The contrast function associated with the likelihood therefore appears to be

$$ \phi^{\mathrm{ML}}(A) = -K[\,x \mid A\tilde{s}\,] \qquad (7) $$

because H[x], being an additive constant (not depending on the parameter A), can be discarded.

IV. DISCUSSION

In view of (5) and (7), the function defined by

$$ \phi(y) = -K[\,y \mid \tilde{s}\,] \qquad (8) $$

appears to play a central role. Indeed, we can write

$$ \phi^{\mathrm{IM}}(B) = -K[\,Bx \mid \tilde{s}\,] = \phi(Bx), \qquad \phi^{\mathrm{ML}}(A) = -K[\,x \mid A\tilde{s}\,] = -K[\,A^{-1}x \mid \tilde{s}\,] = \phi(A^{-1}x). $$

The first equality stems from (5) and the definition (8); the second is obtained by applying property (13) to (7) with the invertible transformation x ↦ A^{-1}x. Hence, it is found that the contrasts associated with infomax and with maximum likelihood coincide, provided B is identified with A^{-1}. Interpretation is straightforward in either case: contrast optimization corresponds to a "Kullback matching" between the hypothesized source distribution and the distribution of the separator output y = Bx, where the matrix B is either the separating matrix itself (infomax) or A^{-1} (ML).
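This matching can be made concrete. For model (1), the density of x = As̃ under parameter A is |det A|^{-1} r(A^{-1}x), so the normalized log-likelihood takes the explicit form L_T(B) = log|det B| + (1/T) Σ_t Σ_i log r_i((B x_t)_i) with B = A^{-1}; this standard change-of-variables form is not spelled out in the letter but follows directly from it. The sketch below, with purely illustrative numeric choices (logistic source model, a fixed 2 x 2 mixture, a random perturbation), evaluates this criterion on simulated data and shows that, with the correct source model, it is larger at B = A^{-1} than at a perturbed matrix.

```python
# Explicit sample-based form of the likelihood/infomax contrast for model (1):
#   L_T(B) = log|det B| + (1/T) sum_t sum_i log r_i((B x_t)_i)
import numpy as np

rng = np.random.default_rng(2)
T = 20_000

def log_r(u):                                    # log of the standard logistic pdf
    return -u - 2.0 * np.log1p(np.exp(-u))

s = rng.logistic(0.0, 1.0, size=(2, T))          # sources drawn from the model
A = np.array([[1.0, 0.5], [0.2, 1.0]])
x = A @ s

def contrast(B):                                 # normalized log-likelihood L_T(B)
    y = B @ x
    return np.log(abs(np.linalg.det(B))) + log_r(y).sum(axis=0).mean()

B_star = np.linalg.inv(A)
B_off = B_star + 0.3 * rng.standard_normal((2, 2))
print("L_T at A^{-1}      :", round(contrast(B_star), 4))
print("L_T at perturbed B :", round(contrast(B_off), 4))   # smaller, generically
```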
With the correct source model, i.e., when the source vector s is distributed as s̃, the contrast reduces to φ(Bx) = -K[Bx | s]. It is maximized at B = A^{-1}, because K[A^{-1}x | s] = K[s | s] = 0, which is the lowest possible value, the KL divergence being nonnegative.

What happens with a wrong source model or, equivalently, when the squashing functions are not the cdf's of the source distributions? We sketch an answer, based on the score functions

$$ \psi_i(u) = -\frac{d}{du} \log r_i(u) = -\frac{r_i'(u)}{r_i(u)}. $$

The stationary points of the likelihood/infomax contrast cancel the gradient of φ(Bx). Differentiating, it is easily found that these are the matrices B such that y = Bx verifies

$$ \mathrm{E}[\,\psi_i(y_i)\, y_j\,] = \delta_{ij}, \qquad 1 \le i, j \le n \qquad (9) $$

where δ_ij = 1 if i = j and 0 otherwise. Let λ_i be a solution of

$$ \mathrm{E}[\,\psi_i(\lambda_i s_i)\, \lambda_i s_i\,] = 1. \qquad (10) $$

Setting Λ = diag(λ_1, ..., λ_n), the matrix B = Λ A^{-1} is a stationary point of the contrast if the source signals have zero mean: E[s_i] = 0. This is easily seen because, for y = Bx = Λs, the stationarity condition (9) is satisfied for i = j thanks to (10), and for i ≠ j due to

$$ \mathrm{E}[\,\psi_i(\lambda_i s_i)\, \lambda_j s_j\,] = \mathrm{E}[\,\psi_i(\lambda_i s_i)\,]\,\mathrm{E}[\,\lambda_j s_j\,] = 0. $$

The first equality is by independence of the source signals; the second one by the zero-mean condition. For a wrong source model, i.e., r_i different from the true source pdf q_i, one generally has λ_i ≠ 1 and thus B = Λ A^{-1} ≠ A^{-1}. However, such a B still is a satisfactory solution with respect to source separation, since the source signals are recovered up to scaling factors. Unfortunately, one cannot conclude that the likelihood/infomax contrast is completely "robust" to misspecifying the source distributions, because too large a mismatch may turn B = Λ A^{-1} into an unstable stationary point, as observed in [1].
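A minimal numeric check of the stationarity argument (9)-(10), under assumptions chosen for illustration rather than taken from the letter: the true sources are zero-mean uniform variables, while the hypothesized model r_i is the standard logistic density, whose score is ψ(u) = tanh(u/2). The scale λ solving (10) is found by Monte Carlo root finding, and the resulting B = ΛA^{-1} approximately satisfies (9).

```python
# Stationarity check for a mismatched source model (uniform sources,
# logistic hypothesis).  All concrete choices are illustrative.
import numpy as np
from scipy.optimize import brentq

rng = np.random.default_rng(3)
T = 200_000

psi = lambda u: np.tanh(u / 2.0)                 # score of the (wrong) logistic model

s = rng.uniform(-np.sqrt(3), np.sqrt(3), size=(2, T))   # true sources, zero mean
A = np.array([[1.0, 0.7], [0.5, 1.0]])
x = A @ s

# Solve (10) by Monte Carlo: E[ psi(lam * s_i) * lam * s_i ] = 1.
h = lambda lam: np.mean(psi(lam * s[0]) * lam * s[0]) - 1.0
lam = brentq(h, 0.1, 10.0)
print("scale lambda solving (10):", round(lam, 3))       # != 1: model mismatch

# B = Lambda A^{-1} should satisfy the stationarity condition (9).
B = np.diag([lam, lam]) @ np.linalg.inv(A)
y = B @ x
cond = psi(y) @ y.T / T                                   # sample E[psi(y_i) y_j]
print(np.round(cond, 3))                                  # approx. identity matrix
```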
V. CONCLUSION

The infomax principle was shown to coincide with the ML principle in the case of source separation. Both principles explicitly or implicitly assume given source distributions. They consist in minimizing the Kullback divergence between the distribution at the output of a separating matrix and the hypothesized source distribution. It thus appears important to match the source distributions as closely as possible, as already noted in [1] and as is obvious from the ML standpoint. Even with a wrong source model, a stationary point of the contrast is still obtained for separated (but generally scaled) signals, even though stability cannot be guaranteed for too large a mismatch. Both principles also hint at a more general approach involving joint estimation of the mixing matrix and of (some characteristics of) the source distributions. First steps in this direction are taken in [1]; an elegant and well-developed approach is to be found in [5].

APPENDIX
INFORMATION THEORETIC QUANTITIES

• The differential entropy of a random vector Z with pdf q is denoted H[Z]; the Kullback-Leibler divergence between two pdf's q and r is denoted K[q | r]. The definitions are

$$ H[Z] = -\int q(z) \log q(z)\, dz \qquad (11) $$

$$ K[q \mid r] = \int q(z) \log \frac{q(z)}{r(z)}\, dz \qquad (12) $$

whenever the integrals exist. A convenient abuse of notation is K[Y | Z] = K[q | r] if q (resp. r) is the density of the random vector Y (resp. Z).

• K[q | r] ≥ 0, with equality if and only if q and r agree almost everywhere.

• The KL divergence is invariant under an invertible transformation f of the sample space, as follows:

$$ K[\,f(Y) \mid f(Z)\,] = K[\,Y \mid Z\,]. \qquad (13) $$

• The differential entropy of a distribution with support [0, 1]^n also is (minus) the KL divergence between this distribution and the uniform distribution on [0, 1]^n. Clearly,

$$ H[Z] = -\int q(z) \log \frac{q(z)}{1}\, dz = -K[\,Z \mid U\,] \qquad (14) $$

where U denotes a random vector uniformly distributed on [0, 1]^n.
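A quick numerical sanity check of properties (13) and (14) in one dimension; the particular densities (a beta density on [0, 1], two Gaussians, an affine change of variables) are arbitrary choices, not from the letter.

```python
# Numeric check of (13) and (14) for one-dimensional examples.
import numpy as np
from scipy.integrate import quad
from scipy.stats import beta, norm

# (14): for a pdf q supported on [0,1], H(q) = -K(q | uniform on [0,1]).
q = beta(2, 5).pdf
H = quad(lambda z: -q(z) * np.log(q(z)), 0, 1)[0]
K_unif = quad(lambda z: q(z) * np.log(q(z) / 1.0), 0, 1)[0]
print(round(H, 6), round(-K_unif, 6))            # equal

# (13): K is invariant under an invertible map of the sample space,
# here the affine map z -> a*z + b applied to two Gaussian densities.
q1, r1 = norm(0.0, 1.0).pdf, norm(0.5, 1.5).pdf
a, b = 2.0, -1.0
q2 = lambda z: q1((z - b) / a) / abs(a)          # density of a*Y + b, Y ~ q1
r2 = lambda z: r1((z - b) / a) / abs(a)
K1 = quad(lambda z: q1(z) * np.log(q1(z) / r1(z)), -30, 30)[0]
K2 = quad(lambda z: q2(z) * np.log(q2(z) / r2(z)), -60, 60)[0]
print(round(K1, 6), round(K2, 6))                # equal
```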
ACKNOWLEDGMENT

This work was completed while the author was visiting S. I. Amari at the Riken Institute, Japan.

REFERENCES

[1] A. J. Bell and T. J. Sejnowski, "An information-maximization approach to blind separation and blind deconvolution," Neural Comput., vol. 7, no. 6, pp. 1004-1034, 1995.
[2] J.-F. Cardoso, "The equivariant approach to source separation," in Proc. NOLTA, 1995, pp. 55-60. Available as ftp://sig.enst.fr/pub/jfc/Papers/nolta95.ps.gz.
[3] T. M. Cover and J. A. Thomas, Elements of Information Theory (Wiley Series in Telecommunications). New York: Wiley, 1991.
[4] P. Comon, "Independent component analysis, a new concept?," Signal Processing, vol. 36, no. 3, pp. 287-314, 1994.
[5] D.-T. Pham, P. Garrat, and C. Jutten, "Separation of a mixture of independent sources through a maximum likelihood approach," in Proc. EUSIPCO, 1992, pp. 771-774.
