Signal Processing 73 (1999) 267-276

Blind separation of sources using higher-order cumulants

Bin-Chul Ihm, Dong-Jo Park*

Department of Electrical Engineering, Korea Advanced Institute of Science and Technology, 373-1 Kusong-dong, Yusong-gu, Taejon, 305701, South Korea

Received 26 October 1997; received in revised form 12 October 1998

Abstract

In this paper, we propose a novel method of blind signal separation using a new performance criterion with nonlinear separating functions, and derive an adaptive algorithm. The criterion consists of nonlinear mappings of output signals and their cumulants for independence of the signals. We show that our method requires a weak condition for stable separation through the ODE analysis. Simulation results are given to verify the validity and advantage of the proposed algorithm. © 1999 Elsevier Science B.V. All rights reserved.


Keywords: Cumulant; Blind separation; Weight adaptation

* Corresponding author. Tel.: +82 42 869 3438; fax: +82 42 869 3410; e-mail: djpark@eekaist.kaist.ac.kr

0165-1684/99/$ - see front matter © 1999 Elsevier Science B.V. All rights reserved. PII: S0165-1684(98)00197-2


1. Introduction

The problem of blind separation of sources arises in many signal processing applications like communications, array processing, speech analysis and speech recognition. In all these instances, the underlying assumption is that several linear mixtures of unknown, random, zero-mean, and statistically independent signals, which are called sources, are observed; the problem consists of recovering the original signals from their mixtures without a priori knowledge of the mixture coefficients or of the sources. Mathematically, the situation is described by the simple and well-known data model:

X(t) = AS(t),   t = 1, 2, …,   (1)

where X(·) and S(·) are column vectors of sizes n and m, respectively, and A is an n × m matrix. The idea here is that the vector X(·) results from measurements by n sensors receiving contributions from m sources. The matrix A is called the mixing matrix. Our task is to obtain a good estimate C of A⁻¹ (a pseudo-inverse when n ≠ m), called the separating matrix, and to retrieve the sources from the measurement outputs:

Y(t) = CX(t).   (2)

However, as shown in [18], there are indeterminacies in this problem, and A itself cannot be identified. What we can do is to obtain a rescaled and permuted version of the m source signals, that is, C = DPA⁻¹, where D and P are a diagonal matrix and a permutation matrix, respectively. In recent years, since the remarkable work introduced by Jutten and Hérault [14], various approaches have been presented in the literature [1,4-13,15-18]. In [14], the separating matrix is updated by odd nonlinear functions, later called separating functions, in order to achieve the independence of the processed output signals Y(·). Therein, the conditions of successful separation depend on characteristics of the source signals and the separating functions. For example, if a cubic (i.e., x³) and a linear (i.e., x) function are used as the separating functions, sub-Gaussian signals (kurtosis < 3) are separable under assumptions of unit variances of the sources [16,17].
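As an illustration, the data model (1)-(2) and the rescaling/permutation indeterminacy can be sketched in a few lines of NumPy. The 2 × 2 matrices A, D and P below are arbitrary choices for the sketch, not values taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 2x2 example: mixing matrix A and n = m = 2 sources.
A = np.array([[1.0, 0.5],
              [0.5, 1.0]])
S = rng.uniform(-1.0, 1.0, size=(2, 10_000))   # zero-mean independent sources

X = A @ S                                       # observations, Eq. (1)

# A perfect separator C = A^{-1} recovers the sources exactly ...
C = np.linalg.inv(A)
Y = C @ X                                       # Eq. (2)
assert np.allclose(Y, S)

# ... but so does any C with CA = DP: the sources come back rescaled
# (by D) and permuted (by P), which is the inherent indeterminacy.
D = np.diag([2.0, -0.5])
P = np.array([[0.0, 1.0],
              [1.0, 0.0]])
C_ambiguous = D @ P @ np.linalg.inv(A)
Y2 = C_ambiguous @ X
assert np.allclose(Y2, D @ P @ S)               # permuted, rescaled copies
```

Any such C_ambiguous makes the outputs mutually independent, so no criterion based on independence alone can resolve D and P.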

The Jutten-Hérault algorithm is inspired by a neuromimetic approach; this line is further followed by Karhunen et al. [15] and Cichocki et al. [7,8,12].

Later, Cardoso [5] and Comon [10] considered the nonadaptive source separation problem and proposed different solutions based on the fourth-order statistics of Y(·). In [10], Comon defined the contrast, which was the square sum of the fourth-order cumulants, for the independence criterion. The contrast was used in other papers [11,6]. In these papers, the stability conditions can never be met if there is more than one Gaussian source signal. On the other hand, based on information-theoretic ideas, Bell et al. proposed the nonlinear distortion of the output Y(·), i.e., Y(t) = g(X(t)) [1]. When g(·) has a hyperbolic tangent nonlinearity and the sources have super-Gaussian (kurtosis > 3) distributions, the sources can be separated by maximizing the joint entropy of Y(·). In this paper, we propose a new method for signal separation that is based on a new criterion for the case of a square mixing matrix A, namely, n = m. Section 2 presents the new algorithm derived from the criterion. Section 3 analyses the stationary points of the proposed performance criterion. Finally, Section 4 gives the results of computer experiments, and Section 5 is devoted to conclusions.

2. New performance criterion and learning algorithm

2.1. Cumulants

In the blind source separation problem, we cannot know the characteristics of the source signals and channels a priori; therefore we use the only assumption that the source signals are statistically independent. So we need to discuss statistical independence before we proceed. Denote by f_i(s_i) the marginal probability density function of the random variables s_1, s_2, …, s_n, and by f_{1⋯n}(s_1, s_2, …, s_n) their joint probability density function. If s_1, s_2, …, s_n are independent, then the joint probability density function is represented by the product of


the marginal probability density functions of each variable, i.e.,

f_{1⋯n}(s_1, s_2, …, s_n) = Π_{i=1}^{n} f_i(s_i).  (3)

However, since we cannot estimate the probability density functions of the signals, we use the cumulants as a measure of statistical independence. The cumulant is defined as follows [3]:

cum(s_1, s_2, …, s_n) = Σ (−1)^{p−1} (p−1)! E(Π_{i∈I_1} s_i) ⋯ E(Π_{i∈I_p} s_i),  (4)

where the summation extends over all partitions (I_1, I_2, …, I_p), p = 1, 2, …, n, of (1, 2, …, n). The cumulant has a cumulant generating function, as the moment does. For random signals s_1, s_2, …, s_n, the cumulant generating function is defined as follows [3]:

Ψ(v_1, v_2, …, v_n) = log E(e^{j(v_1 s_1 + v_2 s_2 + ⋯ + v_n s_n)}).  (5)

cum(s_1, s_2, …, s_n) is given by the coefficient of (j)^n v_1 v_2 ⋯ v_n in the Taylor series expansion of the above cumulant generating function about the origin. If s_1, s_2, …, s_n are statistically independent, the cumulant generating function is represented by the sum of those of each signal as follows:

Ψ(v_1, v_2, …, v_n) = Σ_{i=1}^{n} log E(e^{j v_i s_i})  (6)

= Σ_{i=1}^{n} Ψ_i(v_i).  (7)

If we take the Taylor series expansion of both sides of Eq. (7), the right-hand side has no cross-cumulant, though the left-hand side has all joint cross-cumulants. To satisfy the equality in the equation, all joint cross-cumulants of the left-hand side of Eq. (7) must vanish. From this point of view, we can recognize that if we make all joint cross-cumulants of the signals zero, we can guarantee the statistical independence of the signals.
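The vanishing of cross-cumulants for independent signals can be checked empirically. The sketch below uses sample estimates of the second- and fourth-order cross-cumulants (the fourth-order expression is the standard formula for zero-mean variables; the uniform and exponential source distributions are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
s1 = rng.uniform(-1.0, 1.0, n)        # independent, zero-mean sources
s2 = rng.exponential(1.0, n) - 1.0

def cum2(a, b):
    # second-order cross-cumulant: cum(a, b) = E[ab] - E[a]E[b]
    return np.mean(a * b) - np.mean(a) * np.mean(b)

def cum22(a, b):
    # fourth-order cross-cumulant cum(a, a, b, b) for zero-mean a, b:
    # E[a^2 b^2] - E[a^2]E[b^2] - 2 E[ab]^2
    return np.mean(a*a*b*b) - np.mean(a*a)*np.mean(b*b) - 2*np.mean(a*b)**2

# cross-cumulants of independent signals are ~0 (up to sampling error) ...
assert abs(cum2(s1, s2)) < 0.02
assert abs(cum22(s1, s2)) < 0.05

# ... but not for dependent mixtures of the same sources
x1, x2 = s1 + 0.5*s2, 0.5*s1 + s2
assert abs(cum2(x1, x2)) > 0.1
```

In practice only finitely many such cumulants can be estimated, which is exactly the difficulty the next subsection addresses with nonlinear functions.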

2.2. New performance criterion

Since, in practice, it is not possible to estimate the cross-cumulants of all orders, we use nonlinear functions that include them in their Taylor series expansions. Consider an infinitely differentiable function f(·) of s_1. The Taylor series expansion of f(s_1) about the origin is of the following form:

f(s_1) = f(0) + f′(0)s_1 + (f″(0)/2!)s_1² + (f‴(0)/3!)s_1³ + ⋯  (8)

= Σ_{i=0}^{∞} (f^{(i)}(0)/i!) s_1^i  (9)

= Σ_{i=0}^{∞} a_i s_1^i.  (10)

Similarly, we can expand an infinitely differentiable function g(·) with respect to s_2 around the origin to obtain g(s_2) = Σ_{j=0}^{∞} b_j s_2^j. If we take the expectation of f(s_1)g(s_2), we have

E(f(s_1)g(s_2)) = E(Σ_i Σ_j a_i b_j s_1^i s_2^j)  (11)

= Σ_i Σ_j a_i b_j E(s_1^i s_2^j).  (12)

The product of the expectation of f(s_1) and that of g(s_2) is obtained as

E(f(s_1)) E(g(s_2)) = E(Σ_i a_i s_1^i) E(Σ_j b_j s_2^j)  (13)

= Σ_i Σ_j a_i b_j E(s_1^i) E(s_2^j).  (14)

Therefore,

cum(f(s_1), g(s_2)) = E(f(s_1)g(s_2)) − E(f(s_1)) E(g(s_2))

= Σ_i Σ_j a_i b_j cum(s_1^i, s_2^j).  (15)

If s_1 and s_2 are independent, cum(s_1^i, s_2^j) is zero for all i, j, which makes cum(f(s_1), g(s_2)) zero. Therefore cum(f(s_1), g(s_2)) = 0 is a necessary condition for s_1 and s_2 to be independent. For independent random signals s_1, s_2, …, s_n, the elements y_1, y_2, …, y_n of Y = CX = CAS = WS, where W = CA, satisfy the following theorem [9].
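This necessary condition is easy to probe numerically with sample estimates. The sketch below uses the tanh-type separating functions that appear later in Example 1; the 2 × 2 mixing matrix is an arbitrary choice for the illustration:

```python
import numpy as np

rng = np.random.default_rng(2)
s = rng.uniform(-1.0, 1.0, size=(2, 100_000))    # independent sources
A = np.array([[1.0, 0.5], [0.5, 1.0]])
x = A @ s                                         # mixtures

f = np.tanh                                       # separating functions used
g = lambda u: np.tanh(0.5 * u)                    # later in Example 1

def cum_fg(y):
    # cum(f(y1), g(y2)) = E[f g] - E[f]E[g], estimated from samples
    fy, gy = f(y[0]), g(y[1])
    return np.mean(fy * gy) - np.mean(fy) * np.mean(gy)

# the cross-cumulant is clearly nonzero for the mixtures ...
assert abs(cum_fg(x)) > 0.02
# ... and (near) zero once the signals are the separated sources
assert abs(cum_fg(s)) < 0.01
```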


Theorem 1. The following three statements are equivalent:

1. y_1, y_2, …, y_n are pairwise independent.

2. y_1, y_2, …, y_n are mutually independent.

3. W = DP, where D is a diagonal matrix and P is a permutation matrix.

If we make all pairs of the n signals independent, then the signals become mutually independent. Based on this and the above necessary condition, we define a new performance criterion [13]. From now on we abbreviate f(y_i) and g(y_i) as f_i and g_i, respectively, for convenience's sake.

Definition 1. A new performance criterion for separation of sources is defined as

J = ‖M − diag M‖²,  (16)

where

M = [ cum(f_1,g_1)  cum(f_1,g_2)  …  cum(f_1,g_n)
      cum(f_2,g_1)  cum(f_2,g_2)  …  cum(f_2,g_n)
        ⋮             ⋮                ⋮
      cum(f_n,g_1)  cum(f_n,g_2)  …  cum(f_n,g_n) ],  (17)

diag M = [ M_11  0     …  0
           0     M_22  …  0
           ⋮              ⋮
           0     0     …  M_nn ],   M_ii = cum(f_i, g_i),

and ‖·‖ is the Frobenius norm.
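The criterion of Definition 1 is straightforward to estimate from samples. The sketch below builds the matrix M of Eq. (17) empirically for three sources and checks that J is large for mixtures and near zero for the separated sources; the sample-based estimation and the particular mixing matrix (the one used in the later simulations) are our additions for illustration:

```python
import numpy as np

rng = np.random.default_rng(8)
s = rng.uniform(-1.0, 1.0, size=(3, 100_000))    # three independent sources
A = 0.5 + 0.5*np.eye(3)                           # 1 on diagonal, 0.5 elsewhere

f = np.tanh
g = lambda u: np.tanh(0.5*u)

def criterion(y):
    # M_ij = cum(f_i, g_j) estimated from samples; J is the squared
    # Frobenius norm of the off-diagonal part of M, as in Eq. (16)
    fy, gy = f(y), g(y)
    M = (fy @ gy.T) / y.shape[1] - np.outer(np.mean(fy, 1), np.mean(gy, 1))
    return np.sum((M - np.diag(np.diag(M)))**2)

# J is large for the mixtures and ~0 for the separated sources
assert criterion(A @ s) > 10 * criterion(s)
```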

2.3. Learning algorithm

We reformulate the problem as that of minimizing the performance criterion and will discuss the learning of a separator matrix in order to reach a desirable value. Let the separator matrix C be as follows:

Y = CX,  (18)

where

C = [ 1     c_12  …  c_1n
      c_21  1     …  c_2n
       ⋮              ⋮
      c_n1  c_n2  …  1    ].  (19)

In order to minimize the performance criterion, the separator matrix C is to be updated by using the steepest descent method as follows:

C(k+1) = C(k) + ΔC(k),  (20)

ΔC(k) = −μ ∂J/∂C(k),  (21)

where μ > 0 is a step size.

The performance criterion J is represented by the sum of squares of the elements of M except the diagonal terms as follows:

J = Σ_{i≠j} (cum(f_i, g_j))²

= Σ_{i≠j} (E(f_i g_j) − E(f_i)E(g_j))².  (22)

By differentiating J partially with respect to an element c_ij, we can obtain the following equation:

∂J/∂c_ij = 2 Σ_{k≠i} (E(f_i g_k) − E(f_i)E(g_k)) · (E(f_i′ x_j g_k) − E(f_i′ x_j)E(g_k))

+ 2 Σ_{k≠i} (E(g_i f_k) − E(g_i)E(f_k)) · (E(g_i′ x_j f_k) − E(g_i′ x_j)E(f_k))

= 2 Σ_{k≠i} [cum(f_i, g_k) · cum(f_i′ x_j, g_k) + cum(g_i, f_k) · cum(g_i′ x_j, f_k)],  (23)

where f_i′ and g_i′ denote the derivatives f′(y_i) and g′(y_i).

That is, the partial derivatives can be defined as

∂J/∂c_ij ≜ H_ij(C, X, Y).

Example 1. Let the matrices and functions be

A = [ 1    0.5
      0.5  1   ],   C = [ 1     c_12
                          c_21  1    ],  (24)


f(x) = tanh(x) and g(x) = tanh(0.5x). s_1 and s_2 are random signals whose probability density functions are uniform on [−1, 1]. Fig. 1 shows the shape of the performance criterion J over the axes c_12 and c_21, and trajectories of those elements from different initial points are given in Fig. 2. In this example, the equilibria of (c_12, c_21) are (−0.5, −0.5) and (−2, −2), as shown in the figure. We can see that after sufficient learning, (c_12, c_21) goes to one of the equilibria. Fig. 3 illustrates the learning result of CA. In this experiment, we initialize C with the identity matrix I.
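Example 1 can be reproduced numerically. The following sketch implements the criterion (22) and a batch form of the gradient (23) for n = 2, verifies the analytic gradient against a finite-difference check, and runs steepest-descent steps (20)-(21) from C(0) = I. The sample size and the step size mu are guesses, since the paper's stochastic implementation details are not reproduced here:

```python
import numpy as np

rng = np.random.default_rng(3)
s = rng.uniform(-1.0, 1.0, size=(2, 10_000))   # uniform sources of Example 1
A = np.array([[1.0, 0.5], [0.5, 1.0]])
x = A @ s

f  = np.tanh                                    # f(u) = tanh(u)
fd = lambda u: 1.0 - np.tanh(u)**2              # f'(u)
g  = lambda u: np.tanh(0.5*u)                   # g(u) = tanh(0.5u)
gd = lambda u: 0.5*(1.0 - np.tanh(0.5*u)**2)    # g'(u)

def cum(a, b):                                  # cum(a, b) = E[ab] - E[a]E[b]
    return np.mean(a*b) - np.mean(a)*np.mean(b)

def J(c12, c21):                                # Eq. (22) for n = 2
    y1, y2 = x[0] + c12*x[1], c21*x[0] + x[1]
    return cum(f(y1), g(y2))**2 + cum(f(y2), g(y1))**2

def grad(c12, c21):                             # batch form of Eq. (23)
    y1, y2 = x[0] + c12*x[1], c21*x[0] + x[1]
    g12 = 2*(cum(f(y1), g(y2))*cum(fd(y1)*x[1], g(y2))
             + cum(g(y1), f(y2))*cum(gd(y1)*x[1], f(y2)))
    g21 = 2*(cum(f(y2), g(y1))*cum(fd(y2)*x[0], g(y1))
             + cum(g(y2), f(y1))*cum(gd(y2)*x[0], f(y1)))
    return np.array([g12, g21])

# the analytic gradient matches a finite-difference check
eps = 1e-6
num = (J(0.1 + eps, -0.2) - J(0.1 - eps, -0.2)) / (2*eps)
assert abs(num - grad(0.1, -0.2)[0]) < 1e-5

# the natural equilibrium (-0.5, -0.5) nearly zeroes J, and a few hundred
# steepest-descent steps, Eqs. (20)-(21), from C(0) = I shrink J markedly
assert J(-0.5, -0.5) < 0.01 * J(0.0, 0.0)
c, mu = np.zeros(2), 1.0                        # mu is a guessed step size
for _ in range(500):
    c -= mu * grad(*c)
assert J(*c) < 0.5 * J(0.0, 0.0)
```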

2.4. Phantom solutions

If the separator matrix C converges and stays in the steady state, then ΔC(k) becomes 0 to make cum(f_i, g_j) = 0 for i ≠ j. However, since cum(f_i, g_j) = 0 is only a necessary condition for independence of y_i and y_j, the matrix C may have phantom solutions. Phantom solutions make the performance criterion J zero but cannot separate the measured signals to recover the source signals. For example, assume n = 2 and f(x) = g(x) = e^x − 1. After sufficient learning, if cum(f_1, g_2) = 0 is achieved, then E(e^{y_1 + y_2}) = E(e^{y_1})E(e^{y_2}) is established. If we denote W = CA, then the outputs y_1 and y_2 become w_11 s_1 + w_12 s_2 and w_21 s_1 + w_22 s_2, respectively, which leads to E(e^{w_11 s_1 + w_21 s_1})E(e^{w_12 s_2 + w_22 s_2}) = E(e^{w_11 s_1})E(e^{w_12 s_2})E(e^{w_21 s_1})E(e^{w_22 s_2}) and results in the following solutions:

W = [ w_11  0
      0     w_22 ]   or   [ 0     w_12
                            w_21  0    ]   or a third, phantom solution.  (25)

Fig. 1. Surface of performance criterion J.

Fig. 2. Trajectories of elements of C.

Fig. 3. Variations of W = CA.

The first two of the above solutions are desirable to recover the source signals, but the last one cannot separate the measured signals, though it makes the performance criterion J attain a minimum value. The latter case happens for source signals whose probability density functions are impulsive.


When more than one Gaussian source exists, we may obtain extra solutions that make the cost function minimal and the outputs independent. The proposed cost function has the following physical meaning for independence:

cum(y_i^p, y_j^q) = 0,  i ≠ j,  i, j = 1, 2, …, n,  p, q = 1, 2, 3, ….  (26)

In the case of Gaussian sources, it is sufficient that cum(y_i, y_j) = 0, i ≠ j. Y = [y_1, y_2, …, y_m]^T has elements which consist of linear combinations of Gaussian signals; therefore the components are also Gaussian. In this case, if R_Y = E(YY^T) is a diagonal matrix, the y_i, i = 1, 2, …, m, are independent. R_Y = CAR_S A^T C^T where, without loss of generality, we can assume that R_S = I is a diagonal matrix, and this leads to CA = DG where G is a general orthogonal matrix [4]. This means that by decorrelating the output, we can obtain independent components from the mixture of Gaussian sources; however, they may not be the same as the original sources. Only if G is a permutation matrix can the sources be separated.
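The decorrelation argument can be illustrated numerically: whitening a mixture of two Gaussian sources makes the outputs uncorrelated (hence independent, being Gaussian), yet W = CA comes out only orthogonal, CA = DG, not a rescaled permutation. The symmetric whitening matrix below is one convenient choice of decorrelator for the sketch, not the paper's algorithm:

```python
import numpy as np

rng = np.random.default_rng(4)
s = rng.standard_normal((2, 100_000))            # two Gaussian sources
A = np.array([[1.0, 0.72], [0.31, 1.0]])         # mixing matrix of Eq. (39)
x = A @ s

# symmetric whitening: C = R_x^{-1/2}, so that R_y = C R_x C^T = I
Rx = np.cov(x)
w, V = np.linalg.eigh(Rx)
C = V @ np.diag(w**-0.5) @ V.T
y = C @ x

# the outputs are decorrelated (hence independent, being Gaussian) ...
Ry = np.cov(y)
assert np.allclose(Ry, np.eye(2), atol=0.05)

# ... yet W = CA is merely orthogonal (CA = DG with G orthogonal):
# each output is still a mixture of both sources
W = C @ A
assert np.allclose(W @ W.T, np.eye(2), atol=0.05)   # orthogonal
assert np.min(np.abs(W)) > 0.05                      # no entry is ~0
```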

3. Stability analysis

The ordinary differential equation (ODE) method is useful to study the asymptotic behaviour of an algorithm of the form

θ(k+1) = θ(k) + γ_k H(θ(k), ξ(k)),  (27)

where ξ(k) is a stationary sequence of random variables and γ_k is a sequence of positive numbers [2]. Since our algorithm has the above form (we can replace θ, H and ξ by C, H and (X, Y), respectively), we investigate the stability using the ODE method in the case n = m = 2. A stationary point θ* verifies E(H(θ*, ξ)) = 0 and is said to be stable if all the eigenvalues of the matrix

G = −(∂h/∂θ)(θ*),   h(θ) = E(H(θ, ξ)),  (28)

have positive real parts. Since the separating points are equilibria of algorithm (20), θ* is one of the separating points of C.

For n = m = 2, there are two separating points of C. The first is called the natural separating point C_N because it restores s_1 and s_2 in the natural order. The second is the reverse separating point C_R, implying the reverse order. If we denote θ = [c_12, c_21]^T, then from Eqs. (20) and (23), G at C_N can be described by

G(C_N) = [ ∂²J/∂c_12²       ∂²J/∂c_12 ∂c_21
           ∂²J/∂c_21 ∂c_12  ∂²J/∂c_21²      ]  (29)

≜ [ α  β
    γ  δ ],  (30)

with all derivatives evaluated at C_N.

For stability, G(C_N) needs to have eigenvalues with positive real parts, and this can be checked easily. The eigenvalues λ(G(C_N)) are the roots of the following polynomial equation:

λ² − (α + δ)λ + αδ − βγ = 0.  (31)

We obtain the two eigenvalues

λ_{1,2} = ((α + δ) ± √((α − δ)² + 4βγ)) / 2.

To satisfy the positivity of the real parts, the following two conditions are sufficient:

α + δ > 0,  (32)

αδ − βγ > 0.  (33)

Fig. 4. Variations of W = CA in case of a mixture of sub- and super-Gaussian signals.

Fig. 5. Separation result of train horn sound and bird chirp sound.

The first condition is satisfied since the separating points are local minima. At the natural separating point C_N, the outputs y_1 and y_2 are scaled s_1 and s_2, respectively, i.e., y_i = a_i s_i for i = 1, 2 and some scaling factors a_i. From this fact and the zero-mean assumptions, α, β, γ and δ are written as

α = 2a_22² [E²(f_1′)E²(g_2 s_2) + E²(g_1′)E²(f_2 s_2)],  (34)

β = γ = 2a_11 a_22 [E(f_1 s_1)E(g_2 s_2)E(f_1′)E(g_2′) + E(g_1 s_1)E(f_2 s_2)E(g_1′)E(f_2′)],  (35)

δ = 2a_11² [E²(f_2′)E²(g_1 s_1) + E²(g_2′)E²(f_1 s_1)],  (36)

where f_i and g_i denote f(a_i s_i) and g(a_i s_i), respectively, for i = 1, 2, and a_11, a_22 are the diagonal elements of A. Through some manipulations, we can obtain the following relation:

αδ − βγ = 4a_11² a_22² (U_N − V_N)² ≥ 0,  (37)

where U_N = E(f_1′ f_2′)E(g_1 s_1 g_2 s_2) and V_N = E(g_1′ g_2′)E(f_1 s_1 f_2 s_2). Similarly, for the reverse separating point C_R we can obtain, with y_1 = a_1 s_2 and y_2 = a_2 s_1,

αδ − βγ = 4a_12² a_21² (U_R − V_R)² ≥ 0,  (38)

where U_R = E(f_1′ f_2′)E(g_1 s_2 g_2 s_1) and V_R = E(g_1′ g_2′)E(f_1 s_2 f_2 s_1). Therefore, the algorithm is locally stable in the vicinity of the separating points unless U = V, as happens when f(·) = λg(·) with λ a constant.

Some separating functions deserve comment. If f(x) = x³ and g(x) = x, as in many algorithms, U_N = 9a_1³ a_2³ E²(s_1²)E²(s_2²) and V_N = a_1³ a_2³ E(s_1⁴)E(s_2⁴). This leads to the condition E(s_1⁴)E(s_2⁴) ≠ 9E²(s_1²)E²(s_2²) for stability. The same relation holds for the reverse separating point C_R. Therefore, when we use a cubic and a linear function for the separating functions, stable separation can never be achieved if there is more than one Gaussian source. Note that our condition is weaker than the requirements of other researches [1,6,16,17], where the stability conditions are described in terms of inequalities. If we employ different nonlinear functions as the separating functions, Eq. (33) holds in the neighbourhood of C_N and C_R. Therefore, the algorithm with initial points near C_N or C_R converges to C_N or C_R, respectively. We leave the stability analysis of the higher-order cases n, m > 2 for future work.
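The stability condition for the cubic/linear pair can be checked by Monte Carlo for concrete source distributions: uniform sources satisfy it strictly, while Gaussian sources violate it with equality, since E(s⁴) = 3E²(s²) for a Gaussian. The sample sizes below are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(6)
n = 500_000

def sides(s1, s2):
    # the two sides of the stability condition for f(x) = x^3, g(x) = x:
    # E(s1^4) E(s2^4)  versus  9 E^2(s1^2) E^2(s2^2)
    lhs = np.mean(s1**4) * np.mean(s2**4)
    rhs = 9 * np.mean(s1**2)**2 * np.mean(s2**2)**2
    return lhs, rhs

# two uniform (sub-Gaussian) sources: the sides differ, so separation is stable
u1, u2 = rng.uniform(-1, 1, n), rng.uniform(-1, 1, n)
lhs_u, rhs_u = sides(u1, u2)        # analytically 0.04 versus 1/9
assert lhs_u < 0.5 * rhs_u

# two Gaussian sources: E(s^4) = 3E^2(s^2), so the two sides coincide
g1, g2 = rng.standard_normal(n), rng.standard_normal(n)
lhs_g, rhs_g = sides(g1, g2)
assert abs(lhs_g / rhs_g - 1) < 0.05
```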

4. Computer simulation results

In the first computer simulation, we applied the algorithm to mixtures of a sub-Gaussian signal and a super-Gaussian signal with the mixing matrix

A = [ 1     0.72
      0.31  1    ].  (39)

The sub-Gaussian signal s_1(t) is uniformly distributed on [−1, 1] (κ_1 = E(s_1⁴) − 3E²(s_1²) = −0.8 < 0) and the super-Gaussian signal s_2(t) is exponentially distributed with parameter 1 (κ_2 = E(s_2⁴) − 3E²(s_2²) = 6 > 0). The separating functions f(x) and g(x) are tanh(x) and tanh(0.5x), respectively. The step size used for the algorithm is 0.08 and C(0), the initial value of C, is I.
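The kurtosis signs of the two simulated sources can be verified numerically. Note that the sketch below uses the standard normalized excess kurtosis E(s⁴)/E²(s²) − 3, which differs in normalization from the measure quoted above but has the same sign for both sources:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 200_000

# the two source types of the first simulation
s1 = rng.uniform(-1.0, 1.0, n)        # sub-Gaussian
s2 = rng.exponential(1.0, n) - 1.0    # super-Gaussian, zero-meaned

def excess_kurtosis(s):
    s = s - np.mean(s)
    return np.mean(s**4) / np.mean(s**2)**2 - 3.0

assert excess_kurtosis(s1) < 0        # uniform: -1.2 analytically
assert excess_kurtosis(s2) > 0        # exponential: +6 analytically
```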

In Fig. 4, we present variations of the product of C and A. For successful separation, CA must have the form of a product of a diagonal matrix D and a permutation matrix P, and from the result we can see that CA converges to that kind of form. This example is not solvable by the previous algorithms, since they have restrictions on these signals: e.g., some techniques are applicable only to either sub- or super-Gaussian signals, and some methods are suitable only for signals whose sum of kurtoses is negative. Fig. 4 proves the validity of the proposed algorithm.

In the second and the third experiments, the algorithm is employed on mixtures of two sampled signals and of three sampled signals, respectively. The mixing matrix A has 1 as its diagonal elements and 0.5 as its off-diagonal components, and the other settings (f(x), g(x), and C(0)) are the same as in the first simulation.

Fig. 5 shows the successful separation result of a train horn sound and a bird chirp sound. Fig. 6 represents that the algorithm trains the separating matrix C such that CA has negligible off-diagonal elements in comparison with the diagonal components. Figs. 7 and 8 depict the separation results of three arbitrary source signals and the learning trajectories of the elements of the matrix CA.

Fig. 6. Variations of W = CA.

Fig. 7. Separation result of three source signals.


Fig. 8. Variations of W = CA with three source signals.

5. Conclusions

In this paper, we proposed a new blind separation algorithm intended to apply to various source signals. The problem is difficult to solve since the characteristics of the sources and channels are not known. The only assumptions we have made are that the source signals are mutually independent and that the channel modeling matrix is nonsingular. So far, many researchers have shown remarkable results on this problem, but their results had the limitation that they were not applicable to some sorts of signals; for example, only either sub- or super-Gaussian sources could be separated. However, we used the cumulants and the nonlinear functions so that the algorithm based on the newly proposed cost function has few restrictions on the kinds of source signals. As shown in the computer simulations, mixtures of arbitrary signals could be separated successfully, and in particular a case which could not be solved by the previous research has been resolved. In future research, the bound on the step size and the steady-state error between CA and DP will be studied. Improvement of the learning speed is currently under investigation, and we expect further research results on the cases of dependent source signals, nonlinear channels and additive noises.

References

[1] A.J. Bell, T. Sejnowski, Blind separation and blind deconvolution: An information-theoretic approach, in: Proc. Internat. Conf. Acoust. Speech Signal Process., Detroit, 1995, pp. 3415-3418.
[2] A. Benveniste, M. Métivier, P. Priouret, Adaptive Algorithms and Stochastic Approximations, Springer, Berlin, 1990.
[3] D.R. Brillinger, Time Series: Data Analysis and Theory, Holden-Day, San Francisco, CA, 1981.
[4] X.-R. Cao, R.-W. Liu, General approach to blind source separation, IEEE Trans. Signal Process. 44 (3) (March 1996) 562-571.
[5] J.-F. Cardoso, Source separation using higher-order moments, in: Proc. Internat. Conf. Acoust. Speech Signal Processing, Glasgow, May 1989, pp. 2109-2112.
[6] J.-F. Cardoso, B.H. Laheld, Equivariant adaptive source separation, IEEE Trans. Signal Process. 44 (12) (December 1996) 3017-3030.
[7] A. Cichocki, L. Moszczynski, New learning algorithm for blind separation of sources, Electron. Lett. 28 (21) (October 1992) 1986-1987.
[8] A. Cichocki, R. Unbehauen, E. Rummert, Robust learning algorithm for blind separation of signals, Electron. Lett. 30 (17) (August 1994) 1386-1387.
[9] P. Comon, Higher-order separation, application to detection and localization, in: Signal Processing V, Theory and Applications, Proc. Eusipco-90, Barcelona, September 1990, pp. 277-280.
[10] P. Comon, Independent component analysis, A new concept?, Signal Processing 36 (3) (April 1994) 287-314.
[11] N. Delfosse, P. Loubaton, Adaptive blind separation of independent sources: A deflation approach, Signal Processing 45 (1995) 59-83.
[12] S.C. Douglas, A. Cichocki, Neural networks for blind decorrelation of signals, IEEE Trans. Signal Process. 45 (11) (November 1997) 2829-2842.
[13] B.C. Ihm, K.-S. Eom, D.J. Park, New blind separation algorithm using cumulant measure, in: Proc. 3rd Internat. Conf. Signal Processing, Beijing, China, October 1996, pp. 477-480.
[14] C. Jutten, J. Hérault, Blind separation of sources, Part I: An adaptive algorithm based on neuromimetic architecture, Signal Process. 24 (1) (July 1991) 1-10.
[15] J. Karhunen, J. Joutsensalo, Representation and separation of signals using nonlinear PCA type learning, Neural Networks 7 (1) (1993) 113-127.
[16] O. Macchi, E. Moreau, Self-adaptive source separation, Part I: Convergence analysis of a direct linear network controlled by the Hérault-Jutten algorithm, IEEE Trans. Signal Process. 45 (4) (April 1997) 918-926.
[17] E. Sorouchyari, Blind separation of sources, Part III: Stability analysis, Signal Processing 24 (1) (July 1991) 21-29.
[18] L. Tong, R.-W. Liu, V.C. Soon, Y.-F. Huang, Indeterminacy and identifiability of blind identification, IEEE Trans. Circuits and Systems 38 (5) (May 1991) 499-509.