
Expert Systems with Applications 38 (2011) 10000–10009


A new feature extraction and selection scheme for hybrid fault diagnosis of gearbox
Bing Li a,b,*, Pei-lin Zhang a, Hao Tian a, Shuang-shan Mi b, Dong-sheng Liu b, Guo-quan Ren a
a First Department, Mechanical Engineering College, No. 97, Hepingxilu Road, Shi Jia-zhuang, He Bei province, PR China
b Fourth Department, Mechanical Engineering College, No. 97, Hepingxilu Road, Shi Jia-zhuang, He Bei province, PR China

Abstract
A novel feature extraction and selection scheme is proposed for hybrid fault diagnosis of gearboxes, based on the S transform, non-negative matrix factorization (NMF), mutual information and multi-objective evolutionary algorithms. Time–frequency distributions of vibration signals acquired from a gearbox in different fault states are obtained by the S transform. Non-negative matrix factorization is then employed to extract features from the time–frequency representations. Furthermore, a two-stage feature selection approach combining filter and wrapper techniques, based on mutual information and the non-dominated sorting genetic algorithm II (NSGA-II), is presented to obtain a more compact feature subset for accurate classification of hybrid faults of the gearbox. Eight fault states, including gear defects, bearing defects and combinations of gear and bearing defects, were simulated on a single-stage gearbox to evaluate the proposed feature extraction and selection scheme. Four different classifiers were combined with the presented techniques for classification, and their performances with different feature subsets were compared. The experimental results reveal that the proposed feature extraction and selection scheme is an effective and efficient tool for hybrid fault diagnosis of gearboxes. © 2011 Elsevier Ltd. All rights reserved.

Keywords: Gearbox; Hybrid fault diagnosis; Feature extraction; Feature selection; S transform; Non-negative matrix factorization (NMF); Mutual information; Non-dominated sorting genetic algorithm II (NSGA-II)

1. Introduction

The gearbox is one of the core components of rotating machinery and is widely employed in industrial equipment. Faults occurring in a gearbox, such as gear and bearing defects, must be detected as early as possible to avoid fatal breakdowns of machines and to prevent loss of production and human casualties. Vibration-based analysis is the most commonly used technique for monitoring the condition of gearboxes. By employing appropriate data analysis algorithms, it is feasible to detect changes in vibration signals caused by faulty components and to make decisions about the gearbox's health status (Al-Ghamd & Mba, 2006; Chen, He, Chu, & Huang, 2003; Lin & Zuo, 2003; Saravanan, Cholairajan, & Ramachandran, 2008; Wang & McFadden, 1993). Although visual inspection of the frequency-domain features of the measured signals is often adequate to identify faults, many of the techniques available at present require a good deal of expertise to apply successfully. Simpler approaches are needed that allow relatively unskilled operators to make reliable decisions without a diagnosis specialist having to examine the data. Consequently, there is a need for a reliable, fast and automated diagnostic procedure. Various intelligent techniques such as artificial neural networks (ANN), support vector machines (SVM), fuzzy logic
* Corresponding author at: First Department, Mechanical Engineering College, No. 97, Hepingxilu Road, Shi Jia-zhuang, He Bei province, PR China. E-mail address: rommandy@163.com (B. Li).
0957-4174/$ - see front matter © 2011 Elsevier Ltd. All rights reserved. doi:10.1016/j.eswa.2011.02.008

and evolutionary algorithms (EA) have been successfully applied to the automated detection and diagnosis of machine conditions (Firpi & Vachtsevanos, 2008; Lei, He, Zi, & Hu, 2007; Lei & Zuo, 2009; Samanta, 2004; Samanta, Al-Balushi, & Al-Araimi, 2003; Samanta & Nataraj, 2009; Srinivasan, Cheu, Poh, & Ng, 2000; Wuxing, Tse, Guicai, & Tielin, 2004). They have greatly improved the reliability and automation of gearbox fault diagnosis systems. For intelligent fault diagnosis systems, feature extraction and feature selection can be regarded as the two most important steps. Feature extraction is a mapping from the measured signal space to the feature space. Representative features associated with the conditions of machinery components should be extracted using appropriate signal processing and calculation approaches. Over the past few years, various techniques including the Fourier transform (FT), envelope analysis, wavelet analysis, empirical mode decomposition (EMD) and time–frequency distributions have been employed to process vibration signals (Lin & Qu, 2000; Oehlmann, Brie, Tomczak, & Richard, 1997; Peng, Tse, & Chu, 2005; Randall, Antoni, & Chobsaard, 2001; Wang & McFadden, 1993). Based on these processing techniques, statistical calculation methods, autoregressive (AR) models, singular value decomposition (SVD), principal component analysis (PCA) and independent component analysis (ICA) have been adopted to extract representative features for machinery fault diagnosis (Junsheng, Dejie, & Yu, 2006; Lei, He, Zi, & Chen, 2008; Li, Shi, Liao, & Yang, 2003; Wang, Luo, Qin, Leng, & Wang, 2008; Widodo, Yang, & Han, 2007). Even though several techniques have been proposed in the literature


for feature extraction, implementing a diagnostic tool for real-world monitoring applications remains a challenge because of the complexity of machinery structures and operating conditions. This investigation implements the feature extraction scheme using two recently developed techniques, the S transform and non-negative matrix factorization (NMF). The S transform, introduced by Stockwell (Stockwell, Mansinha, & Lowe, 1996), combines the separate strengths of the STFT and wavelet transforms and provides an alternative approach to processing the non-stationary signals generated by mechanical systems. It employs a variable window length: the frequency-dependent window function produces higher frequency resolution at lower frequencies, while at higher frequencies sharper time localization can be achieved. The S transform has become a valuable tool for signal analysis in many applications (Assous, Humeau, Tartas, Abraham, & L'Huillier, 2006; Dash & Chilukuri, 2004; Dash, Panigrahi, & Panda, 2003). Non-negative matrix factorization (NMF) is a subspace decomposition technique proposed to extract key features of data through an iterative matrix factorization (Lee & Seung, 1999). Unlike other similar methods, NMF imposes a non-negativity constraint on the factorization, which leads to intuitive, parts-based representations of the data in its factors. Owing to this property, NMF has been adopted in various applications such as face expression and recognition (Pu, Yi, Zheng, Zhou, & Ye, 2005), object detection (Liu & Zheng, 2004), image compression and classification (Yuan & Oja, 2005), and sound classification (Cho & Choi, 2005). In this paper, a novel feature extraction scheme based on the S transform and non-negative matrix factorization for hybrid fault diagnosis of gearboxes is presented.
Feature selection is another indispensable procedure for an intelligent fault diagnosis system, since the extracted features still contain noise and irrelevant or redundant information. Moreover, too many features can cause the curse of dimensionality, because the number of training samples must grow exponentially with the number of features in order to learn an accurate model. Therefore, a feature selection procedure is needed before classification. Several studies have addressed this issue using, for example, genetic algorithms (GAs) (Jack & Nandi, 2002; Jack, Nandi, & McCormick, 2000; Samanta, 2004), decision trees (Sugumaran, Muralidharan, & Ramachandran, 2007) and the distance evaluation technique (Lei et al., 2008; Lei & Zuo, 2009). Filter and wrapper methods are the two main categories of feature selection algorithms. Filter methods evaluate the goodness of a feature subset using the intrinsic characteristics of the data. They are relatively computationally cheap since they do not involve the induction algorithm, but they run the risk of selecting feature subsets that may not match the chosen induction algorithm. Wrapper methods, on the contrary, use the classifiers directly to evaluate the feature subsets. They generally outperform filter methods in terms of prediction accuracy, but are usually computationally more intensive (Zhu, Ong, & Dash, 2007). In summary, wrapper and filter methods can complement each other: filter methods search through the feature space efficiently, while wrappers provide good accuracy. It is therefore desirable to combine filter and wrapper methods to achieve high efficiency and accuracy simultaneously. In this work, a two-stage feature selection scheme combining filter and wrapper techniques, based on mutual information and the improved non-dominated sorting genetic algorithm NSGA-II, is proposed.
In the first stage, a candidate feature subset is chosen from the original feature set by the max-relevance and min-redundancy (mRMR) criterion (Peng, Long, & Ding, 2005). In the second stage, a classifier combined with NSGA-II is adopted to find a more compact feature subset within the candidate subset obtained by the filter method. In this stage, the feature selection problem is defined as a

[Fig. 1 shows the processing chain: gearbox with sensors → signal acquisition system → vibration signals; S transform → time–frequency distributions → NMF (feature extraction); original feature set → filter based on mutual information → candidate feature subset → wrapper based on NSGA-II (feature selection); final optimal feature subset → classifiers → gearbox condition diagnosis.]

Fig. 1. Flowchart of the presented intelligent fault diagnosis system.

multi-objective problem dealing with two competing objectives. Consequently, an optimal feature subset that consists of a minimal number of features and produces the minimum classification error can be obtained for intelligent fault diagnosis. Fig. 1 displays the flowchart of the intelligent fault diagnosis system for the gearbox in this investigation. The rest of this paper is organized as follows. Section 2 describes the experimental setup and the vibration dataset. The feature extraction method based on the S transform and NMF is detailed in Section 3. The two-stage feature selection scheme based on mutual information and NSGA-II is outlined in Section 4. Section 5 presents the experimental results and discussion. Conclusions are summarized in Section 6.

2. Experimental setup and data acquisition

Fig. 2 displays a diagram of the experimental system used in this work to evaluate the performance of the proposed approach. The experimental system includes a single-stage gearbox, an AC motor and a magnetic brake. The AC motor is used to drive the gearbox

[Fig. 2 schematic labels: Motor; Gear A: 30T; Gear B: 50T; bearings B1, B2, B3, B4; Load.]
Fig. 2. Structure of the experimental gearbox.

and the rotating speed is controlled by a speed controller, which allows the tested gearbox to operate at various speeds. The load is provided by the magnetic brake connected to the output shaft, and the torque can be adjusted by a brake controller. There are two shafts inside the gearbox, mounted to the gearbox housing by four rolling element bearings. Gear A has 30 teeth and gear B has 50 teeth. Local faults such as gear tooth wear, bearing inner race defects and bearing outer race defects are the most frequently studied fault modes in gearbox fault diagnosis. Previous studies mainly focused on gear faults or bearing faults independently; little research has been done on detecting hybrid faults in which gear and bearing defects occur simultaneously. This work investigates this issue by simulating hybrid fault types of gears and bearings in a single-stage gearbox. Eight fault states, including gear wear, bearing inner race defect, bearing outer race defect and their combinations, were tested in the experiments. As shown in Fig. 2, all the defects were set on gear A and bearing B1. Detailed descriptions of the eight operating states are summarized in Table 1. Vibration signals were acquired using acceleration sensors mounted on the four bearing bases. The sampling frequency was 6400 Hz and the number of sampling points was 4096. Twenty samples were acquired for every state, so 160 samples in total were collected for further investigation. Fig. 3 shows the time-domain waveforms of the eight operating states.

Table 1
Description of the gearbox experiments (load: 100 N m; speed: 1200 rpm).

Fault state   Fault description
1             Normal
2             Gear wear
3             Bearing inner race defect
4             Bearing outer race defect
5             Gear wear and bearing inner race defect
6             Gear wear and bearing outer race defect
7             Bearing inner race and bearing outer race defects
8             Gear wear, bearing inner race and bearing outer race defects

3. Feature extraction based on S transform and non-negative matrix factorization (NMF)

In this section, the feature extraction scheme based on two recently developed techniques, the S transform and non-negative matrix factorization (NMF), is presented. Vibration signals acquired from a gearbox are non-stationary and complex, and contain rich information about the operating conditions. Joint time–frequency analysis is therefore often adopted to describe the local information of these non-stationary signals completely. The S transform, which combines the separate strengths of the STFT and wavelet transforms, was employed to obtain high-resolution time–frequency representations of the vibration signals. However, it is not practical to classify these time–frequency distributions directly because their dimensionality is too high. The NMF technique was therefore utilized to reduce the high-dimensional feature space while retaining most of the information of the original time–frequency representations.

3.1. S transform

The S transform, put forward by Stockwell in 1996, can be regarded as an extension of the ideas of the Gabor transform and the wavelet transform. The S transform of a signal x(t) is defined as

S(\tau, f) = \int_{-\infty}^{\infty} x(t)\, w(t - \tau, f)\, e^{-j 2\pi f t}\, dt \qquad (1)

where the window function is the Gaussian

w(t, f) = \frac{1}{\sigma(f)\sqrt{2\pi}}\, e^{-t^2 / 2\sigma^2(f)} \qquad (2)

and

\sigma(f) = \frac{1}{|f|} \qquad (3)

Combining Eqs. (1)–(3), the S transform can be written as

S(\tau, f) = \int_{-\infty}^{\infty} x(t)\, \frac{|f|}{\sqrt{2\pi}}\, e^{-(t - \tau)^2 f^2 / 2}\, e^{-j 2\pi f t}\, dt \qquad (4)

Since the S transform is a representation of the local spectra, the Fourier (time-averaged) spectrum can be obtained directly by averaging the local spectra:

\int_{-\infty}^{\infty} S(\tau, f)\, d\tau = X(f) \qquad (5)

where X(f) is the Fourier transform of x(t). The inverse S transform is given by

x(t) = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} S(\tau, f)\, e^{j 2\pi f t}\, d\tau\, df \qquad (6)
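Numerically, the S transform is usually computed in the frequency domain: shift the signal's FFT by each frequency of interest, multiply by the corresponding Gaussian voice, and inverse-FFT the product. A minimal sketch of this standard algorithm follows (illustrative code, not the authors'; the function name and conventions are assumptions):

```python
import numpy as np

def s_transform(x, fs):
    """Discrete S transform computed via the FFT (Stockwell et al., 1996).

    x  : real signal of length n
    fs : sampling frequency in Hz
    Returns (S, freqs): S is an (n//2 + 1, n) complex time-frequency
    matrix, one row per non-negative frequency bin."""
    n = len(x)
    X = np.fft.fft(x)
    n_f = n // 2 + 1
    S = np.zeros((n_f, n), dtype=complex)
    S[0, :] = np.mean(x)                 # f = 0 row: the signal mean
    nu = np.fft.fftfreq(n) * n           # integer frequency bins
    for m in range(1, n_f):
        # Frequency-domain Gaussian window for sigma(f) = 1/|f|:
        # exp(-2 pi^2 nu^2 / m^2), applied to the spectrum shifted by m bins.
        gauss = np.exp(-2.0 * np.pi ** 2 * nu ** 2 / m ** 2)
        S[m, :] = np.fft.ifft(np.roll(X, -m) * gauss)
    freqs = np.arange(n_f) * fs / n
    return S, freqs
```

For a pure tone, the row of S at the tone's frequency bin carries the largest average magnitude, which is what makes the representation usable as a classification input.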

The main advantage of the S transform over the short-time Fourier transform (STFT) is that the standard deviation σ is a function of the frequency f, so the window function is a function of both time and frequency. Because the width of the window is controlled by the frequency, the window is wider in the time domain at lower frequencies and narrower at higher frequencies. In other words, the window provides good localization in the frequency domain at low frequencies and good localization in the time domain at higher frequencies. This is a very desirable characteristic for accurately representing non-stationary vibration signals in the time–frequency domain.

3.2. Non-negative matrix factorization (NMF)

The NMF algorithm compresses a matrix into a smaller number of basis functions and their encodings (Lee & Seung, 1999). The factorization can be expressed as follows:

V_{n \times m} \approx W_{n \times r} H_{r \times m} \qquad (7)

where V denotes an n × m matrix whose m columns each contain an n-dimensional observed data vector with non-negative values. This matrix is approximately factorized into an n × r matrix W and an r × m matrix H. The rank r of the factorization is usually chosen such that (n + m)r < nm, so that compression, or dimensionality reduction, is achieved; the result is a compressed version of the original data matrix. In other words, each data vector in V is approximated by a linear combination of the columns of W, weighted by the components of H. W can therefore be regarded as the basis matrix and H as the coefficient matrix. The key characteristic of NMF is the non-negativity constraint imposed on the two factors, which is compatible with the intuitive notion of combining parts to form a whole. To compute the approximate factorization in Eq. (7), a cost function is needed to quantify the quality of the approximation. NMF uses the divergence measure as the objective function:

F = \sum_{i=1}^{n} \sum_{j=1}^{m} \left[ V_{ij} \log (WH)_{ij} - (WH)_{ij} \right] \qquad (8)

subject to the non-negativity constraints described above.


[Each panel (a)–(h) of Fig. 3 plots amplitude [V] from −2 to 2 against time [s] from 0 to 0.6.]

Fig. 3. Vibration signals acquired from the eight states of the gearbox in the experiments: (a)–(h) correspond to states 1–8 in Table 1.

To obtain W and H, multiplicative update rules are given in Lee and Seung (1999) as follows:

W_{ia} \leftarrow W_{ia} \sum_{j=1}^{m} \frac{V_{ij}}{(WH)_{ij}} H_{aj} \qquad (9)

W_{ia} \leftarrow \frac{W_{ia}}{\sum_{i=1}^{n} W_{ia}} \qquad (10)

H_{aj} \leftarrow H_{aj} \sum_{i=1}^{n} W_{ia} \frac{V_{ij}}{(WH)_{ij}} \qquad (11)

In this way, the basis matrix W and the coefficient matrix H can be obtained from Eqs. (9)–(11) in an iterative procedure.

4. Two-stage feature selection based on mutual information and NSGA-II

As described in Section 3, for every signal f(n) we obtain a number of features via the S transform and NMF. Even though the NMF performs a dimension reduction, direct manipulation of the whole set of feature components is not appropriate: the feature space still has high dimensionality, and the existence of irrelevant and redundant components makes the classification unnecessarily difficult. Thus, a feature selection scheme combining a filter method and a wrapper method, based on mutual information and NSGA-II, is applied to identify a set of robust features that provides the most discrimination among the classes of gearbox vibration data. This significantly eases the design of the classifier and enhances the generalization capability of the system.
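The iterative procedure of Eqs. (9)–(11) can be sketched as follows; a second helper encodes new samples against a fixed, already learned basis, which is how features are obtained for data outside the training set. The random initialization, iteration counts and the small epsilon guarding against division by zero are illustrative assumptions not specified in the paper:

```python
import numpy as np

def nmf_divergence(V, r, n_iter=200, seed=0):
    """Lee-Seung multiplicative updates for the divergence objective.

    V : (n, m) non-negative data matrix, one vectorized sample per column
    r : reduced rank, chosen so that (n + m) * r < n * m
    Returns the basis W (n, r) and the encodings H (r, m)."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, r)) + 1e-4
    H = rng.random((r, m)) + 1e-4
    eps = 1e-9
    for _ in range(n_iter):
        WH = W @ H + eps
        W *= (V / WH) @ H.T                       # Eq. (9)
        W /= W.sum(axis=0, keepdims=True) + eps   # Eq. (10): normalize columns
        WH = W @ H + eps
        H *= W.T @ (V / WH)                       # Eq. (11)
    return W, H

def encode_with_basis(W, v, n_iter=100):
    """Encode a new non-negative sample v in a learned basis W by iterating
    only the H-update, Eq. (11), with W held fixed."""
    eps = 1e-9
    h = np.full(W.shape[1], 1.0 / W.shape[1])
    Wn = W / (W.sum(axis=0, keepdims=True) + eps)  # column-normalized basis
    for _ in range(n_iter):
        h *= Wn.T @ (v / (Wn @ h + eps))
    return h
```

Because every update is multiplicative, W and H stay non-negative throughout, which is what yields the parts-based representation.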


4.1. Filter method using max-relevance and min-redundancy (mRMR) based on mutual information

4.1.1. Mutual information
Mutual information is one of the most widely used measures for defining the relevancy of variables (Peng et al., 2005). In this section we focus on a feature selection method based on mutual information. Given two random variables x and y, their mutual information is defined in terms of their probability density functions p(x), p(y) and p(x, y):

I(x; y) = \iint p(x, y) \log \frac{p(x, y)}{p(x)\, p(y)}\, dx\, dy \qquad (12)

The estimation of the mutual information of two variables is detailed in Peng et al. (2005). In supervised classification, one can view the class as a variable C with L possible values (where L is the number of classes of the system) and a feature component as another variable X with K possible values (where K is the number of parameters of the system). One can then compute the mutual information I(x_k; c) between the classes c and a feature x_k (k = 1, 2, ..., K):

I(x_k; c) = \iint p(x_k, c) \log \frac{p(x_k, c)}{p(x_k)\, p(c)}\, dx_k\, dc \qquad (13)

The informative variables, those with larger I(x_k; c), can then be identified, and a more compact feature subset can be obtained by selecting the d best features from the original feature set according to Eq. (13). Eq. (13) provides a measure of the effectiveness of a global feature that is simultaneously suitable for differentiating all classes of signals. For a small number of classes this approach may be sufficient; however, the more signal classes there are, the more ambiguous I(x_k; c) becomes.

4.1.2. Max-relevance and min-redundancy
Max-relevance means that the selected features x_i are required, individually, to have the largest mutual information I(x_i; c); that is, the m best individual features are selected according to this criterion:

\max D(S, c), \qquad D = \frac{1}{|S|} \sum_{x_i \in S} I(x_i; c) \qquad (14)

where |S| denotes the number of features contained in S. However, it has been proved that simply combining the best individual features does not necessarily lead to good performance; in other words, the m best features are not the best m features (Kohavi & John, 1997; Peng et al., 2005). The most important problem of max-relevance is that it neglects the redundancy between features, which may degrade the classification performance. Therefore a min-redundancy criterion should be added to the selection of the optimal subsets:

\min R(S), \qquad R = \frac{1}{|S|^2} \sum_{x_i, x_j \in S} I(x_i; x_j) \qquad (15)

The criterion combining the above two constraints is called minimal-redundancy–maximal-relevance (mRMR) (Peng et al., 2005). The operator Φ(D, R) is defined to optimize D and R simultaneously:

\max \Phi(D, R), \qquad \Phi = D - R \qquad (16)

4.1.3. Candidate feature subset obtained based on max-relevance and min-redundancy
In practice, greedy search methods can be used to find near-optimal features under Φ. Let F be the original feature set and S the selected subset. Suppose we already have S_{m−1}, i.e. m − 1 features have been selected. The next task is to select the mth feature from the set F − S_{m−1}, which is done according to the following criterion:

\max_{x_j \in F - S_{m-1}} \left[ I(x_j; c) - \frac{1}{m-1} \sum_{x_i \in S_{m-1}} I(x_j; x_i) \right] \qquad (17)

The main steps can be represented as:

Step 1: Let F be the original feature set and S the selected subset. Initialize S as an empty subset, S ← {}.
Step 2: Calculate the relevance of each individual feature x_i with the target class c, denoted I(x_i; c).
Step 3: Find the feature x_k with the maximum relevance,

I(x_k; c) = \max_{x_i \in F} I(x_i; c)

and let F_1 ← F − {x_k}, S_1 ← S ∪ {x_k}.
Step 4:
for m = 2 : N
    with x_j ∈ F_{m−1} and x_i ∈ S_{m−1}, find x_k according to the criterion of Eq. (17);
    let F_m = F_{m−1} − {x_k}, S_m = S_{m−1} ∪ {x_k};
end

In this way, N sequential feature subsets satisfying S_1 ⊂ S_2 ⊂ ··· ⊂ S_N are obtained. The next problem is how to choose an optimal set from this series, in other words, how to determine the number of features the sub-optimal set should contain. Following a cross-validation approach, we compare the performances of the N sequential feature subsets; the subset with the best performance is chosen as the candidate feature subset for further, more sophisticated selection using the wrapper.

4.2. Wrapper method based on NSGA-II
Genetic algorithms have been used intensively for feature selection to solve the combinatorial problem and to provide efficient exploration of the solution space. GAs work with a set of candidate solutions called a population. Based on the Darwinian principle of survival of the fittest, GAs obtain the optimal solution after a series of iterative computations: they generate successive populations of alternative solutions, each represented by a chromosome, i.e. a solution to the problem, until acceptable results are obtained. Combining exploitation and exploration, GAs can deal with large search spaces efficiently and hence have less chance of settling in a local optimum than other algorithms. Results provided by GA-based feature selection have proved more efficient than classical methods, as confirmed in Oh, Lee, and Moon (2004), Tan, Fu, Zhang, and Bourgeois (2008), Yang and Honavar (1998) and Zhu and Guan (2004). These studies mainly used single-objective GAs, which provide one optimal solution with the maximum classification performance. The feature selection problem, however, can be defined as a multi-objective problem dealing with two competing objectives: an optimal feature set has to contain a minimal number of features and, at the same time, produce the minimum classification error.
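The greedy mRMR search of Section 4.1.3 can be sketched as follows. The histogram-based mutual information estimator and the bin count are illustrative assumptions made for this sketch; the paper follows the estimation procedure of Peng et al. (2005):

```python
import numpy as np

def mutual_info(a, b, bins=8):
    """Histogram estimate of the mutual information I(a; b) of Eq. (12), in nats."""
    joint, _, _ = np.histogram2d(a, b, bins=bins)
    p = joint / joint.sum()
    pa, pb = p.sum(axis=1), p.sum(axis=0)
    nz = p > 0
    return float((p[nz] * np.log(p[nz] / np.outer(pa, pb)[nz])).sum())

def mrmr_select(X, y, n_select):
    """Greedy search of Eq. (17): start from the most relevant feature, then
    repeatedly add the feature maximizing relevance I(x_j; c) minus its mean
    redundancy with the already selected subset."""
    n_feat = X.shape[1]
    relevance = np.array([mutual_info(X[:, j], y) for j in range(n_feat)])
    selected = [int(np.argmax(relevance))]            # Step 3: max relevance
    while len(selected) < n_select:
        best_j, best_score = -1, -np.inf
        for j in range(n_feat):
            if j in selected:
                continue
            redundancy = np.mean([mutual_info(X[:, j], X[:, i])
                                  for i in selected])
            score = relevance[j] - redundancy         # Eq. (17)
            if score > best_score:
                best_j, best_score = j, score
        selected.append(best_j)
    return selected
```

Running mrmr_select for n_select = 1, 2, ..., N yields exactly the nested subsets S_1 ⊂ S_2 ⊂ ··· ⊂ S_N whose classification performance is then compared by cross-validation.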


The non-dominated sorting genetic algorithm (NSGA) was suggested by Goldberg and implemented by Srinivas and Deb (1994). Although NSGA has proved effective for multi-objective optimization problems, it has drawbacks such as the high computational complexity of non-dominated sorting, a lack of elitism, and the need to specify the sharing parameter σ_share. To address these problems, Deb introduced NSGA-II (Deb, Pratap, Agarwal, & Meyarivan, 2002), an improved method that overcomes the original defects by reducing the computational complexity, introducing an elitist-preserving mechanism and employing a crowded-comparison operator. In this work, this multi-objective optimization technique was employed to solve the wrapper feature selection problem. A more detailed description and implementation of NSGA-II can be found in Deb et al. (2002). For the wrapper approach there are several factors controlling the NSGA-II search for sub-optimal feature subsets; to apply NSGA-II to feature selection, we focus on the following issues.

4.2.1. Fitness functions
Two competing objectives were defined as the fitness functions: the first is minimization of the number of used features and the second is minimization of the classification error. Four popular classifiers, namely the K-nearest-neighbor classifier (KNNC) (Grother, Candela, & Blue, 1997), the nearest mean classifier (NMC) (Veenman & Tax, 2005), the linear discriminant classifier (LDC) (Du & Chang, 2000) and the least-squares support vector machine (LS-SVM) (Suykens & Vandewalle, 1999), were employed as induction algorithms to implement and evaluate the proposed feature selection approach.

4.2.2. Encoding scheme
Binary coding was used to represent the chromosomes in this investigation. In a chromosome representing a feature subset, a bit with value 1 indicates that the corresponding feature is selected, and 0 that it is not.

4.2.3. Genetic operators
The genetic operators consist of three basic operators: selection, crossover and mutation. Binary tournament selection, which gives better results than proportional and genitor selection, was adopted to select the individuals of the next generation. The crossover technique used was uniform crossover, which exchanges the genetic material of the two selected parents uniformly at several points. The mutation operator was implemented as a conventional mutation operator, operating on each bit separately and changing its value randomly.

5. Results and discussion

5.1. Feature extraction
For the 160 samples described in Table 1, 160 time–frequency matrices were obtained by employing the S transform described in Section 3. Fig. 4 displays the time–frequency distributions of the vibration signals from the eight states of the gearbox. It can be observed that the resolution of the time–frequency representations calculated by the S transform is very satisfactory, and that the representations of the gearbox in different states differ from one another. Consequently, we can expect the S transform to be very useful for classifying the vibration signals. However, it is impossible to use the original time–frequency matrices directly for classification because of their high dimensionality. In this work, the dimension of each time–frequency matrix is 1024 × 2048, so the feature vector would have 2,097,152

dimensions if every matrix were regarded as an input vector. It is not acceptable for any pattern recognition system to deal with such high-dimensional input vectors. It is therefore desirable to reduce the data dimension to an acceptable scale while preserving as much of the information in the matrices as possible. The subspace decomposition technique NMF is employed to reduce the dimension of the time–frequency matrices. With the S transform, 160 time–frequency matrices were obtained; all of them were standardized and normalized to fulfill the non-negativity constraints of NMF. Forty samples, five for each state, were selected as training samples. All the matrices were first transformed into vectors, forming a training matrix for NMF. We first applied NMF to the training samples to extract the non-negative basis vectors W and the associated encoding variables H; feature components for the other samples can then be computed from these non-negative basis vectors. The reduced dimension r was chosen to be 100 in this paper, and the number of iteration steps was set to 50; these parameters were chosen on the basis of preliminary experiments. By computing the feature components with the extracted basis vectors, 100 parameters were obtained as features for every sample. The feature dimension was thus reduced from 2,097,152 to 100 by the NMF technique.

5.2. Feature selection
One hundred feature components were obtained via the S transform and NMF. A feature selection procedure is needed to find the most informative components among these original one hundred features. According to the feature selection approach based on mutual information and NSGA-II described in Section 4, the most discriminative feature vector can be acquired. We first partitioned the 160 collected samples into a training dataset and a testing dataset (10 samples of each state for training and the other 10 for testing).
This random segmentation was repeated 20 times to obtain a more robust evaluation. The sequential feature subsets S_1 ⊂ S_2 ⊂ ··· ⊂ S_N were obtained on the training dataset using the mRMR criterion and the greedy search described in Section 4, and their performances were then evaluated on the testing dataset. The four classifiers described in Section 4 were adopted to assess the proposed scheme. Fig. 5 gives the average performances of the four classifiers using the sequential feature subsets over the 20 randomly segmented datasets. Clearly, the performances of the four classifiers did not improve steadily as the number of features increased; this supports the assumption that the original feature set contains many irrelevant and redundant features. Another observation is that the performances of the four classifiers varied with different trends, and their optimal feature subsets also differed from one another. The candidate feature subset of each classifier was selected as the smallest subset giving the best performance, so the candidate subsets for NMC, KNNC, LDC and LS-SVM were chosen as S_50, S_45, S_38 and S_54, respectively. These candidate feature subsets were used for further selection using NSGA-II. Based on the candidate feature subsets, more compact feature subsets can be acquired using wrappers with NSGA-II. The four classifiers were again employed as induction algorithms combined with NSGA-II as wrappers for feature selection. We implemented the NSGA-II method with the following parameters: population size: 100; generations: 200;


[Fig. 4. S transforms of the vibration signals in Fig. 3: panels (a)–(h) correspond to states 1–8 in Table 1. Each panel plots frequency (0–3000 Hz) against time (0–0.6 s).]

[Fig. 5. Performances of the four classifiers (SVM, NMC, LDC, KNNC) with the sequential feature subsets obtained based on the mRMR criterion: classification rate versus number of features (10–100).]
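The greedy mRMR forward search that generates these sequential subsets can be sketched as below. The histogram mutual-information estimator, the equal-width binning, and the helper names are illustrative assumptions; the paper does not specify its estimator.

```python
import numpy as np

def _discretize(v, bins=8):
    """Equal-width binning; integer-coded labels pass through unchanged."""
    if v.dtype.kind in "iu":
        return v
    edges = np.histogram_bin_edges(v, bins)[1:-1]
    return np.digitize(v, edges)

def mutual_info(a, b, bins=8):
    """Histogram estimate of the mutual information I(a; b) in nats."""
    ad, bd = _discretize(a, bins), _discretize(b, bins)
    joint = np.zeros((int(ad.max()) + 1, int(bd.max()) + 1))
    np.add.at(joint, (ad, bd), 1.0)
    p = joint / joint.sum()
    px, py = p.sum(1, keepdims=True), p.sum(0, keepdims=True)
    nz = p > 0
    return float((p[nz] * np.log(p[nz] / (px @ py)[nz])).sum())

def mrmr_order(X, y, k):
    """Greedy mRMR forward search: at each step add the feature that
    maximizes relevance I(f; y) minus mean redundancy with the features
    already selected, yielding nested subsets S1 < S2 < ... < Sk."""
    rel = [mutual_info(X[:, j], y) for j in range(X.shape[1])]
    order = [int(np.argmax(rel))]
    while len(order) < k:
        cand = [j for j in range(X.shape[1]) if j not in order]
        scores = [rel[j] - np.mean([mutual_info(X[:, j], X[:, s])
                                    for s in order]) for j in cand]
        order.append(cand[int(np.argmax(scores))])
    return order

# Toy data: features 0 and 1 both encode the label (mutually redundant),
# feature 2 is pure noise; the first pick is one of the informative pair.
rng = np.random.default_rng(0)
y = rng.integers(0, 2, 400)
X = np.column_stack([y + 0.05 * rng.standard_normal(400),
                     y + 0.05 * rng.standard_normal(400),
                     rng.standard_normal(400)])
order = mrmr_order(X, y, 2)
print(order)
```

Evaluating a classifier on S1, S2, …, SN, as in Fig. 5, then locates the smallest subset with the best accuracy.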

crossover rate: 0.9;
mutation rate: 1/N (N is the size of the candidate feature subset).

We also randomly partitioned the dataset into training and testing sets 20 times to assess the performance of the presented wrapper methods with the four classifiers.
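The wrapper stage poses feature selection as a two-objective problem: minimize the subset size and minimize the classification error. A minimal sketch of evaluating these objectives for one candidate feature mask follows; the 1-NN stand-in classifier, the synthetic data in place of the NMF features, and all names here are illustrative assumptions.

```python
import numpy as np

def knn_accuracy(Xtr, ytr, Xte, yte):
    """1-nearest-neighbour accuracy, a stand-in induction algorithm
    (the paper wraps NMC, KNNC, LDC and LS-SVM in the same way)."""
    d = ((Xte[:, None, :] - Xtr[None, :, :]) ** 2).sum(-1)
    return (ytr[d.argmin(axis=1)] == yte).mean()

def objectives(mask, Xtr, ytr, Xte, yte):
    """The two NSGA-II objectives, both minimized: selected-subset size
    and classification error of the classifier induced on that subset."""
    idx = np.flatnonzero(mask)
    if idx.size == 0:
        return 0, 1.0  # empty subset: worst possible error
    err = 1.0 - knn_accuracy(Xtr[:, idx], ytr, Xte[:, idx], yte)
    return idx.size, err

# Synthetic stand-in for the 100 NMF features of 8 gearbox states
# (160 samples, 20 per state); only the first 5 features are informative.
rng = np.random.default_rng(0)
y = np.repeat(np.arange(8), 20)
X = rng.standard_normal((160, 100))
X[:, :5] += 2.0 * y[:, None]

mask = np.zeros(100, dtype=bool)
mask[:5] = True
size, err = objectives(mask, X[::2], y[::2], X[1::2], y[1::2])
print(size, err)
```

An NSGA-II loop (for instance, an off-the-shelf implementation such as DEAP's `tools.selNSGA2`) would then evolve a population of such bit masks toward the Pareto front trading off these two objectives, which is exactly the set of solutions reported in Tables 2–5.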


Tables 2–5 display the results of the proposed feature selection scheme with the four classifiers. For comparison, the original feature set, the candidate feature subsets and the feature subsets formed by applying wrappers directly to the original feature set were also evaluated with the four classifiers. For convenience, we denote the original feature set as F0, the filtered feature subset as F1, the wrapper feature subset as F2 and the two-stage feature subset as F3. Note that F2 and F3 each yield multiple Pareto optimal solutions to the feature selection problem.

For the NMC classifier, the best performance, 98.89%, is achieved by the subset in F2 with 33 features. With the original feature set F0, which contains all 100 features, the classification rate is only 80.83%. The performance of F1 is not fully satisfactory but is still superior to F0 while using only half as many features. Although the subsets in F3 do not outperform F2, they are much smaller: with only 10 features, F3 achieves 96.67%.

For the KNNC classifier, the highest classification rate, 97.78%, is reached by F3 with 8 features. The subsets in F2 also perform well, but they are larger than those in F3. As with NMC, the worst performance is obtained by F0, and F1 performs better than F0 with fewer features.

With the LDC classifier, a 100% classification rate is achieved by F2 with 23 features. The best subset in F3 reaches 98.89%, comparable to F2, with only 12 features. The original feature set F0 performs very poorly with LDC, at only 38.28%. F1 gives 89.00%, much better than F0 but worse than F2 and F3.

When LS-SVM is employed as the classifier, the highest classification rate, 100%, is achieved by both F2 and F3, and the winning subset in F3 is much smaller than that in F2. The performance of F0 is again the worst of the four feature subsets.

Table 2
Performances of NMC with different feature subsets (Pareto optimal solutions listed for F2 and F3).

Feature subset                      Feature size    Performance (%)
Original (F0)                       100             80.83
mRMR (F1)                           50              84.83
NSGA-II (F2), solution 1            23              95.56
NSGA-II (F2), solution 2            28              97.78
NSGA-II (F2), solution 3            33              98.89
mRMR + NSGA-II (F3), solution 1     5               92.22
mRMR + NSGA-II (F3), solution 2     6               93.33
mRMR + NSGA-II (F3), solution 3     8               94.45
mRMR + NSGA-II (F3), solution 4     10              96.67

Table 3
Performances of KNNC with different feature subsets (Pareto optimal solutions listed for F2 and F3).

Feature subset                      Feature size    Performance (%)
Original (F0)                       100             78.50
mRMR (F1)                           45              83.06
NSGA-II (F2), solution 1            25              87.78
NSGA-II (F2), solution 2            26              88.89
NSGA-II (F2), solution 3            30              94.45
NSGA-II (F2), solution 4            31              96.67
mRMR + NSGA-II (F3), solution 1     4               85.56
mRMR + NSGA-II (F3), solution 2     5               92.22
mRMR + NSGA-II (F3), solution 3     6               94.45
mRMR + NSGA-II (F3), solution 4     8               97.78

Table 4
Performances of LDC with different feature subsets (Pareto optimal solutions listed for F2 and F3).

Feature subset                      Feature size    Performance (%)
Original (F0)                       100             38.28
mRMR (F1)                           38              89.00
NSGA-II (F2), solution 1            14              91.11
NSGA-II (F2), solution 2            15              95.56
NSGA-II (F2), solution 3            16              97.78
NSGA-II (F2), solution 4            19              98.89
NSGA-II (F2), solution 5            23              100
mRMR + NSGA-II (F3), solution 1     5               93.34
mRMR + NSGA-II (F3), solution 2     6               94.45
mRMR + NSGA-II (F3), solution 3     7               96.67
mRMR + NSGA-II (F3), solution 4     10              97.78
mRMR + NSGA-II (F3), solution 5     12              98.89

Table 5
Performances of LS-SVM with different feature subsets (Pareto optimal solutions listed for F2 and F3).

Feature subset                      Feature size    Performance (%)
Original (F0)                       100             88.28
mRMR (F1)                           54              93.61
NSGA-II (F2), solution 1            28              88.89
NSGA-II (F2), solution 2            32              97.78
NSGA-II (F2), solution 3            33              98.89
NSGA-II (F2), solution 4            35              100
mRMR + NSGA-II (F3), solution 1     6               88.89
mRMR + NSGA-II (F3), solution 2     12              92.22
mRMR + NSGA-II (F3), solution 3     16              100

5.3. Discussions

(1) For all classifiers, the performance obtained with the original feature set (F0) is the worst among the four feature subsets; moreover, F0 is the largest. This confirms our assumption that the original feature set contains many irrelevant and redundant features, which decrease the performance and increase the computational cost of the classifiers. A feature selection procedure is therefore necessary before classification.

(2) Comparing F1 with F0 shows that the mRMR method achieves a better classification rate than the original feature set with fewer features. Compared with F2 and F3, however, its performance is poorer and its subsets are larger. This is because the filter method does not involve any induction algorithm, so its classification accuracy falls short of the wrapper methods. The advantage of mRMR is its much lower computational cost compared with the wrapper methods.

(3) The performances of F3 are promising compared with the other methods; the classification rate of LS-SVM reaches 100% with F3. The performances of F2 are also competitive, and in some cases F2 outperforms F3 in classification accuracy. The main disadvantages of applying wrappers to the original feature set are the large computational cost and the larger size of the selected feature subsets.


(4) For different classifiers, the classification rates obtained with the same feature selection technique differ from one another. This supports the assumption that no single feature subset is optimal for all classifiers. It also suggests that different classifiers with different feature subsets can offer complementary information about the patterns to be classified, so combining classifiers is a promising route to better performance; future work will explore classifier ensemble schemes for intelligent fault diagnosis.

6. Conclusions

This investigation has described a feature extraction and feature selection scheme for hybrid fault diagnosis of gearboxes based on the S transform, non-negative matrix factorization, mutual information and NSGA-II. For feature extraction, the S transform was first adopted to obtain high-resolution time–frequency distributions of the vibration signals, and non-negative matrix factorization was then applied to extract informative features from the time–frequency representations. Next, a two-stage feature selection scheme combining a filter and a wrapper was developed based on mutual information and NSGA-II: the filter stage, implemented with the mutual-information-based mRMR criterion, yields a candidate feature subset, from which a wrapper combining the multi-objective evolutionary algorithm NSGA-II obtains a more compact feature subset with higher classification accuracy. Eight fault states were simulated on a gearbox to evaluate the effectiveness of the proposed intelligent fault diagnosis system. To assess the generality of the proposed feature extraction and selection methods, four different classifiers were employed, and several other feature selection schemes were implemented and compared with the proposed approach.
Experimental results show that the proposed feature extraction and feature selection scheme gives very promising performance with a very small feature subset dimension. This research demonstrates clearly that the presented intelligent fault diagnosis system has great potential as an effective and efficient tool for gearbox fault diagnosis and can easily be extended to other rotating machinery.

Acknowledgments

This research is supported by the National Natural Science Foundation of China (No. 50705097) and the Natural Science Foundation of Hubei Province (No. E2007001048).


