
Article

Transactions of the Institute of Measurement and Control
2018, Vol. 40(8) 2681–2693
© The Author(s) 2018
Reprints and permissions: sagepub.co.uk/journalsPermissions.nav
DOI: 10.1177/0142331217708242
journals.sagepub.com/home/tim

Comparison of random forest, artificial neural networks and support vector machine for intelligent diagnosis of rotating machinery

Te Han1, Dongxiang Jiang1, Qi Zhao2, Lei Wang2 and Kai Yin2

Abstract
Nowadays, the data-driven diagnosis method, which exploits pattern recognition to diagnose fault patterns automatically, has achieved much success for rotating machinery. Popular classification algorithms such as artificial neural networks and support vector machine have been extensively studied and tested in many application cases, while the random forest, one of the present state-of-the-art classifiers based on an ensemble learning strategy, is relatively unknown in this field. In this paper, the behavior of random forest for the intelligent diagnosis of rotating machinery is investigated with various features on two datasets. A framework for the comparison of different methods, that is, random forest, extreme learning machine, probabilistic neural network and support vector machine, is presented to find the most efficient one. Random forest is shown to outperform the comparative classifiers in terms of recognition accuracy, stability and robustness to features, especially with a small training set. Additionally, compared with traditional methods, random forest is not easily influenced by environmental noise. Furthermore, the user-friendly parameters in random forest offer great convenience for practical engineering. These results suggest that random forest is a promising pattern recognition method for the intelligent diagnosis of rotating machinery.

Keywords
Intelligent fault diagnosis, rotating machinery, random forest, artificial neural networks, support vector machine

Introduction

As a kind of industrial infrastructure, rotating machinery has become one of the key equipments in many industry sectors such as power systems and aerospace engineering. Under extreme application conditions, the critical components in rotating machinery, including bearings, gearboxes, rotors and so forth, are easily subject to faults, which may cause machine breakdown and economic loss. It is therefore necessary to develop condition monitoring and fault diagnosis (CMFD) technologies for rotating machinery. Vibration analysis has been accepted as a major diagnostic tool because vibration signals can be obtained handily and contain abundant information on machine conditions (Yunusa-Kaltungo et al., 2015).

Once we acquire the vibration signal, there are mainly two categories of diagnostic approaches: signal processing-based approaches and pattern recognition-based approaches (Guo et al., 2016). In the first class, the fault pattern can be identified by detecting features of the vibration waveform or the fault characteristic frequency using advanced signal processing methods, such as wavelet transform (WT), empirical mode decomposition (EMD) and spectral kurtosis (SK) (Han et al., 2016; Sun et al., 2017; Tang et al., 2016; Yaqub and Loparo, 2016). This procedure, however, requires the operator to grasp a good deal of expertise, which may bring difficulties for online diagnosis. Consequently, the data-driven diagnostic techniques, which can realize automated and intelligent diagnosis with pattern recognition methods, have received wide attention and developed rapidly in recent years (Bogoevska et al., 2017; Cheng et al., 2016; Han et al., 2017; Liu et al., 2014). This strategy includes three important parts: data acquisition, feature extraction and pattern recognition. Adopting the features extracted from the vibration signals of different failure modes, the classification model can be trained, so as to make decisions intelligently when similar patterns come afterwards. Usually, the capability of the classification model has a significant influence on diagnostic results, indicating the need to choose an appropriate pattern recognition algorithm for rotating machinery (Han and Jiang, 2016). The most used classifiers in this field are artificial neural networks (ANNs), including back propagation neural network (BPNN), radial basis function (RBF), learning vector quantization

1 State Key Lab of Control and Simulation of Power System and Generation Equipment, Department of Thermal Engineering, Tsinghua University, China
2 AECC Commercial Aircraft Engine Co., Ltd, Shanghai, China

Corresponding author:
Te Han, State Key Lab of Control and Simulation of Power System and Generation Equipment, Department of Thermal Engineering, Tsinghua University, Beijing 100084, China.
Email: hant15@mails.tsinghua.edu.cn

(LVQ) (Jiang and Liu, 2011), wavelet neural network (WNN) (Lei et al., 2011), extreme learning machine (ELM) (Tian et al., 2015), probabilistic neural network (PNN) (Zhou and Cheng, 2016; Dou and Zhou, 2016) and so forth, as well as support vector machine (SVM) (Li and Zhang, 2011).

ANN, which essentially comes from an abstraction of the human brain's neural network, has been a hot spot in the field of artificial intelligence since the 1980s. To date, there are nearly 40 kinds of neural network models, among which many feedforward neural network models have been extensively studied and documented for fault detection and diagnosis in mechanical systems. The problems of slow training speed, local minima and sensitive learning rates in many traditional models such as BPNN have become the main bottlenecks constraining the development of this technology. As two kinds of modified models from the large family of feedforward neural networks, ELM and PNN can successfully avoid these defects with their excellent ability of self-organizing and self-learning. Nevertheless, signal pre-processing is usually needed for most ANNs owing to their sensitivity to feature magnitudes. Additionally, the performance of ANNs may be undesirable when dealing with high-dimensional features without dimension reduction or prior feature selection. Moreover, a good deal of typical training samples is necessary to ensure ergodicity and improve robustness against overfitting (Xu and Chen, 2013).

SVM, developed based on statistical learning theory, is a generic and effective pattern recognition method for rotating machinery fault diagnosis. To be more accurate, SVM approximately realizes the goal of structural risk minimization for binary classification problems. Some research has shown that nonlinear SVM not only can cope with high-dimensional features, but also has a stronger generalization capability than ANNs when solving small-sample learning problems (Yang et al., 2007). Unfortunately, the recognition accuracy will be severely degraded if two model parameters, namely the penalty factor and the kernel function parameter, are not properly selected. Although the cross-validation (CV) method and some heuristic methods such as genetic algorithm (GA) and particle swarm optimization (PSO) can be utilized to guide the selection of parameters, the whole procedure of parameter tuning can become rather complex and time-consuming.

Faced with the aforementioned problems, there is, in particular, a need to employ more efficient pattern recognition methods in the field of fault diagnosis for rotating machinery, especially ones that are robust to the features, the model parameter selection and even the number of training samples. Random forest is a powerful pattern recognition method that may meet these requirements based on existing studies. However, it is relatively unfamiliar in this field. Up to now, to the best of our knowledge, there is no comprehensive comparison of performance between random forest and other common models using the fault datasets of rotating machinery. Random forest is developed from a learning strategy called "ensemble learning", which combines many base classifiers and synthetically considers their results (Breiman, 2001). Before random forest, the bagging algorithm with bootstrap sampling proposed by Breiman (1996) and the random decision forests (RDF) using the idea of random subspace presented by Ho (1998) were also based on this learning strategy. With the successful applications in the 1990s, ensemble learning has been proven to remarkably improve the accuracy and generalization capability of the whole system (Hansen et al., 1992; Schwenk and Bengio, 2000). In 2001, Breiman (2001) creatively put forward the random forest, merging bagging and random split selection of features. In recent years, random forest has drawn wide attention in the fields of E-tongue (Liu et al., 2013), acoustic emission (Morizet et al., 2016), mass spectrometry (Coomans et al., 2006), digital soil mapping (Yang et al., 2016), eye state estimation (Dong et al., 2016), remote sensing imagery (Luo et al., 2016), etc.

In this paper, we introduce the random forest algorithm to rotating machinery fault diagnosis and focus on its performance, comparing it with currently more popular methods, that is, ELM, PNN and SVM. We use three classes of features from time-domain, frequency-domain and multiple scale components, respectively, and two validation datasets: the bearing dataset from the Case Western Reserve University bearing data center and a gearbox dataset from our test rig. The remaining part is organized as follows. In section 2, the principles of random forest, ELM, PNN and SVM are introduced briefly. The main diagnosis procedure is given in section 3. In section 4, the two datasets are described and the experimental verification is conducted, including the comparison and discussion of the different methods in the aspects of accuracy, standard deviation, sensitivity to noise, parameter tuning and time consumption. A comparative discussion between this work and some published literature is also given. Finally, the conclusions are drawn in section 5.

Background knowledge

Random forest

Random forest is a typical ensemble learning method that operates by integrating multiple weak decision tree classifiers and reaches a final decision by the majority of votes. In practice, the performance of a single classifier always differs greatly depending on the type of application or the dimension of the feature space. Therefore, researchers put forward ensemble learning methods to improve the recognition capability of individual classifiers. Dietterich (2000) demonstrated the superiority of ensemble learning in the aspects of statistics, local search and function expression by theoretical analysis. In earlier studies, keeping the diversity of the base learners played a critical role in a successful ensemble learning method. Three novel algorithms were effectively developed, including Breiman's "bagging" (Breiman, 1996), Schapire's "boosting" and Ho's "random subspace" (Ho, 1998).

In fact, random forest merges the ideas of bagging and random subspace, resulting in two valuable sources of randomness (see Figure 1). In the bagging algorithm, multiple training subsets can be drawn from the training set with the bootstrap method (random sampling with replacement). Each bootstrap retains the same size as the original training set, while some samples are repeated and some others are left out. On average, about two-thirds of the training samples are applied to grow each tree. The remaining one-third of the training samples, called Out-of-Bag (OOB) data, can be used for performance estimation,

Figure 1. Workflow of random forest algorithm.

suggesting that there is no requirement of additional cross-validation for random forest. With a forest consisting of $n_{tree}$ trees, the corresponding $n_{tree}$ bootstraps can be generated, indicating the randomness of the training samples for each tree. Essentially, the bagging algorithm can prominently reduce the variance of the base classifier and improve the generalization error. The second randomness is introduced by Ho's "random subspace", which means only a randomly selected subset ($m_{try}$ descriptors) is considered at each split instead of all descriptors in the input feature vector. On the one hand, this idea further promotes the diversity of the base learners. On the other hand, it makes the tree less greedy and increases the possibility that some weak features can get into the tree. The effect of all descriptors in the input feature vector is magnified, and these weak features may become beneficial in combination with other features. All the $n_{tree}$ trees in the forest are grown without pruning, and the growing algorithm is CART (classification and regression trees). A comprehensive prediction can be evaluated by the ensemble of $n_{tree}$ votes for a new testing sample (Svetnik et al., 2003).

Based on the existing applications of random forest, some attractive advantages can be presented, such as robustness against overfitting, only two user-friendly parameters ($n_{tree}$ and $m_{try}$), insensitivity to prior feature selection, a good ability to handle badly unbalanced datasets, and so forth, showing its great potential for intelligent fault diagnosis of rotating machinery.

ELM

As a single-hidden layer feedforward artificial neural network (SLFN), ELM can randomly generate the input weights and hidden biases with no adjustment in the process of training, and calculate the output weights according to the Moore–Penrose generalized inverse. Given a training set $\{(x_i, t_i) \mid x_i \in R^n, t_i \in R^m, i = 1, 2, \ldots, Q\}$, where $x_i$ is the $i$th training
sample and $m$ denotes the number of total classes. The output of ELM can be expressed as below

$$T = [t_1, t_2, \ldots, t_Q]_{m \times Q}, \quad t_j = \begin{bmatrix} t_{1j} \\ t_{2j} \\ \vdots \\ t_{mj} \end{bmatrix} = \begin{bmatrix} \sum_{i=1}^{l} \beta_{i1}\, g(w_i \cdot x_j + b_i) \\ \sum_{i=1}^{l} \beta_{i2}\, g(w_i \cdot x_j + b_i) \\ \vdots \\ \sum_{i=1}^{l} \beta_{im}\, g(w_i \cdot x_j + b_i) \end{bmatrix}_{m \times 1} \quad (j = 1, 2, \ldots, Q) \qquad (1)$$

where $l$ is the number of hidden neurons, $g(x)$ is the activation function, $w_i = [w_{i1}, w_{i2}, \ldots, w_{in}]$ is the vector of input weights connecting all the input nodes to the $i$th hidden node, $b_i$ is the bias of the $i$th hidden node and $\beta_{im}$ is the output weight between the $i$th hidden neuron and the $m$th output neuron. Equation (1) can be rewritten in the following form

$$H\beta = T', \quad H(w_1, \ldots, w_l, b_1, \ldots, b_l, x_1, \ldots, x_Q) = \begin{bmatrix} g(w_1 \cdot x_1 + b_1) & \cdots & g(w_l \cdot x_1 + b_l) \\ \vdots & \ddots & \vdots \\ g(w_1 \cdot x_Q + b_1) & \cdots & g(w_l \cdot x_Q + b_l) \end{bmatrix}_{Q \times l} \qquad (2)$$

where $H$ is the output matrix of the hidden layer of the network. For the fixed $H$, the training process amounts to finding an optimal $\beta$, which can be decided by the least-squares solution of the following equation

$$\min_{\beta} \| H\beta - T' \| \qquad (3)$$

Once the single parameter, the number of hidden neurons, is set, the unique optimal solution can be obtained.

PNN

PNN is developed from the RBF network, and its theoretical foundation is Bayes decision theory. Fault diagnosis based on PNN is a widely accepted decision method in probability statistics. It can be described as follows: supposing two assigned fault classes ($\theta_A$ and $\theta_B$), for a testing sample $X = (x_1, x_2, \ldots, x_n)$,

$$\text{if } h_A l_A f_A(X) > h_B l_B f_B(X), \text{ then } X \in \theta_A$$
$$\text{if } h_A l_A f_A(X) < h_B l_B f_B(X), \text{ then } X \in \theta_B$$

where $h_A$, $h_B$ are the priori probabilities of classes $\theta_A$, $\theta_B$, $l_A$ is the cost factor of a fault sample belonging to $\theta_A$ being misclassified into $\theta_B$, $l_B$ is the cost factor of a fault sample belonging to $\theta_B$ being misclassified into $\theta_A$, and $f_A$ and $f_B$ are the PDFs of classes $\theta_A$ and $\theta_B$, respectively. The key PDFs can be estimated by the Parzen method

$$f_A(X) = \frac{1}{(2\pi)^{p/2}\, \delta^{p}} \, \frac{1}{m} \sum_{i=1}^{m} \exp\left\{ -\frac{(X - X_{Ai})^{T}(X - X_{Ai})}{2\delta^{2}} \right\} \qquad (4)$$

where $m$ is the total number of training samples in class A, $X$ is the testing sample, $X_{Ai}$ is the $i$th training sample of class A, $\delta$ is the smoothing parameter and $p$ is the dimensionality of the space.

As a feedforward neural network, PNN consists of four layers: the input layer, pattern layer, summation layer and output layer. The input layer transmits input vectors to the nodes of the pattern layer. The pattern layer calculates the distance between the input vector and the patterns of the training set by a nonlinear operator. Then, the summation layer simply sums the output from the pattern layer and estimates the probability densities using all multivariate Gaussian distributions. Finally, the output layer classifies the vector into a pattern depending on the maximum of these probabilities.

SVM

The core idea of SVM is to create a classification hyperplane as the decision surface, maximizing the margin of separation between the two classes. Given a training sample set $S = \{x_i, y_i\}_{i=1}^{l}$, each sample $x_i$ is an $n$-dimensional input feature vector and has a binary class label $y_i \in \{-1, +1\}$. For linearly separable samples, the optimal hyperplane can be found by solving the following optimization problem

$$\min \; \frac{1}{2}\|v\|^{2} + C \sum_{i=1}^{l} \xi_i \quad \text{s.t.} \quad y_i (v \cdot x_i + b) \geq 1 - \xi_i, \; \xi_i \geq 0, \; i = 1, 2, \ldots, l \qquad (5)$$

where $v$ is the normal vector of the hyperplane, $b$ is the bias, $\xi_i$ are the slack variables and $C$ is the penalty factor. When the training samples are linearly inseparable in the feature space, a non-linear function $\phi(x)$ can be implemented to map them into a high-dimensional feature space. The kernel function returns a dot product in the mapped feature space, that is, $K(x_i, x_j) = \phi^{T}(x_i) \cdot \phi(x_j)$. The universally used radial basis function (RBF) kernel in the field of fault diagnosis can be expressed as follows

$$K(x, x_i) = \exp\left( -\frac{\|x - x_i\|^{2}}{2g^{2}} \right) \qquad (6)$$

where $g$ is the kernel function parameter. After introducing the Lagrange multipliers $\alpha_i \geq 0$, the problem of equation (5) can be transformed into

$$\min \; \frac{1}{2} \sum_{i,j=1}^{l} \alpha_i \alpha_j y_i y_j K(x_i, x_j) - \sum_{i=1}^{l} \alpha_i \quad \text{s.t.} \quad 0 \leq \alpha_i \leq C, \; \sum_{i=1}^{l} \alpha_i y_i = 0 \qquad (7)$$

Then, SVM can predict the label of samples via the optimal classification hyperplane, and the decision function can be given by

Table 1. The statistical characteristic parameters in time-domain.

$tp_1 = \frac{1}{N}\sum_{n=1}^{N} x(n)$
$tp_2 = \sqrt{\frac{1}{N}\sum_{n=1}^{N} x(n)^2}$
$tp_3 = \left( \frac{1}{N}\sum_{n=1}^{N} \sqrt{|x(n)|} \right)^2$
$tp_4 = \frac{1}{N}\sum_{n=1}^{N} |x(n)|$
$tp_5 = \frac{1}{N}\sum_{n=1}^{N} (x(n) - tp_1)^3$
$tp_6 = \frac{1}{N}\sum_{n=1}^{N} (x(n) - tp_1)^4$
$tp_7 = \max(x(n))$
$tp_8 = \min(x(n))$
$tp_9 = tp_7 - tp_8$
$tp_{10} = \frac{1}{N-1}\sum_{n=1}^{N} (x(n) - tp_1)^2$
$tp_{11} = tp_2 / tp_4$
$tp_{12} = tp_7 / tp_2$
$tp_{13} = tp_7 / tp_4$
$tp_{14} = tp_7 / tp_3$
$tp_{15} = tp_5 / (tp_2)^3$
$tp_{16} = tp_6 / (tp_2)^4$

where $x(n)$ is a sampling series of the raw signal with $N$ points.
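As an illustration, the sketch below computes a representative subset of the Table 1 parameters with NumPy. The function name, the chosen subset and the toy signal are ours, not the authors':

```python
import numpy as np

def time_domain_features(x):
    """Compute a subset of the Table 1 statistics for a 1-D signal segment x(n)."""
    tp1 = np.mean(x)                            # tp1: mean value
    tp2 = np.sqrt(np.mean(x ** 2))              # tp2: root mean square
    tp7, tp8 = np.max(x), np.min(x)             # tp7, tp8: maximum and minimum
    tp9 = tp7 - tp8                             # tp9: peak-to-peak value
    tp12 = tp7 / tp2                            # tp12: crest factor
    tp15 = np.mean((x - tp1) ** 3) / tp2 ** 3   # tp15: skewness
    tp16 = np.mean((x - tp1) ** 4) / tp2 ** 4   # tp16: kurtosis
    return np.array([tp1, tp2, tp9, tp12, tp15, tp16])

# Toy stand-in for one 1024-point vibration sample.
x = np.sin(np.linspace(0, 20 * np.pi, 1024))
features = time_domain_features(x)
```

The remaining parameters in Table 1 follow the same pattern of elementwise NumPy operations over the segment.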

Table 2. The statistical characteristic parameters in frequency-domain.

$fp_1 = \frac{1}{K}\sum_{k=1}^{K} y(k)$
$fp_2 = \frac{1}{K-1}\sum_{k=1}^{K} (y(k) - fp_1)^2$
$fp_3 = \frac{\sum_{k=1}^{K} (y(k) - fp_1)^3}{K (\sqrt{fp_2})^3}$
$fp_4 = \frac{\sum_{k=1}^{K} (y(k) - fp_1)^4}{K (fp_2)^2}$
$fp_5 = \frac{\sum_{k=1}^{K} f_k\, y(k)}{\sum_{k=1}^{K} y(k)}$
$fp_6 = \sqrt{\frac{1}{K}\sum_{k=1}^{K} (f_k - fp_5)^2\, y(k)}$
$fp_7 = \sqrt{\frac{\sum_{k=1}^{K} f_k^2\, y(k)}{\sum_{k=1}^{K} y(k)}}$
$fp_8 = \sqrt{\frac{\sum_{k=1}^{K} f_k^4\, y(k)}{\sum_{k=1}^{K} f_k^2\, y(k)}}$
$fp_9 = \frac{\sum_{k=1}^{K} f_k^2\, y(k)}{\sqrt{\sum_{k=1}^{K} y(k)\, \sum_{k=1}^{K} f_k^4\, y(k)}}$
$fp_{10} = fp_6 / fp_5$
$fp_{11} = \frac{\sum_{k=1}^{K} (f_k - fp_5)^3\, y(k)}{K (fp_6)^3}$
$fp_{12} = \frac{\sum_{k=1}^{K} (f_k - fp_5)^4\, y(k)}{K (fp_6)^4}$
$fp_{13} = \frac{\sum_{k=1}^{K} (f_k - fp_5)^{1/2}\, y(k)}{K \sqrt{fp_6}}$

where $y(k)$ is the frequency spectrum of the discrete signal, $k$ is the sequence number of the spectrum line, and $f_k$ represents the frequency value of the $k$th spectrum line.
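A corresponding sketch for a few of the Table 2 parameters, computed from the magnitude spectrum. This is illustrative only; the helper name is ours, and $fp_5$/$fp_6$ follow the table's definitions of frequency centre and root variance frequency:

```python
import numpy as np

def frequency_domain_features(x, fs):
    """Compute a subset of the Table 2 statistics from the magnitude spectrum y(k)."""
    K = len(x) // 2
    y = np.abs(np.fft.rfft(x))[:K]                   # y(k): magnitude spectrum
    f = np.fft.rfftfreq(len(x), d=1.0 / fs)[:K]      # f_k: frequency of the kth line
    fp1 = np.mean(y)                                 # fp1: mean spectrum amplitude
    fp5 = np.sum(f * y) / np.sum(y)                  # fp5: frequency centre
    fp6 = np.sqrt(np.sum((f - fp5) ** 2 * y) / K)    # fp6: root variance frequency
    fp7 = np.sqrt(np.sum(f ** 2 * y) / np.sum(y))    # fp7: root mean square frequency
    return np.array([fp1, fp5, fp6, fp7])

fs = 1024.0
t = np.arange(1024) / fs
x = np.sin(2 * np.pi * 50 * t)        # pure 50 Hz tone: frequency centre sits at 50 Hz
fp = frequency_domain_features(x, fs)
```

For a pure on-bin tone, both the frequency centre and the root mean square frequency land on the tone frequency, which is a quick sanity check for an implementation.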
$$f(x) = \mathrm{sign}\left[ \sum_{i=1}^{l} \alpha_i y_i K(x_i, x) + b \right] \qquad (8)$$

Since the algorithm above is designed for binary classification, when dealing with $k$ classes, we can convert the problem into $k(k-1)/2$ binary classifications, meaning that one SVM model is designed between any two classes. The label of a sample can then be identified according to the most votes from all binary classifiers. The popular libSVM toolbox (Chang and Lin, 2011) utilizes exactly this approach. It should be noted that multi-class SVM is fundamentally different from the ensemble learning of random forest: all the individual learners aim at the same problem in ensemble learning methods, while the strategy of multi-class SVM is to divide and conquer the entire problem.

Data-driven fault diagnostic techniques for rotating machinery

For a data-driven diagnostic technique, the first step is data acquisition. In this experimental analysis, a number of vibration signal samples can be monitored using acceleration sensors when the machine works. Different types of artificial faults are introduced to the rotating machinery test rig, and thus samples under different machine conditions can be acquired. In a real industrial case, the history database collected by condition monitoring systems contains abundant signal types after long-time operation. Hence, the signal samples can be manually selected. With the obtained fault datasets, the samples of each machine condition are divided into two classes: training samples and testing samples.

For feature extraction, different features may lead to different diagnostic results. To have a comprehensive comparison, three classes of popular features are extracted for further model training: time-domain statistical features (TDF), frequency-domain statistical features (FDF) and multiple scale features (MSF). (1) When a fault occurs in a mechanical system, the time-domain waveform of the vibration signal may change both in amplitude and distribution. The sixteen statistical characteristic parameters from the time-domain in Table 1 are taken. (2) Similar to the time-domain, the frequency spectrum of a fault signal may also alter, since the impact of faults will arouse the resonance of the system and some high-frequency components may appear. Thirteen statistical characteristic parameters from the frequency-domain are displayed in Table 2. (3) With the development of non-stationary analysis methods

Table 3. Description of two datasets.

Experimental validation | Object | Health status | Load speed (rpm) | Training samples | Testing samples | Label | Sample length
Dataset A | Rolling bearing | Health | 1797, 1772, 1750, 1730 | 5, 10, 20, 50 | 50 | 1 | 1024
| | Inner race fault | 1797, 1772, 1750, 1730 | 5, 10, 20, 50 | 50 | 2 |
| | Outer race fault | 1797, 1772, 1750, 1730 | 5, 10, 20, 50 | 50 | 3 |
| | Ball fault | 1797, 1772, 1750, 1730 | 5, 10, 20, 50 | 50 | 4 |
Dataset B | Gearbox | Health | 1500, 1200, 900 | 5, 10, 20 | 40 | 1 | 2048
| | Tooth broken | 1500, 1200, 900 | 5, 10, 20 | 40 | 2 |
| | Tooth surface spalling | 1500, 1200, 900 | 5, 10, 20 | 40 | 3 |
| | Gear root crack | 1500, 1200, 900 | 5, 10, 20 | 40 | 4 |
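The sampling scheme summarized in Table 3 — fixed-length segments per health state, split at random into training and testing sets — can be sketched as follows. The helper names and the random stand-in record are illustrative only:

```python
import numpy as np

def make_samples(record, sample_length, n_samples):
    """Truncate one long vibration record into non-overlapping fixed-length samples."""
    return np.stack([record[i * sample_length:(i + 1) * sample_length]
                     for i in range(n_samples)])

def random_split(samples, n_train, rng):
    """Randomly partition the samples of one health state into training/testing sets."""
    idx = rng.permutation(len(samples))
    return samples[idx[:n_train]], samples[idx[n_train:]]

rng = np.random.default_rng(0)
record = rng.standard_normal(100 * 1024)       # stand-in for one bearing record
samples = make_samples(record, 1024, 100)      # dataset A: 100 samples of 1024 points
train, test = random_split(samples, 50, rng)   # 50 training / 50 testing per state
```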

Figure 2. Vibration waveform of bearing with four states: (a) health; (b) inner race fault; (c) outer race fault; (d) ball fault.

for vibration signals of rotating machinery, variational mode decomposition (VMD) (Dragomiretskiy and Zosso, 2014) is utilized to decompose the vibration signal into several stable components, which represent different frequency bands and reflect the multi-scale information of the raw signal. Then, an AR model can be established for each component. All the parameters of the AR models serve as the feature vectors, which can denote the condition of the mechanical system effectively.

The next stage is to realize intelligent diagnosis via pattern recognition techniques. The classification model can be trained with the feature vectors extracted from the training samples. In this step, parameter tuning is crucial for some pattern recognition methods such as ANNs and SVM. Then, the testing set is applied to examine the diagnostic performance, so as to ensure the learned classifier can make a precise diagnosis when similar faults occur in the rotating machinery afterwards. In this paper, three indexes are used for quantitative assessment, namely mean accuracy, standard deviation and time consumption based on many random tests. Additionally, the complexity level of parameter tuning in the different classifiers is taken into consideration.

Experimental verification and discussion

Experimental data description

In this paper, the comparative study was conducted on two datasets: the bearing dataset from the Case Western Reserve University bearing data center and a gearbox dataset from our test rig. Dataset A consists of four classes, which are health, inner race fault, outer race fault and ball fault, considering four load speeds, that is, 1797 rpm, 1772 rpm, 1750 rpm and 1730 rpm. Single point faults are introduced to the test bearings using electro-discharge machining with fault diameters of 0.014 inches (1 inch = 25.4 mm). The vibration waveforms of the bearing at 1797 rpm load speed are presented in Figure 2. In each load condition, the vibration signal of every bearing status is truncated into 100 samples with a length of 1024 points. The training set and testing set can be partitioned at random. To compare the performance of the classifiers with different sizes of training set, in each bearing status we employed different numbers of samples (5, 10, 20 and 50, respectively) to train the models, and the other 50 samples are utilized for verification.
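The evaluation loop described above — train on extracted feature vectors, score on held-out samples, repeat over random splits, and report mean accuracy with standard deviation — might be sketched with scikit-learn as below. The synthetic Gaussian features stand in for the real TDF/FDF/MSF vectors; the random forest settings mirror the paper's defaults (ntree = 500 trees, mtry = ⌊√m⌋ via max_features='sqrt'), while the SVM hyperparameters here are mere placeholders rather than GA-tuned values:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic stand-in for extracted feature vectors: 4 classes, 16 features each.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=c, scale=1.0, size=(100, 16)) for c in range(4)])
y = np.repeat(np.arange(4), 100)

models = {
    "random forest": RandomForestClassifier(n_estimators=500, max_features="sqrt"),
    "SVM (RBF)": SVC(kernel="rbf", C=1.0, gamma="scale"),
}
results = {}
for name, model in models.items():
    accs = []
    for trial in range(10):   # repeated random splits -> mean accuracy and deviation
        Xtr, Xte, ytr, yte = train_test_split(
            X, y, train_size=0.5, stratify=y, random_state=trial)
        model.fit(Xtr, ytr)
        accs.append(model.score(Xte, yte))
    results[name] = (np.mean(accs), np.std(accs))
```

On real fault data the split sizes and repetition count would follow the protocol above (e.g. 50 repetitions averaged over all load conditions).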

Figure 3. Vibration waveform of gearbox with four states: (a) health; (b) tooth broken; (c) tooth surface spalling; (d) gear root crack.

Similarly, the gearbox dataset B is composed of four classes, which are health, tooth broken, tooth surface spalling and gear root crack, respectively, under three operating conditions of the high speed shaft, that is, 1500 rpm, 1200 rpm and 900 rpm. The tooth broken fault is simulated by cutting off a tooth of a gear to a depth of 1.5 mm. For tooth surface spalling, we machine a deep groove with a width of 1.5 mm and a depth of 1 mm on the surface of a tooth. The failure mode of gear root crack is made by manufacturing a crack on the dedendum with a width of 0.1 mm and a depth of half the dedendum width. Figure 3 displays the waveforms of the gearbox at 1500 rpm load speed. A total of 60 samples of 2048 dimensions are generated, from which different numbers of training samples (5, 10 and 20, respectively) and 40 testing samples are picked out randomly to form the training set and testing set for each class. For both datasets, each test is repeated 50 times to get a mean accuracy and standard deviation. The final results are obtained by averaging over all load conditions.

Performance comparison with different features in two datasets

In this part, we compare the average diagnosis rate and standard deviation of random forest, ELM, PNN and SVM using different features in the two datasets. Firstly, the selection criteria for the pivotal parameters of each classifier are introduced. (1) Random forest: the default value of $n_{tree}$, representing the number of independent trees in the forest, is 500. The size of the random feature subset, $m_{try}$, out of $m$ total features, is set to the recommended value, that is, $m_{try} = \lfloor \sqrt{m} \rfloor$ (Svetnik et al., 2003). (2) ELM: the optimal number of hidden neurons in the hidden layer is determined through 5-fold CV on the training samples in each test, ranging from 1 to 200. (3) PNN: similarly, the spread value is determined by 5-fold CV. (4) SVM: the most popular RBF kernel function is adopted. The other two crucial structural parameters of SVM, the penalty factor $c$ and the kernel function parameter $g$, are decided by GA in the sense of 5-fold CV. According to the literature (Luo et al., 2015), we set the maximum generation to 50 and the number of populations to 20 in GA. A detailed parameter discussion will be presented in the subsequent part.

Tables 4 and 5 give the diagnosis performance on the two datasets, respectively. In dataset A, it is clear that random forest achieves higher diagnosis accuracies and lower standard deviations than the other three classifiers in all cases. When we apply plenty of samples to train the classifiers (n = 50), all four models can reach high recognition rates with FDF and MSF, whereas the diagnosis rate of PNN drops to 89.82% using TDF, which means PNN cannot handle TDF very well in these tests. Similarly, random forest also performs much better than ELM and SVM using TDF. Besides, it is worth mentioning that a quite satisfactory result can be obtained by random forest even with a small training set (n = 5 or 10, for example), which proves its strong generalization ability and excellent self-learning capacity. For ELM and SVM, when the training sample size is reduced to 5, the accuracies distinctly decrease to 77.00% and 86.41%, respectively, using TDF, and to 84.66% and 84.14%, respectively, using FDF. Simultaneously, the higher standard deviations reflect the unstable behaviour of ELM and SVM. In essence, most ANNs and SVM need a lot of typical labelled samples to train the model structure, implying that it may be nontrivial to popularize these algorithms in industrial tasks due to the lack of fault samples. A more intuitive view of the change of diagnostic accuracy with different features and training sample sizes for the four classifiers is displayed in Figure 4. Similar results can be observed for dataset B. Based on the comparison above,

Table 4. Comparison of diagnosis results with different features in dataset A.

Feature | No. of training samples | Random forest Acc(%) / Std(%) | ELM Acc(%) / Std(%) | PNN Acc(%) / Std(%) | SVM Acc(%) / Std(%)
TDF | 50 | 98.04 / 1.09 | 95.89 / 1.22 | 89.82 / 1.96 | 93.90 / 1.68
TDF | 20 | 96.18 / 1.81 | 93.44 / 2.16 | 85.51 / 3.32 | 92.25 / 2.27
TDF | 10 | 94.31 / 2.00 | 89.01 / 4.41 | 80.93 / 4.15 | 90.42 / 3.76
TDF | 5 | 92.69 / 2.41 | 77.00 / 8.71 | 77.12 / 5.38 | 86.41 / 6.38
FDF | 50 | 99.76 / 0.30 | 99.43 / 1.09 | 99.10 / 0.73 | 98.33 / 1.09
FDF | 20 | 99.59 / 0.56 | 98.40 / 2.13 | 98.53 / 1.17 | 95.10 / 2.65
FDF | 10 | 99.08 / 1.21 | 96.06 / 3.80 | 97.23 / 2.17 | 89.60 / 4.18
FDF | 5 | 97.77 / 1.86 | 84.66 / 7.34 | 94.24 / 4.64 | 84.14 / 4.88
MSF | 50 | 99.85 / 0.28 | 99.29 / 1.33 | 99.84 / 0.30 | 99.57 / 0.82
MSF | 20 | 99.61 / 0.53 | 98.45 / 2.76 | 99.48 / 0.76 | 99.06 / 1.38
MSF | 10 | 99.24 / 0.95 | 96.95 / 3.61 | 98.81 / 1.30 | 97.92 / 2.43
MSF | 5 | 98.64 / 1.29 | 95.46 / 5.13 | 97.93 / 1.67 | 96.04 / 4.04

Table 5. Comparison of diagnosis results with different features in dataset B.

Feature | No. of training samples | Random forest Acc(%) / Std(%) | ELM Acc(%) / Std(%) | PNN Acc(%) / Std(%) | SVM Acc(%) / Std(%)
TDF | 20 | 97.64 / 1.29 | 94.13 / 2.55 | 84.20 / 3.03 | 96.48 / 1.64
TDF | 10 | 96.31 / 2.13 | 87.46 / 4.33 | 76.90 / 4.38 | 93.75 / 3.16
TDF | 5 | 94.72 / 2.89 | 76.66 / 7.34 | 67.55 / 6.26 | 88.45 / 6.38
FDF | 20 | 99.57 / 0.46 | 98.99 / 1.40 | 98.51 / 1.45 | 98.14 / 1.88
FDF | 10 | 99.34 / 0.59 | 96.63 / 3.19 | 96.63 / 2.39 | 95.96 / 2.55
FDF | 5 | 98.53 / 1.23 | 93.46 / 5.38 | 93.24 / 4.74 | 91.80 / 3.82
MSF | 20 | 97.50 / 1.16 | 97.19 / 2.08 | 95.28 / 2.13 | 97.04 / 1.77
MSF | 10 | 95.98 / 2.27 | 92.08 / 6.13 | 93.51 / 2.77 | 93.81 / 3.25
MSF | 5 | 92.47 / 4.61 | 87.00 / 7.77 | 89.38 / 5.22 | 86.27 / 9.89

the following points can be drawn. (1) Random forest is capable of handling all three types of features without prior selection, showing that it is a powerful and efficient pattern recognition approach in the field of fault diagnosis. (2) Random forest always retains a superior diagnostic performance regardless of the number of training samples, and significantly outperforms the two kinds of ANNs and SVM in the case of a small training set.

Performance comparison under noise environment

In real industrial tasks, the raw vibration signal always contains much noise, which has great effects on the diagnostic results. However, it is difficult to acquire labelled training sets under all the different noisy environments. To conform to this condition, additive white Gaussian noise is artificially added to the testing samples with different signal-to-noise ratios (SNRs). The noise tests are carried out to investigate the robustness of these classification models under different SNRs. This experiment is conducted on dataset A using FDF as input vectors, and sufficient training samples (n = 50) are adopted. The diagnosis results for the analysed classifiers are shown in Table 6. It is clear that random forest achieves a statistically significant win over the comparative methods when handling test samples with different levels of noise. As shown in Figure 5, for ELM, PNN and SVM, an obvious downward trend subsequently appears when the SNR is lower than 22 dB, while random forest still performs significantly well over a wider range of SNR, owing to the strong anti-noise ability of the bagging algorithm used in random forest (Lu et al., 2017).

Parameters discussion in different classifiers

For an excellent classification algorithm, we should demand not only accuracy, but also robustness of its parameters. If a technique can only perform well with parameters in a narrow area, the procedures of parameter tuning are usually tough and costly, increasing the difficulties for practical engineering projects. Consequently, it is necessary to discuss the diagnostic accuracy with respect to the crucial parameters in the analysed classifiers. The experiment is conducted on dataset A with 1797 rpm load speed, using FDF as input vectors. Figure 6(a) shows the diagnostic accuracy of random forest versus the number of trees and the size of the random feature subset. Figure 6(b) shows the zoomed change
Han et al. 2689

Figure 4. Comparisons of accuracy using different classifiers in dataset A: (a) random forest; (b) ELM; (c) PNN; (d) SVM.

Table 6. Comparison of diagnosis results under different noise environment.

Diagnosis rates (%)

Classifier SNR

12 dB 13 dB 14 dB 15 dB 16 dB 18 dB 20 dB 22 dB 24 dB 26 dB

Random forest 74.78 85.82 92.95 95.94 97.74 98.60 99.00 99.26 99.46 99.53
ELM 65.53 69.53 73.01 76.36 79.72 85.48 91.40 96.81 98.83 99.37
PNN 63.25 71.64 79.16 84.84 88.77 93.43 96.51 97.81 98.28 98.87
SVM 66.87 73.32 78.85 83.59 87.66 92.28 95.68 97.24 98.11 98.67

curves. It is clear to find, when the number of trees is over 5 utilized for comparisons. Furthermore, the diagnosis rate also
with each mtry , the accuracy fluctuates slightly over 98% and has a good robustness for mtry . The recommended value
all the standard deviations are lower than 0.3%, validating mtry = 3 reaches a relatively high accuracy.
the algorithm is substantially robust to the number of trees. On the other hand, the accuracy curves of other 3 classi-
Some research has shown that the random forest can achieve fiers are described in Figure 7. From Figure 7(a), one can
a good recognition performance for most applications when apparently observe the accuracy changes dramatically with
the number of trees is below 50 (Dong et al., 2016), while the number of hidden neurons for ELM. As the number of
more trees will guarantee the error convergence, that is the hidden neurons grows, it does not mean a better performance
law of large numbers. In this work, the default 500 trees are will be obtained. In addition, the training set also has a

significant influence on the tendency of the accuracy, meaning it may be hard to choose the optimal parameter directly. Likewise, from Figure 7(b), we can see that a good recognition rate relies heavily on a proper spread value for PNN. Generally, a larger value generates smoother PDFs but results in increased computational cost. In this work, we applied 5-fold CV to guide the selection of the number of hidden neurons and of the spread value. For SVM, the penalty factor c and kernel function parameter g play a key role in achieving a significantly higher accuracy. Figure 7(c) shows the contour of accuracy over varying parameter pairs, and the corresponding three-dimensional surface is illustrated in Figure 7(d). The choice of these two parameters has received considerable attention in recent years, and heuristic optimization algorithms have become the most widely used means of parameter selection, though they undoubtedly add computational burden.

Figure 5. Diagnostic results of employed classifiers with different SNRs.

Time consumption analysis

Here, the same dataset and input feature vectors as in section 4.3 are adopted. The tests were performed on Windows 10 Edu (64 bits). ELM and PNN were run using MATLAB version 8.2. The SVM and GA optimization procedure was conducted using libSVM version 3.14 (Chang and Lin, 2011). Random forest was run with the open source toolbox of Abhishek Jaiantlal (University of Colorado Boulder, USA) on the MATLAB platform. The hardware environment mainly comprises a quad-core i7-4790 3.6 GHz CPU and 16 GB of DDR3-1600 memory. The average processing time of every model is presented in Table 7. In the training stage, compared with the other three models, random forest possesses marked advantages owing to its robust parameters, which need no tuning. Although random forest fails to run faster than ELM and SVM in the prediction step, most of the tests finish within 5 ms. In addition, it is easy to develop parallel computing for random forest on the basis of the characteristics of this algorithm.

Comparison with existing studies

To further verify the potential value of this scheme, a comparative analysis was made between this work and some published literature adopting the same bearing data from the Case Western Reserve University Bearing Data Center (see Table 8). Yang et al. (2007) employed 11 TDF in tandem with three fractal dimensions (FD) as input feature vectors and identified the bearing faults using an SVM with 10-fold cross validation. Li and Zhang (2011) presented a feature extraction method of supervised locally linear embedding projection (SLLEP) and a minimum-distance classifier in their diagnostic scheme. Dou and Zhou (2016) applied PNN to diagnose a feature vector composed of 6 TDF and 5 FDF. Luo et al. (2015) exploited a chemical reaction optimization (CRO) algorithm to determine the structural parameters of SVM, along with a feature extraction method via local characteristic-scale decomposition (LCD) and singular value decomposition (SVD). In our work, random forest is utilized for fault diagnosis without complicated signal preprocessing, prior feature selection or parameter tuning. Random forest still achieves a high diagnosis accuracy in cases where the training set is small, which demonstrates its powerful capacity for pattern recognition problems.
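The random forest parameter study (Figure 6) can be reproduced in outline with a simple sweep over the number of trees and mtry. This is a sketch on synthetic data; scikit-learn (with `n_estimators` and `max_features` playing the roles of ntree and mtry) and the grids below are assumptions, not the paper's MATLAB toolbox:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic 4-class stand-in for the FDF feature matrix of dataset A.
X, y = make_classification(n_samples=400, n_features=12, n_informative=8,
                           n_classes=4, n_clusters_per_class=1, random_state=0)

# Sweep the two random forest parameters discussed above.
for n_trees in (5, 20, 100, 500):
    for mtry in (1, 3, 6):
        rf = RandomForestClassifier(n_estimators=n_trees, max_features=mtry,
                                    random_state=0)
        score = cross_val_score(rf, X, y, cv=5).mean()
        print(f"ntree={n_trees:3d}  mtry={mtry}  CV accuracy={score:.3f}")
```

A flat accuracy surface across this grid is exactly the parameter robustness the section attributes to random forest.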

Figure 6. Diagnostic accuracy versus ntree and mtry in random forest.



Figure 7. Diagnostic accuracy versus parameters in each classifier: (a) ELM; (b) PNN; (c) two-dimensional contour of SVM; (d) three-dimensional
curve of SVM.
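The SVM parameter surface of Figure 7(c)-(d) comes from evaluating accuracy over pairs of the penalty factor c and kernel parameter g. A minimal grid-search sketch with 5-fold CV is shown below; scikit-learn and the log-spaced grids are assumptions standing in for the libSVM/GA procedure used in the paper:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Synthetic 4-class stand-in for the extracted feature matrix.
X, y = make_classification(n_samples=400, n_features=12, n_informative=8,
                           n_classes=4, n_clusters_per_class=1, random_state=0)

# Exhaustive search over (c, g), analogous to the contour in Figure 7(c).
param_grid = {"C": 2.0 ** np.arange(-2, 9, 2),
              "gamma": 2.0 ** np.arange(-8, 3, 2)}
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5).fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```

The cost of this exhaustive sweep (or of a heuristic optimizer replacing it) is precisely the tuning burden that random forest avoids.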

Table 7. Time consumption in different classifiers (Para. = parameter tuning).

No. of training  Random forest               ELM                         PNN                         SVM
samples          Model training  Prediction  Para. and model  Prediction  Para. and model  Prediction  Para. and model  Prediction
                                             training                     training                     training
50               62.64 ms        5.08 ms     3.12 s           0.60 ms     3.01 s           8.44 ms     2.85 s           0.50 ms
20               29.30 ms        4.74 ms     1.33 s           0.64 ms     2.94 s           7.70 ms     1.44 s           0.36 ms
10               18.46 ms        4.64 ms     0.79 s           0.52 ms     2.89 s           7.94 ms     0.94 s           0.18 ms
5                12.92 ms        4.76 ms     0.54 s           0.60 ms     2.53 s           5.46 ms     0.65 s           0.22 ms
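The training/prediction timings of Table 7 can be measured with a simple wall-clock harness. The sketch below uses Python and scikit-learn on synthetic data, so the absolute numbers will differ from the paper's MATLAB figures; only the measurement pattern is illustrated:

```python
import time
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC

# Synthetic 4-class stand-in; 40 training samples, the rest for prediction.
X, y = make_classification(n_samples=400, n_features=12, n_informative=8,
                           n_classes=4, n_clusters_per_class=1, random_state=0)
X_tr, y_tr, X_te = X[:40], y[:40], X[40:]

for name, clf in [("RF", RandomForestClassifier(n_estimators=500, random_state=0)),
                  ("SVM", SVC())]:
    t0 = time.perf_counter(); clf.fit(X_tr, y_tr); t_fit = time.perf_counter() - t0
    t0 = time.perf_counter(); clf.predict(X_te); t_pred = time.perf_counter() - t0
    print(f"{name}: train {t_fit * 1e3:.1f} ms, predict {t_pred * 1e3:.1f} ms")
```

As in Table 7, tree-ensemble prediction involves querying many trees, so its per-sample prediction cost is typically higher than that of a trained SVM, while its training requires no tuning loop.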

Table 8. Comparisons between this work and some published literature.

References           Feature   Classifier                   No. of training  No. of testing  No. of   Diagnosis
                                                            samples          samples         classes  accuracy
Yang et al., 2007    TDF-FD    SVM                          472              472             4        95.253%
Li and Zhang, 2011   SLLEP     Minimum-distance classifier  300              300             3        98.33%
Luo et al., 2015     LCD-SVD   CRO-SVM                      240              80              4        100%
Dou and Zhou, 2016   TDF-FDF   PNN                          240              80              4        94.38%
Present work         TDF       Random forest                200              200             4        98.04%
                     FDF       Random forest                40               200             4        99.08%
                     MSF       Random forest                40               200             4        99.24%
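The noise experiment summarized earlier (Table 6) relies on corrupting each test sample with white Gaussian noise at a prescribed SNR. A generic construction of that corruption step is sketched below; it is a standard recipe, not the authors' exact code:

```python
import numpy as np

def add_awgn(signal, snr_db, seed=0):
    # Add white Gaussian noise so that the result has the requested SNR (dB).
    rng = np.random.default_rng(seed)
    p_signal = np.mean(signal ** 2)
    p_noise = p_signal / (10 ** (snr_db / 10))
    return signal + rng.normal(0.0, np.sqrt(p_noise), size=signal.shape)

t = np.linspace(0, 1, 4096)
clean = np.sin(2 * np.pi * 50 * t)          # clean vibration-like tone
noisy = add_awgn(clean, snr_db=12)

# The empirical SNR of the corrupted signal should be close to the target.
snr_est = 10 * np.log10(np.mean(clean ** 2) / np.mean((noisy - clean) ** 2))
print(round(snr_est, 1))
```

Sweeping `snr_db` over the 12-26 dB range and re-extracting features from the noisy signals reproduces the test conditions behind Table 6.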

Conclusions

Currently, ANNs and SVM have received increasing attention, while random forest remains relatively unfamiliar in this field. It is necessary to explore more efficient pattern recognition methods to facilitate the intelligent diagnosis of mechanical failures. In this work, we explored the performance of random forest, two advanced ANNs (ELM, PNN) and SVM with different features using two datasets from rotating machinery. The comparative analysis demonstrates the excellent properties of random forest in terms of classification accuracy, stability and robustness to features. In particular, in a number of diagnostic experiments, we found that random forest has a significant advantage when the training samples are limited. Additionally, compared with the traditional methods, random forest performs well over a wider range of SNR. Moreover, its only two parameters usually show a weak sensitivity, which largely avoids the parameter-tuning drawback of other pattern recognition algorithms and reduces the time consumption. The overall results indicate that random forest can satisfactorily deal with industrial tasks, leading to a promising application prospect.

Declaration of conflicting interest

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

This research is supported by the National Natural Science Foundation of China (No. 11572167) and the project of the State Key Lab of Power Systems (No. SKLD16Z12).

References

Bogoevska S, Spiridonakos M, Chatzi E, et al. (2017) A data-driven diagnostic framework for wind turbine structures: A holistic approach. Sensors 17(4): 1–28.
Breiman L (1996) Bagging predictors. Machine Learning 24(2): 123–140.
Breiman L (2001) Random forests. Machine Learning 45(1): 5–32.
Chang C-C and Lin C-J (2011) LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2(3).
Cheng G, Chen X-h, Shan X-l, et al. (2016) A new method of gear fault diagnosis in strong noise based on multi-sensor information fusion. Journal of Vibration and Control 22(6): 1504–1515.
Coomans D, Donald D, Hancock T, et al. (2006) Bagged super wavelets reduction for boosted prostate cancer classification of seldi-tof mass spectral serum profiles. Chemometrics and Intelligent Laboratory Systems 82(1–2): 2–7.
Dietterich TG (2000) Ensemble methods in machine learning. Lecture Notes in Computer Science 1875: 1–15.
Dong Y, Zhang Y, Yue J, et al. (2016) Comparison of random forest, random ferns and support vector machine for eye state classification. Multimedia Tools and Applications 75(19): 11763–11783.
Dou D and Zhou S (2016) Comparison of four direct classification methods for intelligent fault diagnosis of rotating machinery. Applied Soft Computing 46: 459–468.
Dragomiretskiy K and Zosso D (2014) Variational mode decomposition. IEEE Transactions on Signal Processing 62(3): 531–544.
Freund Y and Schapire RE (1997) A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences 55(1): 119–139.
Guo L, Gao H, Huang H, et al. (2016) Multifeatures fusion and nonlinear dimension reduction for intelligent bearing condition monitoring. Shock and Vibration (2016): 1–10.
Han T and Jiang D (2016) Rolling bearing fault diagnostic method based on VMD-AR model and random forest classifier. Shock and Vibration (2016): 1–11.
Han T, Jiang D and Wang N (2016) The fault feature extraction of rolling bearing based on EMD and difference spectrum of singular value. Shock and Vibration (2016): 1–14.
Han T, Jiang D, Zhang X, et al. (2017) Intelligent diagnosis method for rotating machinery using dictionary learning and singular value decomposition. Sensors 17(4): 1–18.
Hansen LK, Liisberg C and Salamon P (1992) Ensemble methods for handwritten digit recognition. In: Proceedings of the IEEE workshop on neural networks for signal processing II, Copenhagen, Denmark, pp.333–342.
Ho TK (1998) The random subspace method for constructing decision forests. IEEE Transactions on Pattern Analysis and Machine Intelligence 20(8): 832–844.
Jiang D and Liu C (2011) Machine condition classification using deterioration feature extraction and anomaly determination. IEEE Transactions on Reliability 60(1): 41–48.
Lei Y, He Z and Zi Y (2011) EEMD method and WNN for fault diagnosis of locomotive roller bearings. Expert Systems with Applications 38(6): 7334–7341.
Li B and Zhang Y (2011) Supervised locally linear embedding projection (SLLEP) for machinery fault diagnosis. Mechanical Systems and Signal Processing 25(8): 3125–3134.
Liu C, Jiang DX and Yang WG (2014) Global geometric similarity scheme for feature selection in fault diagnosis. Expert Systems with Applications 41(8): 3585–3595.
Liu M, Wang M, Wang J, et al. (2013) Comparison of random forest, support vector machine and back propagation neural network for electronic tongue data classification: Application to the recognition of orange beverage and Chinese vinegar. Sensors and Actuators B-Chemical 177: 970–980.
Lu C, Wang Z-Y, Qin W-L, et al. (2017) Fault diagnosis of rotary machinery components using a stacked denoising autoencoder-based health state identification. Signal Processing 130: 377–388.
Luo S, Cheng J and Ao H (2015) Application of LCD-SVD technique and CRO-SVM method to fault diagnosis for roller bearing. Shock and Vibration (2015): 1–8.
Luo Y-M, Huang D-T, Liu P-Z, et al. (2016) A novel random forests and its application to the classification of mangroves remote sensing image. Multimedia Tools and Applications 75(16): 9707–9722.
Morizet N, Godin N, Tang J, et al. (2016) Classification of acoustic emission signals using wavelets and random forests: Application to localized corrosion. Mechanical Systems and Signal Processing 70–71: 1026–1037.
Schwenk H and Bengio Y (2000) Boosting neural networks. Neural Computation 12(8): 1869–1887.
Sun P, Liao Y and Lin J (2017) The shock pulse index and its application in the fault diagnosis of rolling element bearings. Sensors 17(3): 1–23.
Svetnik V, Liaw A, Tong C, et al. (2003) Random forest: A classification and regression tool for compound classification and QSAR modeling. Journal of Chemical Information and Computer Sciences 43(6): 1947–1958.
Tang G, Luo G, Zhang W, et al. (2016) Underdetermined blind source separation with variational mode decomposition for compound roller bearing fault signals. Sensors 16(6): 1–17.
Tian Y, Ma J, Lu C, et al. (2015) Rolling bearing fault diagnosis under variable conditions using LMD-SVD and extreme learning machine. Mechanism and Machine Theory 90: 175–186.
Xu H and Chen G (2013) An intelligent fault identification method of rolling bearings based on LSSVM optimized by improved PSO. Mechanical Systems and Signal Processing 35(1–2): 167–175.
Yang J, Zhang Y and Zhu Y (2007) Intelligent fault diagnosis of rolling element bearing based on SVMs and fractal dimension. Mechanical Systems and Signal Processing 21(5): 2012–2024.
Yang R-M, Zhang G-L, Liu F, et al. (2016) Comparison of boosted regression tree and random forest models for mapping topsoil organic carbon concentration in an alpine ecosystem. Ecological Indicators 60: 870–878.
Yaqub MF and Loparo KA (2016) An automated approach for bearing damage detection. Journal of Vibration and Control 22(14): 3253–3266.
Yunusa-Kaltungo A, Sinha JK and Nembhard AD (2015) A novel fault diagnosis technique for enhancing maintenance and reliability of rotating machines. Structural Health Monitoring 14(6): 604–621.
Zhou B and Cheng Y (2016) Fault diagnosis for rolling bearing under variable conditions based on image recognition. Shock and Vibration (2016): 1–14.
