A Classification and Regression Trees (CART) Model of Parallel Structure and Long-Term Prediction Prognosis of Machine Condition

A Classification and Regression Trees (CART)
Model of Parallel Structure and Long-term

Prediction Prognosis of Machine Condition
Van Tung Tran,1 Bo-Suk Yang1,* and Andy Chit Chiow Tan2
1
School of Mechanical Engineering, Pukyong National University, San 100,
Yongdang-dong, Nam-gu, Busan 608-739, South Korea
2
School of Engineering Systems, Queensland University of Technology,
G.P.O. Box 2343, Brisbane, Qld. 4001, Australia
This article presents a combined prediction model involving the parallel of classification and
regression trees (CART) model, namely p-CART, and a long-term direct prediction meth-
odology of time series techniques to predict the future stages of the machines operating
conditions. p-CART model consists of multiple CART models which are connected in par-
allel. Each sub-model in the p-CART is trained independently. Based on the observations,
these sub-models are subsequently used to predict the future values of the machines
operating conditions separately with the same embedding dimension but different
observations indices. Finally, the predicted results of sub-models are combined to produce
the final results of the predicting process. Real trending data acquired from condition mon-
itoring routine of compressor are employed to evaluate the proposed method. A compara-
tive study of the predicted results obtained from traditional CART and p-CART models is
also carried out to appraise the prediction capability of the proposed model.
Keywords machine fault prognosis long-term time series prediction CART direct
prediction methodology
1 Introduction actions on the component before a catastrophic

failure occurs; and (ii) it reduces the costs of unnec-
Prognosis is known to be a key component in essary maintenance, avoids unplanned breakdown
condition-based maintenance (CBM) and has and enables maintenance action to be scheduled
attracted a great deal of research in recent time more effectively. Nevertheless, prognosis has been
due to two following advantages: (i) it increases still a difficult task of CBM.
the safety of machine by detecting the symptoms According to [1], prognosis techniques can be
of failure, so that the system operators can con- broadly classified into three categories: probability-
tinuously monitor and inspect the stage of failure based, model-based, and data-driven based. Among
and take essential and effective maintenance the technique, model-based and data-driven
The Author(s), 2010. Reprints and permissions:

*Author to whom correspondence should be addressed. http://www.sagepub.co.uk/journalsPermissions.nav
E-mail: bsyang@pknu.ac.kr Vol 9(2): 12112
Figures 6 and 7 appear in color online: http://shm.sagepub.com [1475-9217 (201003) 9:2;12112 10.1177/1475921709352148]
121
122 Structural Health Monitoring 9(2)
based are the most considered because they provide average displacement [16], and auto-mutual infor-
higher accuracy and reliability. Model-based prog- mation (AMI) [17] which is used in this study.
nosis techniques are applicable in situations where The next problem in prognostic system is the selec-
accurate mathematical models can be constructed tion of prediction model. Classification and regres-
based on the physical fundamentals of the system; sion tree (CART) [18] has been widely implemented
or the models require available extensive failure in machine fault diagnosis. In the aspect of predic-
data which are either too costly or impossible to tion, CART as well as its extensions has been
be obtained [26]. Even though the accuracy of applied to forecast short-term load of power
these techniques is reasonably high, they are only system [19,20] and operating condition of machine
suitable for specific machine components and each [21] with memorable performance. However, these
component requires a specific mathematical model. researches merely focused on short-term prediction
Alternatively, data-driven prognosis techniques methodology.
require a large amount of historical failure data to This study proposes a combined prediction
build a prognostic model that learns the system model in which multiple CART models, namely
behavior. They frequently use vibration signals for p-CART, are connected in parallel for long-term
temporal pattern identifications since it is relatively prediction purpose. Each sub-model in the
easy to measure and record machine vibration data. p-CART is trained independently. Based on the
Therefore, data-driven prognosis techniques with observations, these sub-models are then used to
vibration-based measurement have been developed predict the future values of the machines operating
in the recent time [711]. conditions separately with the same embedding
In condition prognosis, the prediction model dimension, but different observations indices.
uses available observations to forecast future Finally, the predicted results of sub-models are
operating conditions of the machine [10]. From combined to produce the final results of the pre-
these predicted results, remaining useful life dicting process. The parallel-structure model in
(RUL) of machine can be prognosticated. RUL general and p-CART model in particular has
is the time interval between the current operating many advantages. It can use several observations
condition point and the point where the predicted simultaneously to enhance the prediction accuracy.
values fall within the alarm region or reach the This is not suitable for the traditional CART
predetermined failure threshold. Consequently, because more observations will lead to an expan-
the more the future operating conditions of sion in embedding dimensions that can result in the
machine are accurately predicted, the easier RUL increase of computational complexity. The paral-
is determined. Hence, long-term prediction is lel-structure model has been applied for forecasting
essential for machine condition prognosis even purposes as indicated in references [22,23].
though it is still a difficult and challenging task However, these researches merely addressed the
in time series prediction domain [12]. short-term prediction method.
In long-term prediction, embedding dimension
(ED), time delay (TM), and selection of prediction
2 Background Knowledge
model are essential to be considered. ED and TM
are used to reconstruct the space state of machines
2.1 TM Estimation
condition time series and establish the fundamental
parameters of prediction model. ED is the number There are several methods published in litera-
of initial observations that should be used as the ture could be used to choose the TM. However,
inputs for the prediction model. This value can be most of them are based on empirical concepts and
determined by using the false nearest neighbor is not easy to identify which of the methods is
method (FNN) [13] or the Caos method [14]. suitable for a particular task. In this article, TM
TM is the number of steps that can be predicted is dealt with AMI method. The mutual informa-
by the prediction model to obtain the optimum tion (MI) can be used to evaluate the dependence
performance. It can be calculated by using some of among random variables. The MI between two
the published methods such as auto-correlation [15], variables, let X and Y be the amount of
Tran et al. p-Cart and Long-term Direct Prediction Methodology 123
information obtained from X in the presence of Y B
and vice versa. In time series prediction problem, C

Y
if Y is the output and X is a subset of the input
variables. The MI between X and Y is one crite- A
X
rion for measuring the dependence between inputs 3D
and output. Thus, the input subset X, which
Z
gives maximum MI, is chosen to predict the
output Y. The MI between two measurements D
taken from a single time series x(t) separated by
time is called the AMI. The detailed theory of B
AMI was presented in references [17,24,25]. AMI
A
estimates the degree to which the time series Y C
x(t + ) on average can be predicted from a
given time series x(t), i.e., the mean predictability X
of future values in the time series from the past 2D
values.
The AMI between x(t) and x(t + ) is:
X D
IXX PXX xt, xt
xt,xt
1 1D
PXX xt, xt A DCB
ln
PX xtPX xt
Figure 1 An example of FNNs.
where PX(x(t)) is the normalized histogram of the
distribution of values observed for x(t) and
PXX xt, xt is the joint probability density d + 1, where one can differentiate among points
for the measurements of x(t) and x(t + ). which are true or false neighbor on the orbit.
The decreasing rate of the AMI with increasing For instance in Figure 1, points A, B, C, and D
TM is a normalized measure of the time series belong to a curve. In 1D, points A and D appear to
complexity. The first local minimum of the AMI be the nearest neighbor. However, point D is no
of time series has been used to determine the opti- longer the nearest neighbor of point A in 2D. In
mal TM. the same way, points A and C are the nearest
neighbor in 2D, but they are no longer neighbors
2.2 Determining the ED when viewed in 3D. In this case, points A, D, C are
examples of false neighbors, while points A and B
After calculating the TM, ED is the next are true neighbors.
parameter to be determined. FNN method is The criteria for identification of FNNs can be
employed in this study and will be briefly explained as follows: denote yri d as the nearest
explained. Assuming that a time-series of neighbor of yi(d) in a d dimensional embedding
x1 ,x2 , . . . , xN and vector yi(d), which is given in space. According to [13], the nearest neighbor is
Equation (2), in a delay coordinate embedding of determined by finding the vector which minimizes
the time series with TM and ED d are given. the Euclidean distance:

yi d xi ,xi , . . . , xid1 ,
2 Rd yi d yri d 3
i 1,2, . . . ,N d 1
Considering each of these vectors under a
The observations xi are projections of the sys- d + 1 dimensional embedding:
tems trajectory in the multivariate state space onto
yi d 1 xi ,xi ,xi2 , . . . ,xid ,
1D axis. The FNN method is based on the concept 4
that in the passage from dimension d to dimension i 1,2, . . . ,N d
yri d 1 xri ,xri ,xri2 , . . . ,xrid , of the n(t) response variables y contained in
yt
5
i 1,2, . . . ,N d that terminal node. The split selection at any inter-
nal node t is chosen according to the node impurity
The vectors are separated by the Euclidean that is measured by within-node sum of squares:
distance:
1X 1 X
Rd1 yi d 1 yri d 1 6 Rt 2 , yt
yi yt yi 9
n yi ,xi 2t nt yi ,xi 2t
The first criterion of FNN which identifies a
When a split is performed, two subsets of
FNN is:
observations tL and tR are obtained. The optimum
s
R2d1 R2d xid xrid split s* at node t is obtained from the set of all
4 Rtol 7 splitting candidates S in order that it verifies:
R2d Rd
Rs ,t max Rs,t, s 2 S
where Rtol is a tolerance level. 10
Rs,t Rt RtL RtR
The second criterion is:
Rd1 where R(tL) and R(tR) are sum of squares of the left
4 Atol 8 and right subsets, respectively.
RA
where RA is a measure of the size of the attractor 2.3.2 Tree Pruning The tree gained in tree grow-
and Atol is a threshold that can be chosen in prac- ing process has many terminal nodes that increase
tice. If both Equations (7) and (8) are satisfied, the precision of the responses. However, this is
then yri d is a FNN of yi(d). Once the total
frequently too complicated and over-fitting is
number of FNN is calculated, the percentage of
highly probable. Consequently, it should be
FNN is measured. An appropriate ED is the
pruned back.
value where the percentage of FNN falls to zero.
Tree pruning process is performed by the fol-
lowing procedure:
2.3 Regression Trees
Step 1: At every internal node, an error-
In this study, CART is utilized to build a complexity is found for the number of descendant
regression tree model. Beginning with an entire sub-trees. The error-complexity is defined as:
data set, a binary tree is constructed with the

repeated splits of the subsets into two descendant R T RT T~ 11
subsets according to independent variables. The P P
goal is to produce subsets of the data which are
2 is the total
where RT 1=n t2T~ yi ,xi 2t yi yt
as homogeneous as possible with respect to the within-node sum squares, T~ is the set of current
of
~
response variables. Regression tree in CART is nodes of T and T is the number of terminal nodes
built by using the following two processes: tree in T, a 0 is the complexity parameter which
growing and tree pruning. weights the number of terminal nodes.
Step 2: Using the error-complexity attained in
2.3.1 Tree Growing Let L be a learning data step 1, the internal node with the smallest error is
which comprises n couples of observations replaced by terminal node.
y1 ,x1 , . . . , yn ,xn , where xi x1i , . . . ,xdi is a Step 3: The algorithm terminates if all the
set of independent variables and yi 2 R is a internal nodes have converged to a terminal
response associated with xi. In order to build the node. Otherwise, it returns to step 1.
tree, learning data L is recursively partitioned into
two subsets by binary split until the terminal nods 2.3.3 Cross-Validation for Selecting the Best
are achieved. The result is to move the couples Tree There are two possible methods to select
(y,x) to left or right nodes which contain more the best tree. One is through the use of independent
homogeneous responses. The predicted response test data and the other is cross-validation which is
at each terminal node t is estimated by the mean used in this study. The learning data L is randomly
divided into v approximately equal group, and future values y^ th xt1 ,xt2 , . . . ,xth , the H
(v 1) groups are then utilized as the learning different parallel-structure prediction models are
data for growing the tree model. The remaining used. These models are generated by using training
group is employed as testing data for error estima- set D. The training set D including input vectors
tion of tree model. As a result, v errors are Xi and output vectors Yi is created from the given
obtained by v iterations with variation of the com- observations yt x1 ,x2 , . . . ,xt by using a sliding
binations of the learning data and testing data. The window of length d N + h, where N is the
mean and standard deviation of the errors are number of sub-models in parallel-structure predic-
given: tion model. The vector Xi corresponds to the first
d N value of window whilst the vector Yi is
1X v
RCV d Rts di , the remaining h values of window. The number
v i1 of elements in each vector Xi is d N which is
s 12
CV 1X v
2 used to generate N sub-models of parallel-structure
R d Rts di RCV d models. The training set is structured by synthesiz-
v i1
ing X and Y vectors in the form as shown in
where RCV is the average relative error, d is the Table 1. Thus, by using training set, H parallel-
cross-validation tree, r is the standard error, and structure prediction models are sequentially
Rts is the testing data error. generated with different output Yi which include
The best tree Tt selection is adopted: all the values in the i-th column Y in training set D.

RTt RCV Tmin RCV Tmin 13
4 Architecture of p-CART Model
where R is the cross-validation error and Tmin is
the tree with the smallest cross-validation error. The p-CART model consists of several
sub-models of CART in parallel. Each of these
3 Long-term Direct Prediction Strategy sub-models is independently trained with the
for p-CART Model same output and input vectors. However, not all
the elements of input vectors are used for training
Unlike the short-term prediction (one-step- because the total number of elements in input
ahead prediction), the long-term prediction vectors is d N which is larger than the value
(multi-step-ahead prediction) is typically faced of embedding dimension. Therefore, the indices
with growing uncertainties arising from various of elements are modified corresponding to each
sources such as the accumulation errors and the sub-model and total elements used as inputs for
lack of information. Long-term prediction is each sub-model is equivalent to embedding
divided into three frequently used strategies [26] dimension. For example, the p-CART is used
that involves recursive prediction, direct prediction for forecasting the future values in which the
and DirRec prediction [27]. In this section, the number of sub-model N is 3, while ED and TM
direct prediction strategy applying for p-CART are calculated as 3 and 4, respectively. Thus, the
model is specifically presented. number of elements of input vectors is 9. The sub-
Assuming that a sequence of observation model CART 1 takes the elements xt2 ,xt1 ,xt as
yt xtd1 ,xtd2 , . . . ,xt is given, to predict h input. Similarly, the sub-model CART 2 uses the
Table 1 Training set D for direct prediction strategy
Input X X1 ,X2 , . . . ,XdN Output Y Y1 ,Y2 , . . . ,Yh

x1 ,x2 , . . . ,xdN xdN1 ,xdN2 , . . . ,xdNh
x2 ,x3 , . . . ,xdN1 xdN2 ,xdN3 , . . . ,xdNh1
... ...
xthdN1 ,xthdN2 , . . . ,xth xth1 ,xth2 , . . . ,xt
Output vector Yi from D

yt+h = [Xt +1, Xt +2,...,Xt+h]
Xt8 Xt7 Xt6 Xt5 Xt4 Xt3 Xt2 Xt1 Xt Xt +1 Xt +2 Xt +3 Xt +4
CART 1
Input vector Xi from D

1
[XtdN+1, XtdN+2,...,Xt] CART 2 N
yt + h
CART 3
Figure 2 Architecture and input values for sub-model of p-CART.
elements xt5 ,xt3 ,xt1 and the sub-model CART 3 Validating the prediction models are used for
acquires the elements xt8 ,xt5 ,xt2 as their inputs. measuring their performance capability.
The output vector of each sub-model is the same Step 4 Predicting: long-term direct prediction
vector as xt1 ,xt3 ,xt3 ,xt4 . The architecture method is used to forecast the future values of
and input elements for sub-model is shown in machine condition. The predicted results are mea-
Figure 2. sured by the error between predicted values and
Finally, the predicted values from each sub- actual values in the testing set. Updating models
model are combined to determine the final pre- are also carried out in this procedure for the next
dicted results by using average formula: prediction process.
" #
1X N
1XN
1XN
6 Experiments and Results
y^ th x^ t1 , x^ t2 , . . . , x^ th
N 1 N 1 N 1
14 The proposed method is applied to a real
system to predict the trending data of a low meth-
5 Proposed System ane compressor of a petrochemical plant. The com-
pressor is driven by a 440 kW motor, 6600 V, 2
The proposed system for prognosis comprises poles, and operates at a speed of 3565 rpm.
four procedures sequentially as shown in Figure 3, Other information of the system is summarized in
namely, data acquisition, data splitting, training- Table 2.
validating model, and predicting. The role of The trending data was recorded which
each procedure is explained as follows: included peak acceleration and envelope accelera-
Step 1 Data acquisition: this procedure is used tion data. The average recording duration was
to obtain the vibration data from machine condi- 6 hours during the data acquisition process. Each
tion. It covers a range of data from normal oper- data record consisted of approximately 1200 data
ation to obvious faults of the machine. points as shown in Figures 4 and 5, and contained
Step 2 Data splitting: the trending data information of machine history with respect to
attained from previous procedure is split into two time sequence (vibration amplitude).
parts: training set and testing set. Different data is Consequently, it can be classified as time-series
used for different purposes in the prognosis system. data.
Training set is used for creating the prediction These figures show that the machine was in
models whilst testing set is utilized to test the normal condition during the first 300 points of
trained models. the time sequence. After that time, the condition
Step 3 Training-validating: this procedure of the machine suddenly changed. This indicates
includes the following sub-procedures: estimating possible faults were occurring in the machine. By
the TM and determining the ED based on AMI disassembling and inspecting, these faults were
and FNN method, respectively; creating the pre- identified as the damage of main bearings of the
diction models and validating those models. compressor due to insufficient lubrication.
Trending data of machine
Splitting data
Testing set Training set
Estimate AMI Determine FNN

time delay method Embedding dimension method
Create
p-CART
models
Validate models
No
Good model
Yes
Long-term
Predicting prediction
No
Good results
Yes
Update models
Prognosis
system
Figure 3 Proposed system for machine fault prognosis.
Table 2 Information of the system
Electric motor Compressor

Voltage 6600 V Type Wet screw
Power 440 kW Lobe Male rotor (4 lobes)
Pole 2 Pole Female rotor (6 lobes)
Bearing NDE:#6216, DE:#6216 Bearing Thrust: 7321 BDB
RPM 3565 rpm Radial: Sleeve type
Consequently, the surfaces of these bearings were to train the system. Before being used to generate
overheated and delaminated [21]. the prediction models, the TM was initially calcu-
With the aim of forecasting the change of lated according to the method mentioned in
machine condition, the first 300 points were used Section 2.1. Theoretically, the optimal TM is the
1.4 7.5
Peak acceleration
1.2 7 Envelope acceleration
6.5
1
Acceleration (g)
6
0.8
Bits
5.5
0.6 5
0.4 4.5
4
0.2
3.5
0 5 10 15 20 25 30
0
0 200 400 600 800 1000 1200 Time delay
Time
Figure 6 TM estimation.
Figure 4 The entire peak acceleration data of
compressor.
1
Peak acceleration
0.9
Envelope acceleration
3 0.8
0.7
2.5 Percentage FNN 0.6
0.5
2
Acceleration (g)
0.4
0.3
1.5
0.2
0.1
1
0
1 1.5 2 2.5 3 3.5 4 4.5
0.5 Embedding dimension d
Figure 7 The relationship between FNN percentage

0 and embedding dimension.
0 200 400 600 800 1000 1200
Time
ED for both peak acceleration data and enve-
Figure 5 The entire envelope acceleration data of
lope acceleration data was shown in Figure 7.
compressor.
From the figure, the ED d was chosen as 4
for both dataset where the FNN percentage
value at which the first local minimum of the AMI reaches 0.
is obtained. From Figure 6, the optimal TM of Subsequent to determining the TM and ED,
peak acceleration training data was found as the process of generating the prediction models
7 Bits. Similarly, 5 Bits was the optimal TM value was carried out. Based on those values, the training
of envelope acceleration training data. data was created in which the number of elements
Using FNN method described in Section of input vectors is equal to the product of the ED
2.2, the optimal TM is subsequently utilized to and the number of sub-models; and the number of
determine the ED d. It is noted that the toler- elements of output vectors is equal to the TM.
ance level Rtol and threshold Atol must be ini- Based on the training data, the p-CART model
tially chosen. In this study, Rtol = 15 and was established. It is noted that during the process
Atol = 2 were used according to [13]. The rela- of building each sub-model, the number of
tionship between the FNN percentage and the response values for each terminal node in tree
Table 3 The RMSEs of CART and p-CART
CART p-CART
Data No. of sub-model Training Testing Training Testing

Peak acceleration 1
2 0.002217 0.14809 0.000923 0.14699
3 0.000700 0.14668
4 0.000531 0.14768
5 0.000425 0.14807
Envelope acceleration 1
2 1.3314 105 0.27720 8.534 106 0.25640
3 7.088 106 0.25382
4 6.837 106 0.26252
5 5.42 106 0.26886
growing process was 5 and 10 cross-validations

(a) 0.48
were decided for selecting the best tree in tree prun- RMSE = 0.0022171
ing. Furthermore, in order to evaluate the predict- 0.46
ing performance, the root-mean square error 0.44
(RMSE) was utilized as follows,
Accleration (g)
0.42
s
PN
i1 yi y ^ i 2 0.4
RMSE 15
N 0.38
where N, yi, yi represent the total number of data 0.36

points, the actual value and predicted value of pre- 0.34 Actual
diction model in the training data or testing data, Predicted
0.32
respectively. 0 50 100 150 200 250 300
Choosing the number of sub-models for the Time
p-CART model is another problem to be consid- (b) 0.48

ered in order to obtain the best predicting perfor- RMSE = 0.00070062
0.46
mance. However, it is not easy to identify how
many sub-models should be used and has not 0.44
Acceleration (g)
been extensively reported in literature. Therefore, 0.42

practical approach is a viable solution to deal
0.4
with this problem. In this article, the number of
sub-models was examined from 2 to 5. Then, the 0.38
predicted results were compared to determine how 0.36

many sub-models are appropriate for obtaining
0.34 Actual
the best performance. Table 3 showed a summary Predicted
of the RMSE values of peak acceleration and 0.32
0 50 100 150 200 250 300
envelope acceleration data. From this table, all Time
the RMSEs of the p-CART were smaller than
Figure 8 Training and validating results of peak accel-
those of traditional CART in both cases of peak eration data. (a) CART; (b) p-CART.
acceleration and envelope acceleration, especially
in the training process. The more the sub-models
were, the smaller RMSE value was obtained from value was attained when the number of sub-
the training result. However, this was inappropri- models was 3. After that, the RMSE gradually
ate to the testing process. The smallest RMSE increased. The reason could be related to the
(a) 2.5 (a) 1.4

RMSE = 1.3314e 005 RMSE = 0.14809
1.2
2
1
Accleration (g)
Acceleration (g)
1.5
0.8
1 0.6
0.4
0.5
Actual
Predicted 0.2 Actual
0 Predicted
0 50 100 150 200 250 300 0
Time 0 25 50 75 100
Time
(b) 2.5
RMSE = 7.0878e 006 (b) 1.4
RMSE = 0.14668
2 1.2
Acceleration (g)
Accleration (g)
1.5
0.8
1 0.6
0.4
0.5
Actual
Predicted 0.2 Actual
Predicted
0
0 50 100 150 200 250 300 0
Time 0 25 50 75 100
Time
Figure 9 Training and validating results of envelop
acceleration data. (a) CART; (b) p-CART. Figure 10 Predicted results of peak acceleration data.
(a) CART; (b) p-CART.
expansion in observations due to the increase in The values of RMSE were 0.0007 for acceleration
the number of sub-models. This expansion leads data and 7.088 106 for envelope data, which
to the sub-model using improper observations in were significantly smaller than those of CART.
prediction process. These results indicate that the proposed method
Due to the best performance of p-CART manifests the advances in learning capability.
obtained from 3 sub-models, a comparison of Figures 10 and 11 showed the predicted
tracking capability of the operating condition results in testing process of the CART and
change between this model and traditional model p-CART models for peak acceleration and enve-
was carried out. The training results of the CART lope acceleration. The RMSE values obtained
models for peak acceleration and envelope accel- from p-CART for both data were smaller than
eration data were, respectively, performed and those of CART. Furthermore, in Figure 10, the
shown in Figures 8(a) and 9(a). The actual and p-CART was superior to CART model in keep-
predicted values were almost identical with very ing track of the changes of the machine operat-
small RMSE values of 0.002217 and 1.314 105 ing condition. This indicated that there was an
for acceleration data and envelope data, respec- improvement in the predicting ability of p-CART
tively. Similarly, the training results of p-CART model when compared with that of traditional
are depicted in Figures 8(b) and 9(b). model.
3 capability of p-CART. This improvement is imple-

RMSE = 0.2772
mented by multiplying the predicted result of each
2.5
sub-model by the weight factor when calculating the
Acceleration (g)
final predicted results. These weight factors are

2
determined based on the errors during training pro-
cess. The sub-model which has the most approxi-
1.5
mate predicted values to the actual values will be
1
assigned with the largest weight factor, while the
Actual
Predicted
others will be assigned with descendent weight fac-
0.5 tors. The final predicted result will be the sum of
0 25 50 75 100
these products. This improvement proposed here is
Time
mentioned as a future work.
3
RMSE = 0.25382
2.5
References
1. Vachtsevanos, G., Lewis, F., Roemer, M., Hess, A. and

Accleration (g)
2 Wu, B. (2006). Intelligent Fault Diagnosis and Prognosis

for Engineering System, New York: Wiley.
1.5
2. Tu, F., Ghoshal, S., Luo, J., Biswas, G., Mahadevan, S.,
Jaw, L. and Navarra, K. (2007). PHM integration with
maintenance and inventory management systems. In:
1 Proceedings of Aerospace Conference 2007 IEEE, pp. 112.
Actual
Predicted 3. Abbas, M., Ferri, A.A., Orchard, M.E. and Vachtsevanos,
0.5 G.J. (2007). An intelligent diagnostic/ prognostic frame-
0 25 50 75 100 work for automotive electrical systems. In: Proceedings of
Time
Intelligent Vehicles Symposium 2007 IEEE, pp. 352357.
Figure 11 Predicted results of envelop acceleration 4. Watson, M., Byington, C., Edwards, D. and Amin, S.
data. (a) CART; (b) p-CART. (2005). Dynamic modeling and wear-based remaining
useful life prediction of high power clutch systems.
Tribology Transactions, 48, 208217.
7 Conclusions 5. Luo, M., Wang, D., Pham, M., Low, C.B., Zhang, J.B.,
Zhang, D.H. and Zhao, Y.Z. (2005). Model-based fault
diagnosis/prognosis for wheeled mobile robots: a review.
This article presents a data-driven approach
In: Proceeding of 32nd Annual Conference of IEEE,
which combines the prediction model involving par- Industrial Electronics Society, pp. 22672272.
allel of CART and long-term direct prediction meth- 6. Li, Y., Kurfess, T.R. and Liang, S.Y. (2000). Stochastic
odology for forecasting the operating conditions of prognostics for rolling element bearings. Mechanical
machine. The p-CART model is validated for its Systems and Signal Processing, 14, 747762.
ability to predict future state conditions of a low 7. Schwabacher, M. and Goebel, K. (2007). An survey of
methane compressor using peak acceleration and artificial intelligence for prognostics. In: Proceedings of
envelope acceleration data. The predicted results AAAI Fall Symposium on Artificial Intelligence for
of the p-CART are also compared with those of Prognostics, Menlo Park, California, 911 November.
traditional CART. From the predicted results, the 8. Vachtsevanos, G. and Wang, P. (2001). Fault prognosis
using dynamic wavelet neural networks. In:
p-CART model performance is superior to the tra-
AUTOTESTCON Proceedings, IEEE Systems
ditional model. Consequently, this proposed method
Readiness Technology Conference, pp. 857870.
is a new way to improve the performance of long- 9. Huang, R., Xi, L., Li, X., Liu, C.R., Qiu, H. and Lee, J.
term prediction which has been a difficult and a (2007). Residual life prediction for ball bearings based on
challenging task in forecasting the machines operat- self-organizing map and back propagation neural net-
ing conditions. Furthermore, this study commences work methods. Mechanical Systems and Signal
the new feasibility to further improve the predicting Processing, 21, 193207.
10. Wang, W.Q., Golnaraghi, M.F. and Ismail, F. (2004). 20. Moti, H., Kosemura, N., Ishiguro, K. and Kondo, T.
Prognosis of machine health condition using neuro- (2001). Short-term load forecasting with fuzzy regres-
fuzzy system. Mechanical System and Signal sion tree in power systems. System Man and
Processing, 18, 813831. Cybernetics, IEEE International Conference, Vol. 3,
11. Brown, E.R., McCollom, N.N., Moore, E. and Hess, A. Tucson, Arizona, pp. 19481953.
(2007). Prognostics and health management a data-driven 21. Tran, V.T., Yang, B.S., Oh, M.S. and Tan, A.C.C.
approach to supporting the F-35 Lightning II. In: (2008). Machine condition prognosis based on regres-
Proceedings of Aerospace Conference 2007 IEEE, pp. 112. sion trees and one-step-ahead prediction. Mechanical
12. Ji, Y., Hao, J., Reyhani, N. and Lendasse, A. (2005). Systems and Signal Processing, 22, 11791193.
Direct and recursive prediction of time series using 22. Hu, C. and Cao, L. (2004). ANN based load forecast-
mutual information selection. Lecture Notes in ing: a parallel structure. Systems, Man and Cybernetics,
Computer Science, 3512, 10101017. IEEE International Conference, The Hague,
13. Kennel, M.B., Brown, R. and Abarbanel, H.D.I. Netherlands, pp. 35943598.
(1992). Determining embedding dimension for phase- 23. Kim, M.S. and Chung, C.S. (2005). Sunspot time series
space reconstruction using a geometrical construction. prediction using parallel-structure fuzzy system. Lecture
Physical Review A, 45, 34033411. Notes in Computer Science, 3614, 731741.
14. Cao, L. (1997). Practical method for determining the 24. Jeong, J., Gore, J.C. and Peterson, B.S. (2001). Mutual
minimum embedding dimension of a scalar time information analysis of the EEG in patients with
series. Physica D, 110, 4350. Alzheimers disease. Clinical Neurophysiology, 112,
15. Broomhead, D.S. (1986). Extracting qualitative 827835.
dynamics from experimental data. Physica D, 20, 217. 25. Sorjamaa, A., Hao, J., Reyhani, N., Ji, Y. and
16. Rosenstein, M.T., Collins, J.J. and Luca, C.J.D. (1994). Lendasse, A. (2007). Methodology for long-term
Reconstruction expansion as a geometry-based frame- prediction of time series. Neurocomputing, 70,
work for choosing proper delay time. Physica D, 73, 28612869.
8289. 26. Sorjamaa, A. and Lendasse, A. (2007). Time series pre-
17. Fraser, A.M. and Swinney, H.L. (1986). Independent diction as a problem of missing values: application to
coordinates for strange attractors from mutual infor- ESTSP and NN3 competition benchmarks. In:
mation. Physical Review A, 33, 1134. Proceedings of European Symposium on Time Series
18. Breiman, L., Friedman, J.H., Olshen, R.A. and Stone, Prediction, Helsinki, Finland, pp. 165174.
C.J. (1984). Classification and Regression Trees, 27. Tran, V.T., Yang, B.S. and Tan, A.C.C. (2009).
Chapman & Hall/CRC, Belmont, California. Multi-step ahead direct prediction for the machine
19. Yang, J. and Stenzel, J. (2006). Short-term load fore- condition prognosis using regression trees and neuro-
casting with increment regression tree. Electric Power fuzzy systems. Expert Systems with Applications, 36,
Systems Research, 76, 880888. 93789387.

A Classification and Regression Trees (CART) Model of Parallel Structure and Long-Term Prediction Prognosis of Machine Condition

Hochgeladen von

Dokumentinformationen

Originaltitel

Copyright

Verfügbare Formate

Dieses Dokument teilen

Dokument teilen oder einbetten

Freigabeoptionen

Stufen Sie dieses Dokument als nützlich ein?

Sind diese Inhalte unangemessen?

Copyright:

Verfügbare Formate

A Classification and Regression Trees (CART) Model of Parallel Structure and Long-Term Prediction Prognosis of Machine Condition

Hochgeladen von

Copyright:

Verfügbare Formate

A Classification and Regression Trees (CART)

Model of Parallel Structure and Long-term

1 Introduction actions on the component before a catastrophic

The Author(s), 2010. Reprints and permissions:

information obtained from X in the presence of Y B

and vice versa. In time series prediction problem, C

Table 1 Training set D for direct prediction strategy

Input X X1 ,X2 , . . . ,XdN Output Y Y1 ,Y2 , . . . ,Yh

Output vector Yi from D

Input vector Xi from D

Figure 2 Architecture and input values for sub-model of p-CART.

Trending data of machine

Testing set Training set

Estimate AMI Determine FNN

Figure 3 Proposed system for machine fault prognosis.

Table 2 Information of the system

Electric motor Compressor

Figure 7 The relationship between FNN percentage

Table 3 The RMSEs of CART and p-CART

Data No. of sub-model Training Testing Training Testing

growing process was 5 and 10 cross-validations

where N, yi, yi represent the total number of data 0.36

p-CART model is another problem to be consid- (b) 0.48

been extensively reported in literature. Therefore, 0.42

predicted results were compared to determine how 0.36

(a) 2.5 (a) 1.4

3 capability of p-CART. This improvement is imple-

final predicted results. These weight factors are

1. Vachtsevanos, G., Lewis, F., Roemer, M., Hess, A. and

2 Wu, B. (2006). Intelligent Fault Diagnosis and Prognosis

Das könnte Ihnen auch gefallen

Input X X1 ,X2 , . . . ,XdN Output Y Y1 ,Y2 , . . . ,Yh