Van Tung Tran,1 Bo-Suk Yang1,* and Andy Chit Chiow Tan2

1 School of Mechanical Engineering, Pukyong National University, San 100, Yongdang-dong, Nam-gu, Busan 608-739, South Korea
2 School of Engineering Systems, Queensland University of Technology, G.P.O. Box 2343, Brisbane, Qld. 4001, Australia
This article presents a combined prediction model, namely p-CART, in which multiple classification and regression tree (CART) models are connected in parallel, together with a long-term direct prediction methodology of time series techniques, to predict the future states of a machine's operating condition. The p-CART model consists of multiple CART models connected in parallel. Each sub-model in the p-CART is trained independently. Based on the observations, these sub-models are subsequently used to predict the future values of the machine's operating condition separately, with the same embedding dimension but different observation indices. Finally, the predicted results of the sub-models are combined to produce the final result of the prediction process. Real trending data acquired from the condition monitoring routine of a compressor are employed to evaluate the proposed method. A comparative study of the predicted results obtained from the traditional CART and p-CART models is also carried out to appraise the prediction capability of the proposed model.
Keywords: machine fault prognosis, long-term time series prediction, CART, direct prediction methodology
122 Structural Health Monitoring 9(2)
based are the most considered because they provide higher accuracy and reliability. Model-based prognosis techniques are applicable in situations where accurate mathematical models can be constructed based on the physical fundamentals of the system; or where the models require extensive available failure data, which are either too costly or impossible to obtain [26]. Even though the accuracy of these techniques is reasonably high, they are only suitable for specific machine components, and each component requires a specific mathematical model. Alternatively, data-driven prognosis techniques require a large amount of historical failure data to build a prognostic model that learns the system behavior. They frequently use vibration signals for temporal pattern identification, since it is relatively easy to measure and record machine vibration data. Therefore, data-driven prognosis techniques with vibration-based measurement have been developed in recent times [7–11].

In condition prognosis, the prediction model uses available observations to forecast the future operating conditions of the machine [10]. From these predicted results, the remaining useful life (RUL) of the machine can be prognosticated. RUL is the time interval between the current operating condition point and the point where the predicted values fall within the alarm region or reach the predetermined failure threshold. Consequently, the more accurately the future operating conditions of the machine are predicted, the more easily the RUL is determined. Hence, long-term prediction is essential for machine condition prognosis, even though it remains a difficult and challenging task in the time series prediction domain [12].

In long-term prediction, the embedding dimension (ED), the time delay (TM), and the selection of the prediction model are essential considerations. ED and TM are used to reconstruct the state space of the machine condition time series and to establish the fundamental parameters of the prediction model. ED is the number of initial observations that should be used as the inputs for the prediction model. This value can be determined by using the false nearest neighbor (FNN) method [13] or Cao's method [14]. TM is the number of steps that can be predicted by the prediction model to obtain the optimum performance. It can be calculated by using published methods such as auto-correlation [15], average displacement [16], and auto-mutual information (AMI) [17], which is used in this study.

The next problem in a prognostic system is the selection of the prediction model. The classification and regression tree (CART) [18] has been widely implemented in machine fault diagnosis. In the aspect of prediction, CART as well as its extensions has been applied to forecast the short-term load of power systems [19,20] and the operating condition of machines [21] with remarkable performance. However, these researches merely focused on short-term prediction methodology.

This study proposes a combined prediction model in which multiple CART models, namely p-CART, are connected in parallel for long-term prediction purposes. Each sub-model in the p-CART is trained independently. Based on the observations, these sub-models are then used to predict the future values of the machine's operating condition separately, with the same embedding dimension but different observation indices. Finally, the predicted results of the sub-models are combined to produce the final result of the prediction process. The parallel-structure model in general, and the p-CART model in particular, has many advantages. It can use several observations simultaneously to enhance the prediction accuracy. This is not feasible for the traditional CART, because more observations lead to an expansion in embedding dimension that can increase the computational complexity. The parallel-structure model has been applied for forecasting purposes as indicated in references [22,23]. However, these researches merely addressed the short-term prediction method.

2 Background Knowledge

2.1 TM Estimation

Several methods published in the literature could be used to choose the TM. However, most of them are based on empirical concepts, and it is not easy to identify which of the methods is suitable for a particular task. In this article, the TM is determined by the AMI method. Mutual information (MI) can be used to evaluate the dependence among random variables. The MI between two variables X and Y is the amount of
Tran et al. p-Cart and Long-term Direct Prediction Methodology 123
information shared between their observed values.

The AMI between x(t) and x(t + τ) is:

$$I_{XX}(\tau) = \sum_{x(t),\,x(t+\tau)} P_{XX}\bigl(x(t), x(t+\tau)\bigr)\,\ln\!\left[\frac{P_{XX}\bigl(x(t), x(t+\tau)\bigr)}{P_{X}\bigl(x(t)\bigr)\,P_{X}\bigl(x(t+\tau)\bigr)}\right] \qquad (1)$$

where P_X(x(t)) is the normalized histogram of the distribution of values observed for x(t), and P_XX(x(t), x(t + τ)) is the joint probability density for the measurements of x(t) and x(t + τ).

The decreasing rate of the AMI with increasing TM is a normalized measure of the time series complexity. The first local minimum of the AMI of the time series has been used to determine the optimal TM.

2.2 Determining the ED

After calculating the TM, the ED is the next parameter to be determined. The FNN method is employed in this study and will be briefly explained. Assume that a time series x_1, x_2, ..., x_N and the vector y_i(d), given in Equation (2), in a delay coordinate embedding of the time series with TM τ and ED d, are given:

$$y_i(d) = \bigl(x_i,\, x_{i+\tau},\, \ldots,\, x_{i+(d-1)\tau}\bigr), \qquad i = 1, 2, \ldots, N-(d-1)\tau \qquad (2)$$

The observations x_i are projections of the system's trajectory in the multivariate state space onto the 1D axis. The FNN method is based on the concept that, in the passage from dimension d to dimension d + 1, one can differentiate among points which are true or false neighbors on the orbit.

Figure 1 An example of FNNs.

For instance, in Figure 1, points A, B, C, and D belong to a curve. In 1D, points A and D appear to be the nearest neighbors. However, point D is no longer the nearest neighbor of point A in 2D. In the same way, points A and C are the nearest neighbors in 2D, but they are no longer neighbors when viewed in 3D. In this case, points A, D, and C are examples of false neighbors, while points A and B are true neighbors.

The criteria for the identification of FNNs can be explained as follows: denote y_i^r(d) as the nearest neighbor of y_i(d) in a d-dimensional embedding space. According to [13], the nearest neighbor is determined by finding the vector which minimizes the Euclidean distance:

$$R_d = \bigl\| y_i(d) - y_i^{r}(d) \bigr\| \qquad (3)$$

Consider each of these vectors under a (d + 1)-dimensional embedding:

$$y_i(d+1) = \bigl(x_i,\, x_{i+\tau},\, x_{i+2\tau},\, \ldots,\, x_{i+d\tau}\bigr), \qquad i = 1, 2, \ldots, N-d\tau \qquad (4)$$
$$y_i^{r}(d+1) = \bigl(x_{r_i},\, x_{r_i+\tau},\, x_{r_i+2\tau},\, \ldots,\, x_{r_i+d\tau}\bigr), \qquad i = 1, 2, \ldots, N-d\tau \qquad (5)$$

The vectors are separated by the Euclidean distance:

$$R_{d+1} = \bigl\| y_i(d+1) - y_i^{r}(d+1) \bigr\| \qquad (6)$$

The first criterion which identifies a FNN is:

$$\sqrt{\frac{R_{d+1}^{2} - R_{d}^{2}}{R_{d}^{2}}} = \frac{\bigl| x_{i+d\tau} - x_{r_i+d\tau} \bigr|}{R_{d}} > R_{tol} \qquad (7)$$

where R_tol is a tolerance level.

The second criterion is:

$$\frac{R_{d+1}}{R_{A}} > A_{tol} \qquad (8)$$

where R_A is a measure of the size of the attractor and A_tol is a threshold that can be chosen in practice. If both Equations (7) and (8) are satisfied, then y_i^r(d) is a FNN of y_i(d). Once the total number of FNNs is calculated, the percentage of FNNs is measured. An appropriate ED is the value where the percentage of FNNs falls to zero.

2.3 Regression Trees

In this study, CART is utilized to build a regression tree model. Beginning with an entire data set, a binary tree is constructed by repeatedly splitting subsets into two descendant subsets according to the independent variables. The goal is to produce subsets of the data which are as homogeneous as possible with respect to the response variables. A regression tree in CART is built by using the following two processes: tree growing and tree pruning.

2.3.1 Tree Growing Let L be a learning data set which comprises n couples of observations (y_1, x_1), ..., (y_n, x_n), where x_i = (x_{1i}, ..., x_{di}) is a set of independent variables and y_i ∈ R is the response associated with x_i. In order to build the tree, the learning data L is recursively partitioned into two subsets by binary splits until the terminal nodes are reached. The result is to move the couples (y, x) to left or right nodes which contain more homogeneous responses. The predicted response at each terminal node t is estimated by the mean ȳ(t) of the n(t) response variables y contained in that terminal node. The split selection at any internal node t is chosen according to the node impurity, which is measured by the within-node sum of squares:

$$R(t) = \frac{1}{n} \sum_{(y_i, x_i) \in t} \bigl( y_i - \bar{y}(t) \bigr)^{2}, \qquad \bar{y}(t) = \frac{1}{n(t)} \sum_{(y_i, x_i) \in t} y_i \qquad (9)$$

When a split is performed, two subsets of observations, t_L and t_R, are obtained. The optimum split s* at node t is obtained from the set of all splitting candidates S such that it verifies:

$$\Delta R(s^{*}, t) = \max_{s \in S} \Delta R(s, t), \qquad \Delta R(s, t) = R(t) - R(t_L) - R(t_R) \qquad (10)$$

where R(t_L) and R(t_R) are the sums of squares of the left and right subsets, respectively.

2.3.2 Tree Pruning The tree gained in the tree-growing process has many terminal nodes, which increases the precision of the responses. However, such a tree is frequently too complicated, and over-fitting is highly probable. Consequently, it should be pruned back.

The tree pruning process is performed by the following procedure:

Step 1: At every internal node, an error-complexity is found for the number of descendant sub-trees. The error-complexity is defined as:

$$R_{\alpha}(T) = R(T) + \alpha \bigl| \tilde{T} \bigr| \qquad (11)$$

where $R(T) = \frac{1}{n}\sum_{t \in \tilde{T}} \sum_{(y_i, x_i) \in t} \bigl(y_i - \bar{y}(t)\bigr)^2$ is the total within-node sum of squares, $\tilde{T}$ is the set of current terminal nodes of T, $|\tilde{T}|$ is the number of terminal nodes in T, and α ≥ 0 is the complexity parameter which weights the number of terminal nodes.

Step 2: Using the error-complexity attained in Step 1, the internal node with the smallest error is replaced by a terminal node.

Step 3: The algorithm terminates if all the internal nodes have converged to terminal nodes. Otherwise, it returns to Step 1.

2.3.3 Cross-Validation for Selecting the Best Tree There are two possible methods to select the best tree. One is through the use of independent test data, and the other is cross-validation, which is used in this study. The learning data L is randomly
divided into v approximately equal groups, and (v − 1) groups are then utilized as the learning data for growing the tree model. The remaining group is employed as testing data for the error estimation of the tree model. As a result, v errors are obtained from v iterations with variations of the combinations of the learning data and testing data. The mean and standard deviation of the errors are given by:

$$R^{CV}(d) = \frac{1}{v} \sum_{i=1}^{v} R^{ts}(d_i), \qquad \sigma^{CV}(d) = \sqrt{\frac{1}{v} \sum_{i=1}^{v} \bigl( R^{ts}(d_i) - R^{CV}(d) \bigr)^{2}} \qquad (12)$$

where R^CV is the average relative error, d is the cross-validation tree, σ^CV is the standard error, and R^ts is the testing data error.

The best tree T_t is selected such that:

$$R^{CV}(T_t) \le R^{CV}(T_{\min}) + \sigma^{CV}(T_{\min}) \qquad (13)$$

where R^CV is the cross-validation error and T_min is the tree with the smallest cross-validation error.

3 Long-term Direct Prediction Strategy for p-CART Model

Unlike short-term prediction (one-step-ahead prediction), long-term prediction (multi-step-ahead prediction) typically faces growing uncertainties arising from various sources, such as the accumulation of errors and the lack of information. Long-term prediction is divided into three frequently used strategies [26], which involve recursive prediction, direct prediction, and DirRec prediction [27]. In this section, the direct prediction strategy applied to the p-CART model is specifically presented.

Assume that a sequence of observations y_t = (x_{t−d+1}, x_{t−d+2}, ..., x_t) is given. To predict the h future values ŷ_{t+h} = (x̂_{t+1}, x̂_{t+2}, ..., x̂_{t+h}), H different parallel-structure prediction models are used. These models are generated by using a training set D. The training set D, including input vectors X_i and output vectors Y_i, is created from the given observations y_t = (x_1, x_2, ..., x_t) by using a sliding window of length dN + h, where N is the number of sub-models in the parallel-structure prediction model. The vector X_i corresponds to the first dN values of the window, whilst the vector Y_i is the remaining h values of the window. The number of elements in each vector X_i is dN, which is used to generate the N sub-models of the parallel-structure models. The training set is structured by synthesizing the X and Y vectors in the form shown in Table 1. Thus, by using the training set, H parallel-structure prediction models are sequentially generated with different outputs Y_i, which include all the values in the i-th column Y of the training set D.

4 Architecture of p-CART Model

The p-CART model consists of several sub-models of CART in parallel. Each of these sub-models is independently trained with the same output and input vectors. However, not all the elements of the input vectors are used for training, because the total number of elements in the input vectors is dN, which is larger than the value of the embedding dimension. Therefore, the indices of the elements are modified corresponding to each sub-model, and the total number of elements used as inputs for each sub-model is equivalent to the embedding dimension. For example, suppose the p-CART is used for forecasting future values, where the number of sub-models N is 3, while the ED and TM are calculated as 3 and 4, respectively. Thus, the number of elements of the input vectors is 9. The sub-model CART 1 takes the elements x_{t−2}, x_{t−1}, x_t as its input. Similarly, the sub-model CART 2 uses the
elements x_{t−5}, x_{t−3}, x_{t−1}, and the sub-model CART 3 acquires the elements x_{t−8}, x_{t−5}, x_{t−2} as their inputs. The output vector of each sub-model is the same vector, (x_{t+1}, x_{t+2}, x_{t+3}, x_{t+4}). The architecture and input elements of the sub-models are shown in Figure 2.

Finally, the predicted values from each sub-model are combined to determine the final predicted results by using the averaging formula:

$$\hat{y}_{t+h} = \left( \frac{1}{N} \sum_{j=1}^{N} \hat{x}_{t+1}^{(j)},\; \frac{1}{N} \sum_{j=1}^{N} \hat{x}_{t+2}^{(j)},\; \ldots,\; \frac{1}{N} \sum_{j=1}^{N} \hat{x}_{t+h}^{(j)} \right) \qquad (14)$$

where $\hat{x}_{t+k}^{(j)}$ denotes the value predicted by sub-model j.

5 Proposed System

The proposed system for prognosis comprises four sequential procedures, as shown in Figure 3, namely data acquisition, data splitting, training-validating models, and predicting. The role of each procedure is explained as follows:

Step 1 Data acquisition: this procedure is used to obtain the vibration data from the machine condition. It covers a range of data from normal operation to obvious faults of the machine.

Step 2 Data splitting: the trending data attained from the previous procedure is split into two parts: a training set and a testing set. Different data are used for different purposes in the prognosis system. The training set is used for creating the prediction models, whilst the testing set is utilized to test the trained models.

Step 3 Training-validating: this procedure includes the following sub-procedures: estimating the TM and determining the ED based on the AMI and FNN methods, respectively; creating the prediction models; and validating those models. Validating the prediction models is used for measuring their performance capability.

Step 4 Predicting: the long-term direct prediction method is used to forecast the future values of the machine condition. The predicted results are measured by the error between the predicted values and the actual values in the testing set. Updating the models is also carried out in this procedure for the next prediction process.

6 Experiments and Results

The proposed method is applied to a real system to predict the trending data of a low methane compressor of a petrochemical plant. The compressor is driven by a 440 kW motor, 6600 V, 2 poles, and operates at a speed of 3565 rpm. Other information of the system is summarized in Table 2.

The recorded trending data included peak acceleration and envelope acceleration data. The average recording duration was 6 hours during the data acquisition process. Each data record consisted of approximately 1200 data points, as shown in Figures 4 and 5, and contained information on the machine history with respect to the time sequence (vibration amplitude). Consequently, it can be classified as time-series data.

These figures show that the machine was in normal condition during the first 300 points of the time sequence. After that time, the condition of the machine suddenly changed. This indicates that possible faults were occurring in the machine. By disassembling and inspecting, these faults were identified as damage to the main bearings of the compressor due to insufficient lubrication.
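The sliding-window training-set construction of Section 3 and the strided sub-model index selection of Section 4 can be sketched as follows. This is a minimal illustration, not the authors' code; the helper names `make_windows` and `submodel_indices` are invented here, and the stride pattern simply reproduces the N = 3, ED = 3 example given above.

```python
import numpy as np

def make_windows(series, d, N, h):
    """Slide a window of length d*N + h over the series; the first d*N
    values form the input vector X_i, the remaining h values form Y_i."""
    series = np.asarray(series, dtype=float)
    L = d * N + h
    X, Y = [], []
    for start in range(len(series) - L + 1):
        w = series[start:start + L]
        X.append(w[:d * N])
        Y.append(w[d * N:])
    return np.array(X), np.array(Y)

def submodel_indices(d, N):
    """For sub-model k = 1..N, pick d lags with stride k counted back from
    the most recent sample.  With N = 3, d = 3 this reproduces the lags
    {t-2, t-1, t}, {t-5, t-3, t-1}, {t-8, t-5, t-2} of Section 4."""
    last = d * N - 1                        # window index holding x_t
    idx = []
    for k in range(1, N + 1):
        lags = [last - (k - 1) - k * j for j in range(d)]
        idx.append(sorted(lags))
    return idx
```

Here window index `i` corresponds to observation x_{t-(dN-1)+i}, so the index sets `[6, 7, 8]`, `[3, 5, 7]`, and `[0, 3, 6]` match the three sub-model inputs of the worked example.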
Figure 3 Flowchart of the proposed prognosis system: data splitting; creating and validating p-CART models (repeated until a good model is obtained); long-term prediction; and updating the models.
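The four-step procedure of Section 5 (and the decision flow of Figure 3) can be expressed as a control-flow skeleton. All callables below are hypothetical placeholders for the corresponding procedures, not the authors' implementation:

```python
def prognosis_pipeline(data, split, train, validate, predict, update,
                       good_model, good_results, retries=10):
    """Skeleton of the Section-5 procedure: split the trend data, train
    and validate the p-CART models (re-creating them until validation
    passes), run long-term prediction, and update the models for the
    next prediction cycle.  Every argument except `data` is a stand-in
    callable supplied by the user."""
    train_set, test_set = split(data)
    model = train(train_set)
    for _ in range(retries):                # retry budget for validation
        if good_model(validate(model, train_set)):
            break
        model = train(train_set)            # re-create the models
    forecast = predict(model, test_set)
    if not good_results(forecast, test_set):
        model = update(model, test_set)     # prepare next cycle
    return model, forecast
```

The retry loop only makes sense when model creation is stochastic (e.g. randomized splits); with a deterministic trainer it degenerates to a single pass.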
Consequently, the surfaces of these bearings were overheated and delaminated [21].

With the aim of forecasting the change of machine condition, the first 300 points were used to train the system. Before being used to generate the prediction models, the TM was initially calculated according to the method mentioned in Section 2.1. Theoretically, the optimal TM is the first local minimum of the AMI curve.
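The AMI computation of Equation (1), together with the first-local-minimum rule of Section 2.1, can be sketched as follows. This is an illustrative histogram-based estimator; the bin count and function names are choices made here, not specified by the paper:

```python
import numpy as np

def auto_mutual_information(x, lag, bins=16):
    """I_XX(tau) of Equation (1): MI between x(t) and x(t+tau),
    estimated from normalized (joint) histograms."""
    a, b = x[:-lag], x[lag:]
    pxy, _, _ = np.histogram2d(a, b, bins=bins)
    pxy = pxy / pxy.sum()
    px = pxy.sum(axis=1)                    # marginal of x(t)
    py = pxy.sum(axis=0)                    # marginal of x(t+tau)
    mask = pxy > 0
    return float(np.sum(pxy[mask] *
                        np.log(pxy[mask] / np.outer(px, py)[mask])))

def estimate_tm(x, max_lag=30, bins=16):
    """TM = first local minimum of the AMI curve (Section 2.1)."""
    ami = [auto_mutual_information(x, tau, bins)
           for tau in range(1, max_lag + 1)]
    for i in range(1, len(ami) - 1):
        if ami[i] < ami[i - 1] and ami[i] <= ami[i + 1]:
            return i + 1                    # lags are counted from 1
    return int(np.argmin(ami)) + 1          # fall back: global minimum
```

A plot of `ami` against the lag would correspond to the TM estimation curve of Figure 6.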
Figure 4 The entire peak acceleration data of compressor.

[Figure 5 The entire envelope acceleration data of compressor.]

Figure 6 TM estimation.

[Figure The percentage of FNNs versus embedding dimension d for the peak and envelope acceleration data.]
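The FNN-based ED selection of Section 2.2 (Equations (2)–(8)) can be sketched as follows. The tolerance values `rtol` and `atol` are the conventional choices from the FNN literature, not values reported by the paper, and the helper names are invented here:

```python
import numpy as np

def embed(x, d, tau):
    """Delay vectors y_i(d) = (x_i, x_{i+tau}, ..., x_{i+(d-1)tau})
    of Equation (2)."""
    n = len(x) - (d - 1) * tau
    return np.column_stack([x[j * tau:j * tau + n] for j in range(d)])

def fnn_fraction(x, d, tau, rtol=15.0, atol=2.0):
    """Fraction of false nearest neighbours per Equations (3)-(8)."""
    n = len(x) - d * tau                    # points surviving in d+1 dims
    yd = embed(x, d, tau)[:n]
    ra = np.std(x)                          # attractor-size estimate R_A
    false = 0
    for i in range(n):
        dist = np.linalg.norm(yd - yd[i], axis=1)
        dist[i] = np.inf
        r = int(np.argmin(dist))            # nearest neighbour, Eq. (3)
        rd = dist[r]
        extra = abs(x[i + d * tau] - x[r + d * tau])
        rd1 = np.hypot(rd, extra)           # distance in d+1 dims, Eq. (6)
        if (rd > 0 and extra / rd > rtol) or rd1 / ra > atol:
            false += 1                      # criteria (7) or (8) fired
    return false / n

def estimate_ed(x, tau, dmax=8, threshold=0.0):
    """Smallest d whose FNN fraction falls to ~zero (Section 2.2)."""
    for d in range(1, dmax + 1):
        if fnn_fraction(x, d, tau) <= threshold:
            return d
    return dmax
```

Plotting `fnn_fraction` against `d` would reproduce the percentage-of-FNN curve sketched in the figure above.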
[Figure RMSE values of the CART and p-CART models.]

The prediction performance is measured by the root-mean-square error (RMSE):

$$\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \bigl( y_i - \hat{y}_i \bigr)^{2}} \qquad (15)$$
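Equation (15) translates directly into code (a one-line NumPy sketch):

```python
import numpy as np

def rmse(actual, predicted):
    """Root-mean-square error of Equation (15)."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))
```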
[Figure 8 Training and validating results of peak acceleration data. (a) CART; (b) p-CART.]

Figure 9 Training and validating results of envelope acceleration data. (a) CART; (b) p-CART.

Figure 10 Predicted results of peak acceleration data. (a) CART; (b) p-CART.
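For a rough sense of how the p-CART training and prediction of the figures above can be reproduced, a sketch using scikit-learn's `DecisionTreeRegressor` as the CART learner is given below. This is an assumption-laden stand-in: the authors' CART implementation and trending data are not available here, and scikit-learn is simply one common CART implementation.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def p_cart_fit_predict(series, d=3, N=3, h=4):
    """Train N CART sub-models on strided lag subsets of a d*N input
    window (Section 4) and average their h-step forecasts, Eq. (14)."""
    series = np.asarray(series, dtype=float)
    L = d * N + h
    W = np.array([series[s:s + L] for s in range(len(series) - L + 1)])
    X, Y = W[:, :d * N], W[:, d * N:]       # sliding-window training set
    last = d * N - 1
    query = series[-d * N:]                 # most recent observations
    preds = []
    for k in range(1, N + 1):               # one sub-model per stride k
        cols = sorted(last - (k - 1) - k * j for j in range(d))
        tree = DecisionTreeRegressor(random_state=0).fit(X[:, cols], Y)
        preds.append(tree.predict(query[cols][None, :])[0])
    return np.mean(preds, axis=0)           # averaged h-step forecast
```

Running the same routine with `N=1` (all lags, one tree) would give the traditional-CART baseline used in the comparison.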
expansion in observations due to the increase in the number of sub-models. This expansion leads to the sub-models using improper observations in the prediction process.

Since the best performance of p-CART was obtained with 3 sub-models, a comparison of the capability of tracking the operating condition change between this model and the traditional model was carried out. The training results of the CART models for the peak acceleration and envelope acceleration data are shown in Figures 8(a) and 9(a), respectively. The actual and predicted values were almost identical, with very small RMSE values of 0.002217 and 1.314 × 10⁻⁵ for the acceleration data and envelope data, respectively. Similarly, the training results of p-CART are depicted in Figures 8(b) and 9(b). The values of RMSE were 0.0007 for the acceleration data and 7.088 × 10⁻⁶ for the envelope data, which were significantly smaller than those of CART. These results indicate that the proposed method manifests an advance in learning capability.

Figures 10 and 11 show the predicted results in the testing process of the CART and p-CART models for the peak acceleration and envelope acceleration data. The RMSE values obtained from p-CART for both data sets were smaller than those of CART. Furthermore, in Figure 10, the p-CART was superior to the CART model in keeping track of the changes of the machine operating condition. This indicates an improvement in the predicting ability of the p-CART model when compared with that of the traditional model.
References
10. Wang, W.Q., Golnaraghi, M.F. and Ismail, F. (2004). Prognosis of machine health condition using neuro-fuzzy system. Mechanical Systems and Signal Processing, 18, 813–831.
11. Brown, E.R., McCollom, N.N., Moore, E. and Hess, A. (2007). Prognostics and health management: a data-driven approach to supporting the F-35 Lightning II. In: Proceedings of Aerospace Conference 2007 IEEE, pp. 1–12.
12. Ji, Y., Hao, J., Reyhani, N. and Lendasse, A. (2005). Direct and recursive prediction of time series using mutual information selection. Lecture Notes in Computer Science, 3512, 1010–1017.
13. Kennel, M.B., Brown, R. and Abarbanel, H.D.I. (1992). Determining embedding dimension for phase-space reconstruction using a geometrical construction. Physical Review A, 45, 3403–3411.
14. Cao, L. (1997). Practical method for determining the minimum embedding dimension of a scalar time series. Physica D, 110, 43–50.
15. Broomhead, D.S. (1986). Extracting qualitative dynamics from experimental data. Physica D, 20, 217.
16. Rosenstein, M.T., Collins, J.J. and Luca, C.J.D. (1994). Reconstruction expansion as a geometry-based framework for choosing proper delay time. Physica D, 73, 82–89.
17. Fraser, A.M. and Swinney, H.L. (1986). Independent coordinates for strange attractors from mutual information. Physical Review A, 33, 1134.
18. Breiman, L., Friedman, J.H., Olshen, R.A. and Stone, C.J. (1984). Classification and Regression Trees, Chapman & Hall/CRC, Belmont, California.
19. Yang, J. and Stenzel, J. (2006). Short-term load forecasting with increment regression tree. Electric Power Systems Research, 76, 880–888.
20. Mori, H., Kosemura, N., Ishiguro, K. and Kondo, T. (2001). Short-term load forecasting with fuzzy regression tree in power systems. Systems, Man and Cybernetics, IEEE International Conference, Vol. 3, Tucson, Arizona, pp. 1948–1953.
21. Tran, V.T., Yang, B.S., Oh, M.S. and Tan, A.C.C. (2008). Machine condition prognosis based on regression trees and one-step-ahead prediction. Mechanical Systems and Signal Processing, 22, 1179–1193.
22. Hu, C. and Cao, L. (2004). ANN based load forecasting: a parallel structure. Systems, Man and Cybernetics, IEEE International Conference, The Hague, Netherlands, pp. 3594–3598.
23. Kim, M.S. and Chung, C.S. (2005). Sunspot time series prediction using parallel-structure fuzzy system. Lecture Notes in Computer Science, 3614, 731–741.
24. Jeong, J., Gore, J.C. and Peterson, B.S. (2001). Mutual information analysis of the EEG in patients with Alzheimer's disease. Clinical Neurophysiology, 112, 827–835.
25. Sorjamaa, A., Hao, J., Reyhani, N., Ji, Y. and Lendasse, A. (2007). Methodology for long-term prediction of time series. Neurocomputing, 70, 2861–2869.
26. Sorjamaa, A. and Lendasse, A. (2007). Time series prediction as a problem of missing values: application to ESTSP and NN3 competition benchmarks. In: Proceedings of European Symposium on Time Series Prediction, Helsinki, Finland, pp. 165–174.
27. Tran, V.T., Yang, B.S. and Tan, A.C.C. (2009). Multi-step ahead direct prediction for the machine condition prognosis using regression trees and neuro-fuzzy systems. Expert Systems with Applications, 36, 9378–9387.