Reliability Engineering and System Safety 94 (2009) 1618-1628
Article info
Article history: received 24 April 2008; received in revised form 6 April 2009; accepted 10 April 2009; available online 18 April 2009.
Abstract
Reliability studies often rely on false premises, such as the assumption of independent and identically distributed
times between failures (renewal process). This can lead to erroneous model selection for the
time to failure of a particular component or system, which can in turn lead to wrong conclusions and
decisions. A strong statistical focus, a lack of a systematic approach and sometimes inadequate
theoretical background seem to have made it difficult for maintenance analysts to adopt the necessary
stage of data testing before the selection of a suitable model. In this paper, a framework for model
selection to represent the failure process for a component or system is presented, based on a review of
available trend tests. The paper focuses only on single-time-variable models and is primarily directed to
analysts responsible for reliability analyses in an industrial maintenance environment. The model
selection framework is directed towards the discrimination between the use of statistical distributions
to represent the time to failure (renewal approach); and the use of stochastic point processes
(repairable systems approach), when there may be the presence of system ageing or reliability
growth. An illustrative example based on failure data from a fleet of backhoes is included.
© 2009 Elsevier Ltd. All rights reserved.
Keywords:
Trend testing
Time to failure
Model selection
Repairable systems
NHPP
1. Introduction
As described by Dekker and Scarf [1], maintenance optimization consists of mathematical models aimed at finding balances
between costs and benefits of maintenance, or the most appropriate moment to execute maintenance. Many times, these
models are fairly complex and maintenance analysts have been
slow to apply them, since often data are scarce or, due to lack of
statistical theoretical knowledge, models are very difficult to
implement correctly in an industrial setting. Other, more
qualitative techniques such as reliability centered maintenance
(RCM) or total productive maintenance (TPM) have then played an
important role in maintenance optimization. Nevertheless, data
analysis and statistical modeling are definitely very valuable tools
engineers can employ to optimize the maintenance of assets
under their supervision.
Acknowledging that many reliability studies or maintenance
optimization programs do not require sophisticated statistical
inputs, Ansell and Phillips [2] reinforce that even at a basic level,
we should always be critical of the analysis and ask whether a
technique is appropriate.
[Fig. 1 (diagram). Recoverable box labels: Objectives; Maintenance Data; Model Selection; Failure Process; Optimization Model; Solution.]
tool for the objective the engineers assign to it. Actually, the
logical priority is that of objective, data and, finally, model
selection (as shown in Fig. 1). In other words, as suggested by
Ansell and Phillips [4], an analysis should be problem led rather
than technique or model centered. Nevertheless, correct
assessment of the failure process and of time to failure is
usually of critical importance to the subsequent economic
analysis required to find an optimal solution to the problem that
originated the analysis. Discussion of approaches to maintenance
and reliability optimization and models mixing reliability and
economics can be found in several references, for example [5-7].
When dealing with reliability field data, practical problems
such as the unavailability of large sets of data occur
frequently. This paper will briefly touch on this and other problems, as
they are relevant to the discussion of model selection techniques.
This document is structured as follows. Section 2 refers to
common practical problems found in the analysis of reliability
data. Section 3 describes the concept of repairable systems and
identifies some of the models available for their representation.
Section 4 presents a series of graphical and analytical tests used to
determine the existence of trends in the data. Section 5 proposes a
procedure based on these tests to correctly select a time-to-failure
model, discriminating between a renewal approach and the use of
an alternative, non-stationary model, such as the non-homogeneous Poisson process (NHPP). Section 6 presents numerical
examples using data coming from a fleet of backhoes. Finally,
Section 7 contains a summary of the paper.
D.M. Louit et al. / Reliability Engineering and System Safety 94 (2009) 1618-1628
same design;
same hardware;
same function;
same installation, maintenance or operations people (and conditions);
same procedures;
same system-component interface;
same location and
same environment.
3. Repairable systems
A non-repairable system is one which, when it fails, is
discarded (as repair is physically not feasible or not economical).
The reliability figure of interest is, then, the survival probability.
The times between failures of a non-repairable system are
independent and identically distributed, iid [23]. This is the most
common assumption made when analyzing time-to-failure data,
but as many authors mention, it might be unrealistic in some
situations. Many examples have been given of systems that rather
than being discarded (and replaced) on failure, are repaired. In
this case, the usual non-repairable methodologies (statistical
$$\lambda(t) = \eta \beta t^{\beta - 1}, \tag{1}$$

$$\lambda(t) = e^{\alpha + \beta t}, \tag{2}$$
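As an illustration, the two intensity functions can be evaluated numerically. The sketch below assumes that Eq. (1) is the power-law intensity $\lambda(t) = \eta \beta t^{\beta - 1}$ and Eq. (2) the log-linear intensity $\lambda(t) = e^{\alpha + \beta t}$; the function names and the parameter values are hypothetical and chosen only to show the shapes.

```python
import math


def power_law_intensity(t, eta, beta):
    """Power-law intensity, assumed form of Eq. (1): eta * beta * t**(beta - 1)."""
    return eta * beta * t ** (beta - 1)


def log_linear_intensity(t, alpha, beta):
    """Log-linear intensity, assumed form of Eq. (2): exp(alpha + beta * t)."""
    return math.exp(alpha + beta * t)


# Hypothetical parameter values, for illustration only.
for age in (100.0, 1000.0, 5000.0):
    print(age,
          power_law_intensity(age, eta=1e-5, beta=1.5),
          log_linear_intensity(age, alpha=-8.0, beta=2e-4))
```

With beta > 1 in the power-law form, or beta > 0 in the log-linear form, the intensity increases with age (deterioration); beta = 1 and beta = 0, respectively, give a constant intensity.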
4. Trend testing techniques
Fig. 2. Possible trends in time between failures. (Horizontal axis: age.)

$$\lambda_i(t) = \frac{N_i(t) - N_{i-1}(t)}{\Delta t}, \quad \text{with } (i - 1)\Delta t \le t \le i\Delta t, \tag{3}$$
where $N_i(t)$ is the total number of failures observed from time zero
to the ith interval and $\Delta t$ the length of each interval. If there is a
trend in the data, then it will be reflected in the average rate of
occurrences calculated. Then, if the system is improving, the
successive values of $\lambda_i(t)$ calculated will decrease and vice versa.
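The interval-based rate described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name, the choice of half-open intervals and the interval length of 2000 h are assumptions; the failure ages are those listed for backhoe #7 in Table A1.

```python
def interval_rocof(failure_ages, dt, n_intervals):
    """Average rate of occurrence of failures in each interval of length dt.

    Interval i (i = 1, 2, ...) covers ((i - 1) * dt, i * dt]; the rate is the
    number of failures falling in the interval divided by dt.
    """
    rates = []
    for i in range(1, n_intervals + 1):
        lower, upper = (i - 1) * dt, i * dt
        count = sum(1 for age in failure_ages if lower < age <= upper)
        rates.append(count / dt)
    return rates


# Failure ages (operating hours) of backhoe #7, from Table A1.
backhoe_7 = [3090, 3940, 4844, 5010, 5405, 5647, 6143]
print(interval_rocof(backhoe_7, dt=2000.0, n_intervals=3))
```

Over the three complete 2000 h intervals the counts are 0, 2 and 4 failures, so the successive rates increase, which points to deterioration rather than improvement.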
Fig. 3. Cumulative failures vs. time plots, examples (A: increasing trend, B: no trend, C: two clearly different periods, D: non-monotonic trend).
$$\Lambda(t) = \sum_{T_{ij} \le t} \frac{1}{Y(T_{ij})}, \tag{4}$$

where $T_{ij}$ is the time to the ith failure of the jth process under
observation, $Y(T_{ij})$ the number of systems operating immediately
before time $T_{ij}$ and $\Lambda(t) = 0$ for $t < \min\{T_{ij}\}$. The formula in Eq. (4)
is valid for multiple systems under observation (multiple
processes, $j = 1, 2, \ldots, m$).
If there is no trend, then the plot would tend to be linear, and
any deviation from a straight line indicates some kind of trend. It
should be noted that when only one system is observed, then the
Nelson-Aalen plot is equivalent to the cumulative failures vs. time
plot. It is also interesting to notice that the Nelson-Aalen plot
counts the number of systems operating before a certain time;
thus it may include suspensions to assess trend.
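A minimal sketch of the Nelson-Aalen estimate in Eq. (4) is given below. The helper names and the two-system example are hypothetical; the at-risk count Y(t) is derived here from the end of each system's observation period, which is one simple way to account for suspensions.

```python
def make_at_risk(end_of_observation):
    """Return Y(t): number of systems still under observation just before t."""
    return lambda t: sum(1 for end in end_of_observation if end >= t)


def nelson_aalen(failure_times, at_risk):
    """Return the points (T, Lambda(T)) of the Nelson-Aalen plot.

    failure_times: pooled failure times of all systems.
    at_risk: function giving the number of systems operating before time t.
    """
    points = []
    cumulative = 0.0
    for t in sorted(failure_times):
        cumulative += 1.0 / at_risk(t)  # each failure adds 1 / Y(T_ij)
        points.append((t, cumulative))
    return points


# Hypothetical example: two systems observed up to 1000 h and 600 h.
y = make_at_risk([1000, 600])
print(nelson_aalen([100, 300, 800], y))
```

When a single system is observed, Y(t) equals 1 and the estimate reduces to the cumulative number of failures, in line with the remark above.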
4.1.4. Total time on test (TTT) plot
As mentioned above, sometimes we are in the presence of
several pieces of equipment. Now, the combined failure process
for the entire group of components observed may or may not
present a trend. This test is directed to the identification of trend
for the combined behavior. So, if there are m independent
processes with the same intensity function (i.e. several identical
systems under observation) and the observation intervals for each
one are all contained in the interval [0, S], then the total number of
failures will be $N = \sum_{i=1}^{m} n_i$, where $n_i$ is the number of failures
observed for each process in its particular observation interval.
For the superposed process (combination of the m individual
processes), let $S_k$ denote the time of the kth failure. And let
$p(u)$ denote the number of processes under observation at time u.
If all processes are observed from time 0 to time S, then $p(u)$ is
equal to m. Then, $T(t) = \int_0^t p(u)\,du$ is the total time on test from
time 0 to time t (this is known as the total time on test, or TTT,
transform; see [36]).
The TTT plot test for NHPPs is given by a plot of the total time
on test statistic, calculated as

$$\frac{T(S_k)}{T(S)} = \frac{\int_0^{S_k} p(u)\,du}{\int_0^{S} p(u)\,du},$$
upper right section, whereas for a bath-tub shape (Fig. 5C), further
spacing will occur in the middle section of the curve.
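The scaled TTT statistic of Section 4.1.4 can be sketched as follows. This is an illustration under stated assumptions: each process is described by a hypothetical observation window (a_i, b_i) inside [0, S], and the statistic is paired with the scaled failure number k/N for plotting; the function names and example values are not from the original.

```python
def total_time_on_test(t, windows):
    """T(t): integral of p(u) from 0 to t, where p(u) counts the
    observation windows (a_i, b_i) that contain u."""
    return sum(max(0.0, min(t, b) - a) for a, b in windows)


def scaled_ttt(failure_times, windows):
    """Return the points (k / N, T(S_k) / T(S)) for the superposed process."""
    s_end = max(b for _, b in windows)
    total = total_time_on_test(s_end, windows)
    ordered = sorted(failure_times)
    n = len(ordered)
    return [((k + 1) / n, total_time_on_test(s_k, windows) / total)
            for k, s_k in enumerate(ordered)]


# Hypothetical example: two systems, both observed from 0 to 10.
print(scaled_ttt([2, 5, 10], [(0, 10), (0, 10)]))
```

Points lying close to the diagonal of the unit square are consistent with no trend; systematic departures from it suggest a trend.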
Some other graphical tools, such as control charts for reliability
monitoring (described by Xie et al. [37]), can also constitute a
useful method to identify if improvement or deterioration has
occurred in a particular parameter of interest, such as the rate of
occurrences of failures (ROCOF) or failure intensity. Nevertheless,
they rely on an RP assumption and are not directed to test for
trend when evaluating the use of a repairable systems approach.
4.2. Analytical methods
If preferred over the graphical approach, analytical testing
methods are available to test data for trends. Additionally, the null
and alternative hypotheses of these tests are of great help in the
determination of the most suitable model for the data.
Ascher and Feingold [22] provide a very complete survey of
analytical trend tests, and present them organized according to
their null hypothesis (i.e. RP, HPP, NHPP, monotonic trend, non-monotonic trend, etc.). Here, only the most popular tests will be
described, according primarily to Elvebakk [38]. Other methods
are described and referenced in [46].
4.2.1. The Mann test
The null hypothesis for this non-parametric test is an RP. Then,
if this hypothesis is not rejected, we can continue the reliability
analysis, fitting a distribution to time-to-failure data. The
alternative hypothesis is a monotonic trend.
The test statistic is calculated by counting the number of reverse
arrangements, M, among the times between failures. Let $T_1, T_2, \ldots, T_n$
be the interarrival times of n failures. Then a reverse arrangement
occurs whenever $T_i < T_j$ for $i < j$. For example, if the following times
to failure were observed for a system:
21; 17; 48; 37; 64; 13;
(5)
$$M = \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} I(T_i < T_j) \tag{6}$$
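Counting reverse arrangements is straightforward to sketch in code. The example below uses the six times quoted above; the normal approximation is not taken from this text but from the standard result that, under the renewal hypothesis, M has mean n(n - 1)/4 and variance n(n - 1)(2n + 5)/72. The function names are hypothetical.

```python
import math
from itertools import combinations


def reverse_arrangements(times_between_failures):
    """Count pairs (i, j) with i < j and T_i < T_j (statistic M)."""
    return sum(1 for earlier, later in combinations(times_between_failures, 2)
               if earlier < later)


def mann_z_score(times_between_failures):
    """Standardized M under the renewal-process null (normal approximation)."""
    n = len(times_between_failures)
    m = reverse_arrangements(times_between_failures)
    mean = n * (n - 1) / 4.0
    variance = n * (n - 1) * (2 * n + 5) / 72.0
    return (m - mean) / math.sqrt(variance)


example = [21, 17, 48, 37, 64, 13]
print(reverse_arrangements(example))  # 8 of the 15 possible pairs
print(mann_z_score(example))
```

For these six values M = 8 against an expected 7.5 under the null, so there is no indication of trend; for so few observations, exact tables for M would normally be preferred to the normal approximation.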
$$L = \frac{\sum_{j=1}^{\hat{n}} T_j - \frac{1}{2}\hat{n}(b + a)}{\sqrt{\frac{1}{12}\hat{n}(b - a)^2}}, \tag{7}$$

where $T_j$ is the age at failure for the jth failure, $[a, b]$ is the interval
of observation and $\hat{n}$ is given by:

$$\hat{n} = \begin{cases} n \text{ (observed number of failures)} & \text{if the process is time truncated,} \\ n - 1 & \text{if the process is failure truncated.} \end{cases}$$

Note that Eq. (7) can be simplified when the starting point of
observation is time t = 0, since (b + a) and (b - a) both equal the end
point of the observation interval. The statistic above is applicable
for the case when only one process is being observed. Generalization of the Laplace test to more than one process is fairly
simple, and for m processes, the statistic is given by the following
expression (combined Laplace test statistic):

$$L = \frac{\sum_{i=1}^{m} \sum_{j=1}^{\hat{n}_i} T_{ij} - \sum_{i=1}^{m} \frac{1}{2}\hat{n}_i(b_i + a_i)}{\sqrt{\frac{1}{12}\sum_{i=1}^{m} \hat{n}_i(b_i - a_i)^2}}. \tag{8}$$

The Lewis-Robinson statistic, LR, uses the estimated coefficient of variation of the times between failures X:

$$\widehat{cv}[X] = \frac{\hat{\sigma}[X]}{\bar{X}}, \tag{9}$$

$$LR = \frac{L}{\widehat{cv}}, \tag{10}$$

with L given by Eq. (7). If the failure times follow a HPP, then LR is
asymptotically equivalent to L, as $\widehat{cv}$ is equal to 1 when the times
between failures are exponentially distributed. That is, LR is
asymptotically standard normally distributed. As in the Laplace
test, the expected value of the statistic is zero when no trend is
present; thus deviations from this value indicate trend. The sign is
an indication of the type of trend.

The Military Handbook test statistic, MH, for a single process is

$$MH = 2\sum_{j=1}^{\hat{n}} \ln\frac{b - a}{T_j - a}, \tag{11}$$

and, for m processes,

$$MH = 2\sum_{i=1}^{m} \sum_{j=1}^{\hat{n}_i} \ln\frac{b_i - a_i}{T_{ij} - a_i}. \tag{12}$$
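The single-system statistics can be sketched in a few lines. The code below is an illustration under explicit assumptions: the Laplace statistic is taken as $L = [\sum_j T_j - \frac{1}{2}\hat{n}(b + a)] / \sqrt{\hat{n}(b - a)^2 / 12}$, the Lewis-Robinson statistic as L divided by the estimated coefficient of variation of the times between failures (using the sample standard deviation), and the Military Handbook statistic as $2\sum_j \ln[(b - a)/(T_j - a)]$, with $\hat{n} = n - 1$ for failure-truncated data. The function names are hypothetical, and treating backhoe #7 as failure truncated on [0, 6143] is an assumption made only for the example.

```python
import math
import statistics


def n_hat(n, failure_truncated):
    """Number of failures entering the sums: n - 1 if failure truncated."""
    return n - 1 if failure_truncated else n


def laplace(ages, a, b, failure_truncated):
    """Laplace trend statistic for one process observed on [a, b]."""
    nh = n_hat(len(ages), failure_truncated)
    total = sum(sorted(ages)[:nh])
    return (total - 0.5 * nh * (b + a)) / math.sqrt(nh * (b - a) ** 2 / 12.0)


def lewis_robinson(ages, a, b, failure_truncated):
    """Laplace statistic divided by the estimated coefficient of variation
    of the times between failures (sample standard deviation assumed)."""
    ordered = sorted(ages)
    tbf = [t - prev for prev, t in zip([a] + ordered[:-1], ordered)]
    cv = statistics.stdev(tbf) / statistics.mean(tbf)
    return laplace(ages, a, b, failure_truncated) / cv


def military_handbook(ages, a, b, failure_truncated):
    """Military Handbook statistic: 2 * sum of ln((b - a) / (T_j - a))."""
    nh = n_hat(len(ages), failure_truncated)
    return 2.0 * sum(math.log((b - a) / (t - a)) for t in sorted(ages)[:nh])


# Backhoe #7 failure ages (h) from Table A1; truncation scheme assumed.
backhoe_7 = [3090, 3940, 4844, 5010, 5405, 5647, 6143]
print(laplace(backhoe_7, 0.0, 6143.0, True))
print(lewis_robinson(backhoe_7, 0.0, 6143.0, True))
print(military_handbook(backhoe_7, 0.0, 6143.0, True))
```

Under these assumptions the Laplace statistic is positive and larger than 1.96, which would indicate deterioration at the 5% level; the Lewis-Robinson value is smaller because the estimated coefficient of variation exceeds 1 for this unit.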
[Figure: flowchart of the model selection procedure. Recoverable box labels: CMMS databases; 1. Define object of study; 2. Identify similar systems; No data? Evaluate Bayesian techniques; Valid to combine?; Collect operating time for each failure registered; Order them chronologically (failures only); Graphical tests (any); Test for renewal assumption (Mann test); RP valid? / No trend: fit distributions to data (Weibull, exponential, other) and evaluate goodness of fit; HPP valid? (Laplace test, LR test, Military HB test); NHPP (or other non-stationary model): determine intensity function (Weibull, log-linear) and evaluate goodness of fit; Time-to-failure model.]
6. Case study
Failure data coming from a fleet of backhoes, collected
between 1998 and 2003, are used in the following numerical
example to illustrate the use of the trend tests and selection
procedure described in the paper. This equipment is operated
by a construction firm in the United States. The data consist of the
age at failure for each of 11 pieces of equipment, with a total of 43
failures. Table A1 in the Appendix presents the complete data set.
The following example will consider two cases: (i) single-system
analysis, for which all calculations are based on backhoe #7 (with
7 failures during the observation period) and (ii) multiple-systems
analysis, using the pooled data for all 11 backhoes. Time to failure
is expressed in operating hours.
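For readers who wish to reproduce the calculations, the data set can be encoded as below. The ages are transcribed from Table A1; the unit numbers 1 to 9 are assigned in the order in which the groups appear in the table, which is an assumption (it is consistent with backhoe #7 having 7 failures). The variable and function names are hypothetical.

```python
# Age at failure (operating hours) per backhoe, transcribed from Table A1.
BACKHOE_FAILURE_AGES = {
    1: [346, 1925, 3108, 3610, 3892],
    2: [875, 2162, 3248, 4422],
    3: [2920, 4413, 4691, 4801],
    4: [1234, 1911, 2352, 3063],
    5: [896, 1885, 2028],
    6: [1480, 3648, 5859],
    7: [3090, 3940, 4844, 5010, 5405, 5647, 6143],
    8: [1710, 1787, 2297, 2915],
    9: [1885, 2500, 2815],
    10: [1691, 2230, 2500],
    11: [1210, 2549, 2621],
}


def times_between_failures(ages):
    """Convert cumulative ages at failure into times between failures."""
    return [age - previous for previous, age in zip([0] + ages[:-1], ages)]


def pooled_failure_ages(data):
    """All failure ages of the fleet, sorted, for multiple-systems analysis."""
    return sorted(age for ages in data.values() for age in ages)


print(times_between_failures(BACKHOE_FAILURE_AGES[7]))
print(len(pooled_failure_ages(BACKHOE_FAILURE_AGES)))
```

The first call returns the times between failures listed for backhoe #7 (case i), and the pooled list contains the 43 failures used in the multiple-systems analysis (case ii).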
Fig. 7. Cumulative failures vs. time plot: backhoe #7. (Axes: cumulative failures vs. age in hours.)
[Figure: TTT plot. Axes: TTT statistic vs. scaled failure number.]
Table A1
Failure times for a fleet of 11 backhoes (fleet of backhoes, failure data).

Equipment #   Failure #   Age at failure (h)   TBF (h)
1             1           346                  346
1             2           1925                 1579
1             3           3108                 1183
1             4           3610                 502
1             5           3892                 282
2             1           875                  875
2             2           2162                 1287
2             3           3248                 1086
2             4           4422                 1174
3             1           2920                 2920
3             2           4413                 1493
3             3           4691                 278
3             4           4801                 110
4             1           1234                 1234
4             2           1911                 677
4             3           2352                 441
4             4           3063                 711
5             1           896                  896
5             2           1885                 989
5             3           2028                 143
6             1           1480                 1480
6             2           3648                 2168
6             3           5859                 2211
7             1           3090                 3090
7             2           3940                 850
7             3           4844                 904
7             4           5010                 166
7             5           5405                 395
7             6           5647                 242
7             7           6143                 496
8             1           1710                 1710
8             2           1787                 77
8             3           2297                 510
8             4           2915                 618
9             1           1885                 1885
9             2           2500                 615
9             3           2815                 315
10            1           1691                 1691
10            2           2230                 539
10            3           2500                 270
11            1           1210                 1210
11            2           2549                 1339
11            3           2621                 72

Fig. 10. Nelson-Aalen plot: all backhoes combined. Increasing intensity is suggested. (Axes: cumulative intensity vs. $T_{ij}$.)
7. Conclusions
This paper reviews several tests available to assess the
existence of trends, and proposes a practical procedure to
discriminate between (i) the common renewal approach to model
time to failure and (ii) the use of a non-stationary model such as
the NHPP, which is believed to be easy to implement in practice
among the alternatives available for a repairable systems
approach. The procedure suggested is simple, yet it is believed
that it will lead to better representation of the failure processes
commonly found in industrial operations. The use of the tests
reviewed is illustrated through numerical examples. Some practical
problems that one may encounter when analyzing reliability data
are also briefly discussed, and references are given in each case
for further review.
Acknowledgements
We would like to thank Dr. Dragan Banjevic, of the Center for
Maintenance Optimization and Reliability Engineering at the
University of Toronto, for his valuable comments on an earlier
version of this paper.
References
[1] Dekker R, Scarf PA. On the impact of optimisation models in maintenance decision making: the state of the art. Reliability Engineering and System Safety 1998;60:111-9.
[2] Ansell JI, Phillips MJ. Strategies for reliability data analysis. In: Comer P, editor. Proceedings of the 11th advances in reliability technology symposium. London: Elsevier; 1990.
[3] Scarf PA. On the application of mathematical models in maintenance. European Journal of Operational Research 1997;99:493-506.
[4] Ansell JI, Phillips MJ. Practical problems in the statistical analysis of reliability data (with discussion). Applied Statistics 1989;38:205-31.
[5] Jardine AKS, Tsang AHC. Maintenance, replacement and reliability: theory and applications. Boca Raton: CRC Press; 2006.
[6] Campbell JD. Uptime: strategies for excellence in maintenance management. Portland: Productivity Press; 1995.
[7] Campbell JD, Jardine AKS, editors. Maintenance excellence: optimizing equipment life-cycle decisions. New York: Marcel Dekker; 2001.
[8] Bendell T. An overview of collection, analysis, and application of reliability data in the process industries. IEEE Transactions on Reliability 1998;37:132-7.
[9] Percy DF, Kobbacy KAH, Fawzi BB. Setting preventive maintenance schedules when data are sparse. International Journal of Production Economics 1997;51:223-34.
[10] Ansell JI, Phillips MJ. Discussion of practical problems in the statistical analysis of reliability data (with discussion). Applied Statistics 1989;38:231-47.
[11] Barlow RE, Proschan F. Inference for the exponential life distribution. In: Serra A, Barlow RE, editors. Theory of reliability, Proceedings of the International School of Physics Enrico Fermi. Amsterdam: North-Holland; 1986. p. 143-64.
[12] Lindley DV, Singpurwalla ND. Reliability and fault tree analysis using expert opinions. Journal of the American Statistical Association 1986;81:87-90.
[13] Singpurwalla ND. Foundational issues in reliability and risk analysis. SIAM Review 1988;30:264-81.
[14] Stamatelatos M, et al. Probabilistic risk assessment procedures guide for NASA managers and practitioners. Washington, DC: Office of Safety and Mission Assurance NASA Headquarters; 2002.
[15] Meeker WQ, Escobar LA. Statistical methods for reliability data. New York: Wiley; 1998.
[16] O'Connor PDT. Practical reliability engineering. 3rd ed. New York: Wiley; 1991.
[17] Mann NR, Shafer RE, Singpurwalla ND. Methods for statistical analysis of reliability and life data. New York: Wiley; 1974.
[18] Barlow RE, Proschan F. Mathematical theory of reliability. New York: Wiley; 1965.
[19] Tsang AH, Jardine AKS. Estimators of 2-parameter Weibull distributions from incomplete data with residual lifetimes. IEEE Transactions on Reliability 1993;42:291-8.
[20] Bohoris GA. Parametric statistical techniques for the comparative analysis of censored reliability data: a review. Reliability Engineering and System Safety 1995;48:149-55.
[21] Bohoris GA, Walley DM. Comparative statistical techniques in maintenance management. IMA Journal of Mathematics Applied in Business and Industry 1992;3:241-8.
[22] Ascher HE, Feingold H. Repairable systems reliability. Modeling, inference, misconceptions and their causes. New York: Marcel Dekker; 1984.
[23] Saldanha PLC, de Simone EA, Frutoso e Melo PF. An application of non-homogeneous Poisson point processes to the reliability analysis of service water pumps. Nuclear Engineering and Design 2001;210:125-33.
[24] Weckman GR, Shell RL, Marvel JH. Modeling the reliability of repairable systems in the aviation industry. Computers and Industrial Engineering 2001;40:51-63.
[25] Rigdon SE, Basu AP. Statistical methods for the reliability of repairable systems. New York: Wiley; 2000.
[26] Thompson WA. On the foundations of reliability. Technometrics 1981;23:1-13.
[27] Calabria R, Pulcini G. Inference and test in modeling the failure/repair process of repairable mechanical equipments. Reliability Engineering and System Safety 2000;67:41-53.