
Reliability Engineering and System Safety 56 (1997) 161-168
© 1997 Elsevier Science Limited
All rights reserved. Printed in Northern Ireland
PII: S0951-8320(97)00010-0
0951-8320/97/$17.00

The role of NHPP models in the practical analysis of maintenance failure data
Jasper L. Coetzee
Department of Mechanical and Aeronautical Engineering, Faculty of Engineering, University of Pretoria, Pretoria 0002,
Republic of South Africa
(Received 6 December 1995; revised 30 November 1996; accepted 22 January 1997)

The analysis of failure data is an important facet in the development of
maintenance strategy for equipment. Only by properly understanding the
mechanism of failure, through the modelling of failure data, can a proper
maintenance plan be developed. This is normally done by means of probabilistic
analysis of the failure data. From this, conclusions can be reached regarding
the effectiveness and efficiency of preventive replacement (and overhaul) as
well as that of predictive maintenance. The optimal frequency of maintenance
can also be established by using well developed optimisation models. These
optimise outputs, such as profit, cost and availability. The problem with this
approach is that it assumes that all repairable systems are repaired to the
'good-as-new' condition at each repair occasion. Maintenance practice has
learnt, however, that in many cases equipment slowly degrades even while
being properly maintained (including part replacement and periodic overhaul).
The result of this is that failure data sets often display degradation. This
renders conventional probabilistic analysis useless. During the last two
decades, a few researchers applied themselves to the solution of this problem.
This paper briefly examines the present state of the theoretical foundation of
repairable systems analysis techniques and then develops two formats of the
Non-Homogeneous Poisson Process model (NHPP model) for practical use by
the maintenance analyst. This includes an identification framework,
goodness-of-fit tests and optimisation modelling. The model is tested on two failure data
sets from literature and one from industry. © 1997 Elsevier Science Limited.

NOTATION

General
t       Time from start of present life (short term time: time over the course of one failure cycle)
T       Time from start of system life (long term time: time over the course of several failure cycles)
n       Number of observed failures for a component or system
MTTF    Average operational life of a component from installation to failure
MTBF    Average time between system failures

Renewal processes
f(t)    Failure density function
F(t)    Failure distribution function
R(t)    Survival function
z(t)    Hazard rate (FOM)

Repairable systems
\(\rho(t)\)         Failure rate (ROCOF)
\(N(T)\)            Number of failures in (0, T)
\(E\{N(T)\}\)       Expected number of failures in (0, T)
\(R(T_1, T_2)\)     Probability of system survival in \((T_1, T_2)\)
\(\alpha_0\)        Parameter of NHPP model \(\rho_1(T)\)
\(\alpha_1\)        Parameter of NHPP model \(\rho_1(T)\)
\(\lambda\)         Parameter of NHPP model \(\rho_2(T)\)
\(\beta\)           Parameter of NHPP model \(\rho_2(T)\)
\(C_r\)             Cost of repair of a failure (minimal repair [4])
\(C_p\)             Cost of system replacement

1 INTRODUCTION

The analysis of failure data is generally preceded by the supposition that the failure data set is independent

and identically distributed (i.i.d.). Independent and identically distributed failure data are, amongst others, generated by renewal processes, where the component(s) is (are) totally renewed by the maintenance action (reconditioning or replacement). Although there is a high percentage of failure data sets for which this assumption is true, it is far from universally applicable [25]. Many repairable systems show a tendency towards long term reliability degradation with repeated overhauls and replacement of single system components. This has the effect that successive times between failures are dependent as well as coming from different distributions (not identically distributed). Because of this the traditional failure analysis techniques, where the failure data are summarily reordered in order of magnitude and a standard statistical distribution fitted, are not suitable for these situations.

While data analysis techniques for homogeneous data are fairly well developed, this is not true for non-homogeneous data. Only in the last fifteen years have some researchers shown that there is an important class of failure data for which the customary analysis techniques yield incorrect results. New analysis techniques have thus started emerging for failure data with a long term life trend (reliability growth and reliability degradation). Although reliability growth is experienced in situations where equipment is being improved, we are more interested in the reliability degradation case, as this reflects the typical longer term real world situation in many repairable systems.

There are presently two main streams of development for the analysis of non-homogeneous data. These are models based on the Non-Homogeneous Poisson Process (NHPP) [2, 8, 14] and models based on Proportional Hazards [10] and its derivatives [22, 21]. Both streams have benefits and will probably play a role in the improvement of failure analysis techniques for repairable systems. The present article will develop a practical analysis technique for repairable systems, using the NHPP models. The NHPP models are chosen based on the following criteria.

• They are generally suitable for the purpose of modelling data with a trend. This is notably so because the accepted formats of the NHPP are monotone increasing or decreasing functions. More important, the NHPP models are especially suited to model the 'bad-as-old' situation [1, 12, 24].
• The NHPP models are mathematically straightforward. Due to this their theoretical base is well developed, including goodness of fit tests and confidence interval procedures.
• The models have been tested fairly well. Examples include Ascher and Feingold [1], Bassin [5, 6], Crow [12, 13], Bell and Mioduski [7], Durr [17] and Ascher et al. [3].

2 MODEL DEVELOPMENT

The base assumption in most present reliability analyses is that renewal takes place after failure of a component or system. This is the 'good-as-new' approach. If renewal takes place, the successive times to failure will be independent and identically distributed. The data can then be reordered in the conventional failure interval histogram. This can then be used to fit a standard statistical distribution, after which optimisation of the maintenance strategy for the component or system can take place.

The renewal assumption is valid in many maintenance situations (notably in the case of single cell components). On the other hand, there is a large proportion of practical maintenance situations which are not well represented by the renewal assumption. These are known as 'repairable systems' and have the feature that complete repair does not take place after failure. These typically include equipment (systems) and sub-units (sub-systems) where repair of the system (or sub-system) consists of the replacement or repair of only a small part of the system (or sub-system). The system is thus not in the 'good-as-new' condition after repair, but in the 'bad-as-old' condition (the same condition the system was in prior to failure). This is called 'minimal repair' [4]. This leads to the typical system being subject to reliability degradation, with an accompanying increase in the failure rate (ROCOF) over time (the so-called 'sad' trend of Ascher, in his 1983 comment on the article by Lawless [19]). One should also note that not all single cell component replacements or repairs constitute renewal, as the position in which the component is installed may deteriorate in time, thus causing a deteriorating trend, even on complete repair of the component itself. The 'repairable system' situation cannot be modelled by the conventional fitting of a statistical distribution function, as successive failures are firstly not identically distributed and secondly not independent.

Another class of failure models has also emerged between renewal models on the one extreme (perfect repair) and repairable systems on the other extreme (minimal repair). This class is known as 'imperfect repair'. In these models the repair result is deemed to be better than 'bad-as-old', but worse than 'good-as-new'. The NHPP models described in this text are part of the class of 'repairable systems', as this is, in the author's opinion (as a practising maintenance man)

one of the most prevalent conditions found in the practical repair situation.

The diagram as presented by Ascher and Feingold [2] (modified) is shown in Fig. 1 and is used as a basis for the practical analysis of maintenance data.

[Fig. 1. Model identification framework. Flowchart: failure data (interarrival times) in original chronological order; where a trend is found, repairable systems models (i.e. NHPP models) and imperfect repair models; otherwise interarrival times identically distributed but not necessarily independent; where dependence is found, branching Poisson process model; otherwise data i.i.d., renewal process, conventional analysis techniques.]

According to Fig. 1 there are three categories of maintenance situations. These are:

• Reliability degradation: the failure data will exhibit a long term 'sad' failure trend. NHPP models (amongst others) are well suited to modelling these situations.
• Situations where there is no long term trend in the data, but there is still dependence between successive failures. This situation is modelled well by the Branching Poisson Process (BPP).
• Renewal: the data are independent and identically distributed. They can thus be modelled by fitting a suitable statistical distribution function to the data.

To discern between the different data types two tests are necessary:

• A trend test to determine whether a long term trend is present in the data. If such a trend is present, an NHPP model should be used to model the failure process. Suitable trend tests include the Laplace test and the MIL-HDBK-189 (1981) test.
• A dependence test to determine whether successive failures are dependent in data without a long term trend. Any proper test for serial dependence can be used for this purpose. A simple test for serial dependency is to be found in Krishnaiah and Sen [18], pp. 102-104.

2.1 Formats of the NHPP model

The following two formats of the NHPP model are amongst those which have found general acceptance in the literature. Cox and Lewis [11] introduced the NHPP model:

\[ \rho_1(T) = e^{\alpha_0 + \alpha_1 T}, \qquad -\infty < \alpha_0, \alpha_1 < \infty, \quad T \ge 0. \tag{1} \]

This format of the NHPP model models repairable systems well with \(\alpha_1 > 0\).

A second, well accepted format of the NHPP model is the 'Power law process' [12]:

\[ \rho_2(T) = \lambda \beta T^{\beta - 1}, \qquad \lambda, \beta > 0, \quad T \ge 0. \tag{2} \]

This model models repairable systems when \(\beta > 1\); \(\beta = 2\) results in a linearly increasing failure rate.

There are various other formats of the NHPP model that have been proposed by various authors. The two mentioned above are ones that have gained fairly general acceptance.

2.2 Standard functions

From the definition of an NHPP it follows that the expected number of failures in the interval \((T_1, T_2)\) for the model \(\rho_1(T)\) is:

\[ E_1\{N(T_2) - N(T_1)\} = \frac{e^{\alpha_0}}{\alpha_1}\left(e^{\alpha_1 T_2} - e^{\alpha_1 T_1}\right), \qquad T_2 \ge T_1 \ge 0, \quad -\infty < \alpha_0, \alpha_1 < \infty. \tag{3} \]

Similarly the expected system reliability in the interval \((T_1, T_2)\) is:

\[ R_1(T_1, T_2) = \exp\left[-\frac{e^{\alpha_0}}{\alpha_1}\left(e^{\alpha_1 T_2} - e^{\alpha_1 T_1}\right)\right], \qquad T_2 \ge T_1 \ge 0, \quad -\infty < \alpha_0, \alpha_1 < \infty. \tag{4} \]
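As an illustration only, eqns (1) to (4) can be evaluated numerically with a few lines of Python. The function names below are hypothetical and not taken from the paper; the parameter values are the fitted Caterpillar values quoted in Section 3.3, and the sketch assumes \(\alpha_1 \ne 0\) and \(T > 0\).

```python
import math

def rho1(T, a0, a1):
    """Log-linear NHPP intensity, eqn (1)."""
    return math.exp(a0 + a1 * T)

def rho2(T, lam, beta):
    """Power law NHPP intensity, eqn (2); T > 0 assumed."""
    return lam * beta * T ** (beta - 1.0)

def expected_failures_1(T1, T2, a0, a1):
    """Expected number of failures in (T1, T2) for rho1, eqn (3); a1 != 0 assumed."""
    return math.exp(a0) / a1 * (math.exp(a1 * T2) - math.exp(a1 * T1))

def reliability_1(T1, T2, a0, a1):
    """Probability of no failure in (T1, T2) for rho1, eqn (4)."""
    return math.exp(-expected_failures_1(T1, T2, a0, a1))

# Fitted parameters reported for the Caterpillar haul truck (Section 3.3).
A0, A1 = -6.545, 1.07e-4

# Expected failures over the observed life of 21982 hours (Table 2),
# and the chance of surviving the next 100 hours from that point.
n_expected = expected_failures_1(0.0, 21982.0, A0, A1)
r_next_100 = reliability_1(21982.0, 22082.0, A0, A1)
print(f"E{{N}} over (0, 21982) h: {n_expected:.1f}")
print(f"R over next 100 h:      {r_next_100:.3f}")
```

With these rounded parameters the expected count over the observed life should be of the same order as the 128 failures recorded for the truck, which gives a quick sanity check on a fitted model.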

The average time between failures in the interval \((T_1, T_2)\) is:

\[ \mathrm{MTBF}_1(T_1, T_2) = \frac{\alpha_1 (T_2 - T_1)}{e^{\alpha_0}\left(e^{\alpha_1 T_2} - e^{\alpha_1 T_1}\right)}, \qquad T_2 \ge T_1 \ge 0, \quad -\infty < \alpha_0, \alpha_1 < \infty. \tag{5} \]

In the same way, the equivalent functions for the model \(\rho_2(T)\) are:

\[ E_2\{N(T_2) - N(T_1)\} = \lambda\left(T_2^{\beta} - T_1^{\beta}\right), \qquad \lambda, \beta > 0, \quad T_2 \ge T_1 \ge 0 \tag{6} \]

\[ R_2(T_1, T_2) = e^{-\lambda\left(T_2^{\beta} - T_1^{\beta}\right)}, \qquad \lambda, \beta > 0, \quad T_2 \ge T_1 \ge 0 \tag{7} \]

\[ \mathrm{MTBF}_2(T_1, T_2) = \frac{T_2 - T_1}{\lambda\left(T_2^{\beta} - T_1^{\beta}\right)}, \qquad \lambda, \beta > 0, \quad T_2 \ge T_1 \ge 0. \tag{8} \]

2.3 Parameter estimation

Using maximum likelihood estimates, the parameters for \(\rho_1(T)\) can be found from:

\[ \sum_{i=1}^{n} T_i + \frac{n}{\hat{\alpha}_1} - n T_n \left\{1 - e^{-\hat{\alpha}_1 T_n}\right\}^{-1} = 0 \tag{9} \]

and

\[ \hat{\alpha}_0 = \ln\left(\frac{n \hat{\alpha}_1}{e^{\hat{\alpha}_1 T_n} - 1}\right), \tag{10} \]

where \(T_i\) is the arrival time of the \(i\)th failure and \(T_n\) that of the last observed failure.

The parameter values are found by solving eqn (9) for \(\hat{\alpha}_1\) and then substituting this value in eqn (10) to solve for \(\hat{\alpha}_0\). A simple interactive search or repeated halving can be used to find \(\hat{\alpha}_1\).

The maximum likelihood estimates for the parameters of \(\rho_2(T)\) are:

\[ \hat{\beta} = \frac{n}{\sum_{i=1}^{n} \ln\left(T_n / T_i\right)} \tag{11} \]

and

\[ \hat{\lambda} = \frac{n}{T_n^{\hat{\beta}}}. \tag{12} \]

2.4 Goodness of fit tests

It can be stated in general that not much work has been done in the area of goodness of fit for NHPP models. Ascher and Feingold [2] identify this as one of the areas where more work is required. The main problem is that tests such as the Cramér-von Mises test and the Kolmogorov-Smirnov test were not originally developed for the parametric case. Darling [15] made modifications to the Cramér-von Mises test to apply it to the parametric case and Crow [12] applied it to the model \(\rho_2(T)\).

The standard \(\chi^2\) test can be applied to both NHPP models and has the benefit that the start time of the test need not be zero, as is the case with Crow's application of the Cramér-von Mises test. The \(\chi^2\) test is applied in the customary way with the expected number of failures in any interval \((T_a, T_b)\) given by:

For \(\rho_1(T)\):

\[ e_{ab} = \frac{e^{\alpha_0}}{\alpha_1}\left(e^{\alpha_1 T_b} - e^{\alpha_1 T_a}\right) \tag{13} \]

For \(\rho_2(T)\):

\[ e_{ab} = \lambda\left(T_b^{\beta} - T_a^{\beta}\right). \tag{14} \]

2.5 Cost modelling

The two NHPP models as developed above can assist the analyst in:

• Understanding the failure behaviour of the repairable system, whilst providing a mathematical equation that could be used in subsequent model studies.
• Forecasting future failures through the use of the model's mathematical formulation.
• Optimising the maintenance strategy for the repairable system by adding relevant cost information.

As was said previously, the NHPP models model the minimal repair or bad-as-old situation. The cost models that can be applied with success to this situation include type 2 policies [4] and type 3 policies [20].

2.5.1 Type 2 policies
Type 2 replacement policies were introduced by Barlow and Hunter [4] and involve the planned replacement of a system at a certain age, with minimal repairs at breakdown up to that age. The model optimises cost per unit time over an infinite time horizon. The optimum life \(T^*\) at which system replacement should take place is given by:

For \(\rho_1(T)\):

\[ e^{\alpha_1 T^*}\left(T^* - \frac{1}{\alpha_1}\right) = \frac{C_p}{C_r e^{\alpha_0}} - \frac{1}{\alpha_1}. \tag{15} \]

For \(\rho_2(T)\):

\[ T^* = \left[\frac{C_p}{\lambda(\beta - 1) C_r}\right]^{1/\beta}. \tag{16} \]

2.5.2 Type 3 policies
Morimura and Makabe [20] introduced type 3 policies. These prescribe system replacement after an optimum number of failures \(n^*\) has been repaired minimally. They are superior to type 2 policies in the sense that the full life of the last minimal repair is utilised: the system is only replaced at the next breakdown. Type 2 policies are equivalent to type 3 policies if the system is replaced at the first breakdown following \(T^*\).
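The estimation step of Section 2.3 and the type 2 optimum of Section 2.5.1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the author's implementation: all function names are hypothetical, observation is assumed to stop at the last failure \(T_n\), a degrading trend (\(\hat{\alpha}_1 > 0\), \(\beta > 1\)) is assumed, and the "repeated halving" is implemented as plain bisection. The failure times used for the fit are the cumulative sums of the average engine lives in Table 1 (Section 3.1), treated as the failure times of one typical engine; the cost inputs are the Caterpillar figures of Section 3.3.

```python
import math

def bisect(f, lo, hi, iters=200):
    """Repeated halving for a function with f(lo) > 0 > f(hi)."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def fit_power_law(times):
    """Closed-form estimates of (lambda, beta) for rho2, eqns (11)-(12)."""
    n, t_n = len(times), times[-1]
    beta = n / sum(math.log(t_n / t) for t in times)
    return n / t_n ** beta, beta

def eq9(a1, times):
    """Left-hand side of eqn (9); -expm1(-x) is 1 - exp(-x), accurate for small x."""
    n, t_n = len(times), times[-1]
    return sum(times) + n / a1 - n * t_n / (-math.expm1(-a1 * t_n))

def fit_log_linear(times):
    """Estimates of (alpha0, alpha1) for rho1, eqns (9)-(10), assuming alpha1 > 0."""
    n, t_n = len(times), times[-1]
    # As alpha1 -> 0+, eqn (9) tends to sum(T_i) - n*T_n/2; a positive root
    # exists only if that limit is positive (failures crowd towards the end).
    if sum(times) <= 0.5 * n * t_n:
        raise ValueError("no degrading trend: alpha1 > 0 not supported by the data")
    lo, hi = 1e-6 / t_n, 1.0 / t_n
    while eq9(hi, times) > 0:      # widen the bracket until the sign changes
        hi *= 2.0
    a1 = bisect(lambda a: eq9(a, times), lo, hi)
    a0 = math.log(n * a1 / math.expm1(a1 * t_n))
    return a0, a1

def type2_optimum_log_linear(a0, a1, c_p, c_r):
    """Optimum replacement age T* for rho1 from eqn (15), assuming alpha1 > 0."""
    rhs = c_p / (c_r * math.exp(a0)) - 1.0 / a1
    h = lambda t: math.exp(a1 * t) * (t - 1.0 / a1) - rhs   # increasing in t
    hi = 1.0 / a1
    while h(hi) < 0:
        hi *= 2.0
    return bisect(lambda t: -h(t), 0.0, hi)

def type2_optimum_power_law(lam, beta, c_p, c_r):
    """Optimum replacement age T* for rho2 from eqn (16), assuming beta > 1."""
    return (c_p / (lam * (beta - 1.0) * c_r)) ** (1.0 / beta)

# Cumulative sums of the Table 1 average engine lives (miles).
davis_times = [94000.0, 164000.0, 218000.0, 259000.0, 292000.0]
lam_hat, beta_hat = fit_power_law(davis_times)
a0_hat, a1_hat = fit_log_linear(davis_times)

# Type 2 optimum with the Caterpillar figures quoted in Section 3.3.
t_star = type2_optimum_log_linear(-6.545, 1.07e-4, 1_300_000.0, 7_165.0)
print(f"power law: lambda={lam_hat:.3e}, beta={beta_hat:.3f}")
print(f"log-linear: alpha0={a0_hat:.3f}, alpha1={a1_hat:.3e}")
print(f"type 2 optimum T* = {t_star:.0f} hours")
```

Because the published parameters are rounded, the computed \(T^*\) should land close to, but not exactly on, the 21,293 hours reported in Section 3.3. Bisection is used here because the left-hand sides of eqns (9) and (15) are monotone in the unknown over the bracketed range, so no derivative or starting guess is needed.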

The optimum number of minimal repairs before system replacement is given by:

For \(\rho_1(T)\):

\[ n^* = \frac{(m - 1) e^{\alpha_0}}{\alpha_1} \tag{17} \]

where \(m\) is obtained from:

\[ m(\ln m - 1) = \frac{\alpha_1 C_p}{C_r e^{\alpha_0}} - 1. \tag{18} \]

For \(\rho_2(T)\):

\[ n^* = \frac{C_p}{C_r(\beta - 1)}. \tag{19} \]

3 MODEL APPLICATION

Three failure data sets are used to illustrate the use of the NHPP models for data with a long term trend (typical repairable systems). The first two of these are well known failure data sets from the reliability literature, while the third is a recent example from industry.

3.1 Davis [16] bus engine data

The bus engine failure data of Davis [16] is a very good example of how conventional failure analysis techniques can lead to wrong conclusions. In his paper, Davis found that the failure mechanism could be modelled by an exponential failure density, after aggregating the various engines' data (there were 191 engines) and reordering in order of magnitude. If one keeps the engine lives in the original chronological order, Table 1 results.

Table 1. Davis' engine data average lives

Engine life    Average life (miles)
1st            94000
2nd            70000
3rd            54000
4th            41000
5th            33000

It is obvious by inspecting Table 1 that there is a marked 'sad' trend present. This can be modelled using NHPP models, while the various engine lives in the different life categories (say the 2nd life data) can be modelled by fitting a statistical distribution to the data. One can thus have five different engine life distributions represented in Table 1, each of which can be modelled by fitting statistical distributions. The only way in which the sequence of successive lives can be modelled meaningfully, though, is by fitting models such as the NHPP models presented above to the data (that is, if one assumes that the averages of the successive lives as given above are representative of the successive lives of a typical engine).

The result of fitting the model \(\rho_1(T)\) to the data of Davis' engines is shown in Fig. 2, while the fit of \(\rho_2(T)\) is shown in Fig. 3.

[Fig. 2. Fit of \(\rho_1(T)\) on bus engine data. Davis (1952) bus engine data: expected number of failures \(E(N(T))\) and observed \(N(T)\) against cumulative use (miles).]

[Fig. 3. Fit of \(\rho_2(T)\) on bus engine data. Davis (1952) bus engine data: expected number of failures \(E(N(T))\) and observed \(N(T)\) against cumulative use (miles).]

The resultant failure rate and survival functions are shown in Figs 4 and 5. Figure 4 shows that \(\rho_1(T)\) has a much sharper gradient than \(\rho_2(T)\). From practical maintenance experience, it is very unlikely that the failure rate will have as sharp a gradient as \(\rho_1(T)\). This is supported by a visual inspection of Figs 2 and 3. Although both models will provide adequate results for describing the failure mechanisms within the bounds of the original failure data, the model \(\rho_2(T)\) will be preferred for forecasting purposes.

3.2 Proschan [23] air conditioner data

Proschan's aeroplane air conditioner data consist of 13 individual data sets, of which some show trends. By fitting the two NHPP models to the data with trends, the models presented in Fig. 6 resulted.

The function \(\rho_1(T)\) again has a much steeper gradient than \(\rho_2(T)\). In this case the survival rate over the next 100 hours is very similar for both models (see Fig. 7). Which model to choose will be dictated by the expected shape of the failure rate function; \(\rho_1(T)\) will

tend towards more conservative estimates of \(\rho(T)\). Practical maintenance experience will favour the use of \(\rho_1(T)\) in this case.

[Fig. 4. Shapes of \(\rho(T)\) for bus engine data. Davis (1952) bus engine data: failure rate of Model 1 and Model 2 against cumulative life (miles).]

[Fig. 5. Expected survival rate over next 20000 miles. Davis (1952) bus engine data: reliability of Model 1 and Model 2 against cumulative use (miles).]

[Fig. 6. Shapes of \(\rho(T)\) for Proschan's air conditioner data. Proschan (1963) air conditioner data: failure rate of Model 1 and Model 2 against cumulative life (hours).]

[Fig. 7. Expected survival rate over next 100 hours. Proschan (1963) air conditioner data: reliability of Model 1 and Model 2 against cumulative life (hours).]

3.3 Caterpillar haul truck

The data presented in Table 2 were collected from the failures of a Caterpillar 789 180 ton haul truck doing service at a large open cast colliery. The data consist of the first 10 and last 10 failures in a data set of 128 failures. See [9] for the full data set.

The results of doing a Laplace trend test on the data reveal a fairly strong reliability degradation trend. This is supported by the failure trend displayed in Fig. 8.

The resulting fit for \(\rho_1(T)\) (the best model for the data) is shown in Fig. 8. It is clear that \(\rho_1(T)\) presents a good fit to the Caterpillar data. This fit will be used to illustrate the use of the cost models to optimise the system replacement strategy.

One of the most important decisions that a maintenance manager must take is the timing of system (equipment) replacements. This should be done so as to optimally balance the cost of maintenance against capital expenditure. For the present example the following cost figures were obtained for the particular truck:

C_p = $1,300,000
C_r = $7,165.

For the fitted model:

\[ \alpha_0 = -6.545, \qquad \alpha_1 = 1.07 \times 10^{-4}. \]

By substitution into eqn (15), the optimal system replacement age (type 2 policy) is given as:

T* = 21,293 hours.

The optimal cost per unit time is then given by:

\[ C(T^*) = \frac{C_r E(N(T^*)) + C_p}{T^*} \]

with

\[ E(N(T^*)) = \frac{e^{\alpha_0}\left(e^{\alpha_1 T^*} - 1\right)}{\alpha_1}. \]

This leads to:

C(T*) = $101.86 per hour.

The optimal replacement policy using the type 3 policy gives:

n* = 118 minimal repair occasions.

The truck will thus be repaired minimally 118 times and will then be replaced at the next (119th) failure. The resultant cost for this policy is:

C(T*) = $100.52 per hour, which is marginally more economic than the type 2 policy.

Table 2. Caterpillar failure data

Successive lives of Caterpillar 789 haul truck in hours

Life no.   t_i (hr)   T_i (hr)      Life no.   t_i (hr)   T_i (hr)
1          78         78            119        56         21762
2          80         158           120        105        21867
3          173        331           121        45         21912
4          50         381           122        2          21914
5          142        523           123        23         21937
6          97         620           124        1          21938
7          44         664           125        1          21939
8          1141       1805          126        12         21951
9          12         1817          127        3          21954
10         251        2068          128        28         21982

[Fig. 8. Fit of \(\rho_1(T)\) to Caterpillar dump truck failure data. Caterpillar 789 180 tonne truck: expected cumulative number of failures \(E(N(T))\) and observed \(N(T)\) against cumulative life (hours).]

4 CLOSURE

The use of NHPP models to model failure data with a long term (reliability degradation) trend opens up a new dimension in the modelling of maintenance failure data. It is an important step in the support of maintenance strategy optimisation techniques such as Reliability Centred Maintenance.

REFERENCES

1. Ascher, H. E. and Feingold, H., Bad-as-old analysis of system failure data. In Annals of Assurance Sciences. Gordon and Breach, New York, 1969, pp. 49-62.
2. Ascher, H. E. and Feingold, H., Repairable systems reliability. In Lecture notes in Statistics, Vol. 7, Marcel Dekker, New York, 1984.
3. Ascher, N. J., Donelson, J. and Higgens, G. F., Changes in helicopter reliability/maintainability characteristics over time. IDA Study S-451, Institute for Defence Analyses, USA, 1975.
4. Barlow, R. E. and Hunter, L., Optimum preventive maintenance policies. Operations Research, 1960, 8, 90-100.
5. Bassin, W. M., Increasing hazard functions and overhaul policy. ARMS, IEEE-69, C8-R, 1969, pp. 173-180.
6. Bassin, W. M., A Bayesian optimal overhaul interval model for the Weibull restoration process. Journal of the American Statistical Association, 1973, 68, 575-578.
7. Bell, R. and Mioduski, R., Extension of life of US Army trucks. ARMS, IEEE-76 CHO-1044-7 RQC, 1976, pp. 200-205.
8. Coetzee, J. L., The analysis of failure data with a long term trend. Masters dissertation, University of Pretoria, 1995.
9. Coetzee, J. L., Reliability degradation and the equipment replacement problem. In International Conference of Maintenance Societies, Melbourne, Australia, 1996.
10. Cox, D. R., Regression models and life tables (with discussion). Journal of the Royal Statistical Society Series B, 1972, 34, 187-220.
11. Cox, D. R. and Lewis, P. A., The Statistical Analysis Of Series Of Events. Methuen, London, 1966.
12. Crow, L. H., Reliability analysis for complex repairable systems. In Reliability and Biometry, eds F. Proschan and R. J. Serfling. SIAM, Philadelphia, 1974, pp. 379-410.
13. Crow, L. H., Evaluating the reliability of repairable systems. In Proceedings of the Annual Reliability and Maintainability Symposium. IEEE, Los Angeles, 1990, pp. 275-279.
14. Crowder, M. J., Kimber, A. C., Smith, R. L. and Sweeting, T. J., Statistical analysis of reliability data. Chapman and Hall, London, 1991, pp. 104-116.
15. Darling, D. A., The Cramér-Smirnov Test in the parametric case. Annals of Mathematical Statistics, 1955, 26, 1-20.
16. Davis, D. J., An analysis of some failure data. Journal of the American Statistical Association, 1952, 47, 113-150.
17. Durr, A. C., Operational repairable equipments and the Duane model. In Proceedings of the International Conference on Reliability and Maintainability. Centre de Fiabilité, CNET, Lannion, France, 1980, pp. 189-193.
18. Krishnaiah, P. R. and Sen, P. K., eds., Handbook of Statistics, Vol. 4: Nonparametric Methods. North-Holland, Amsterdam, 1984.
19. Lawless, J. F., Statistical methods in reliability. Technometrics, 1983, 25(4), 305-335.
20. Makabe, H. and Morimura, H., On some preventive maintenance policies. Journal of the Operational Research Society of Japan, 1963, 6, 17-47.
21. Pijnenburg, M., Additive hazard models in repairable systems reliability. Reliability Engineering and System Safety, 1991, 31, 369-390.
22. Prentice, R. L., Williams, B. J. and Peterson, A. V., On the regression analysis of multivariate failure time data. Biometrika, 1981, 68, 273-279.
23. Proschan, F., Theoretical explanation of observed decreasing failure rate. Technometrics, 1963, 6, 375-383.
24. Thompson, W. A. Jr., On the foundations of reliability. Technometrics, 1981, 23(1), 1-13.
25. Walls, L. A. and Bendell, A., The structure and exploration of reliability field data: what to look for and how to analyse it. Reliability Engineering, 1986, 15, 115-143.
