
Repairable Systems Reliability

Introduction
We begin by defining a repairable system, after some preliminary concepts and definitions.
1. Part: an item not subject to disassembly and hence discarded when it fails.
2. Socket: a circuit or equipment position which, at any given time, holds a part of a given type.
3. System: a collection of two or more sockets and their associated parts, interconnected to perform one or more functions.
4. Nonrepairable System: a system that is discarded the first time that it ceases to perform satisfactorily.
5. Repairable System: a system which, after failing to perform at least one of its required functions, can be returned to performing all of its required functions satisfactorily by any method other than replacement of the entire system.
Three points must be made. First, since small appliances are systems, many systems, perhaps even a majority, are nonrepairable. Nevertheless, the overwhelming majority of systems of interest in reliability applications are designed to be repaired, rather than discarded, after their first failure. Henceforth, therefore, the term "system" will be used to denote a repairable system.
Secondly, given that a system contains n parts, the definition of a repairable system allows up to n − 1 part replacements during a single repair. In practice, however, when the repair would require many new parts, it usually is more cost effective to replace the entire system. Most repairs involve the replacement of only a minute fraction of a system's constituent parts, and this has major implications for probabilistic modeling and, therefore, for statistical analysis as well. Some repairs, e.g., cleaning contacts or adjusting internal potentiometers, do not involve replacement of any parts.
Thirdly, if a system can be returned to satisfactory operation by occasional adjustments of external, i.e., front panel, controls, it hasn't failed and therefore doesn't require repair. However, too frequently required adjustments necessitate repair to reduce the rate of such adjustments.

Probabilistic Modeling
Most of the literature concerning methods for predicting the time to failure of a system assumes that system failure is an absorbing state, i.e., that the system of interest is nonrepairable. In many cases, the nonrepairable system can include one or more groups of repairable redundant subsystems. Once a system-level failure occurs, however, the system is assumed to be discarded. Such "reliability with repair" models will not be considered in this article.
This section concentrates on black-box modeling and analysis. That is, models are postulated for the pattern of system-level failures, regardless of the system's design, and the postulated models can be tested against even small datasets.

Model for Parts


The time to failure of a part is a random variable, $X$, described either with the cumulative distribution function

$$F_X(x) \equiv \Pr\{X \le x\} \tag{1}$$

or the force of mortality (FOM)

$$h_X(x) \equiv \frac{F_X'(x)}{1 - F_X(x)} \tag{2}$$

Intuitively, $h_X(x)$ measures how likely it is that one failure will occur soon after time $x$, given that it has not occurred by time $x$.
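As a small numerical illustration of equations (1) and (2), the sketch below evaluates the FOM for a Weibull time to failure. The Weibull family, the parameter values, and all function names are assumptions chosen for illustration; the source does not prescribe a particular distribution for parts.

```python
import math

def weibull_cdf(x, shape, scale):
    """F_X(x) = 1 - exp(-(x/scale)^shape) for x >= 0."""
    return 1.0 - math.exp(-((x / scale) ** shape))

def weibull_pdf(x, shape, scale):
    """Derivative of the CDF with respect to x."""
    z = x / scale
    return (shape / scale) * z ** (shape - 1) * math.exp(-(z ** shape))

def force_of_mortality(x, shape, scale):
    """h_X(x) = F'_X(x) / (1 - F_X(x)), as in equation (2)."""
    return weibull_pdf(x, shape, scale) / (1.0 - weibull_cdf(x, shape, scale))

ages = (10.0, 50.0, 200.0)  # hypothetical part ages, in hours
# shape = 1 is the exponential distribution: the FOM is constant at 1/scale.
exp_fom = [force_of_mortality(x, 1.0, 100.0) for x in ages]
# shape = 2 gives an FOM that grows linearly with part age (wear-out).
wearout_fom = [force_of_mortality(x, 2.0, 100.0) for x in ages]
print(exp_fom, wearout_fom)
```

The constant FOM in the first case and the increasing FOM in the second describe a single part's time to first (and only) failure, not a sequence of system failures.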

Model for Systems


The failures of a system are described by a stochastic point process $T_1, T_2, \ldots$, where $T_i$ denotes the arrival time to the $i$th failure, measured from the instant at which a system was first put into operation. Equivalently, the process can be represented by the interarrival times, $X_1, X_2, X_3, \ldots$, where $X_i \equiv T_i - T_{i-1}$ and $T_0 \equiv 0$. Downtimes, usually very small compared to most interarrival times, are ignored throughout, except for a brief mention in the section titled "Misconceptions and Their Causes."
Let $N(t)$ be the observed number of system failures in the interval $(0, t]$. The expected number of failures is $V(t) \equiv E[N(t)]$. The rate of occurrence of failures (ROCOF) is defined as

$$v(t) \equiv \frac{d}{dt} E[N(t)] \tag{3}$$


The ROCOF is an absolute rate, whereas the FOM is a relative rate. Hence, even though ROCOF and FOM can be, and sometimes must be, represented by the same mathematical function, with numerically equal parameters, the section titled "Misconceptions and Their Causes" shows that they have up-to-infinitely different interpretations.
We will consider only the homogeneous Poisson process (HPP) and some generalizations of it: the renewal process (RP), the superimposed renewal process (SRP), and the nonhomogeneous Poisson process (NHPP). The HPP can be defined as a nonterminating sequence of exponential interarrival times which are independent and identically distributed (i.i.d.). The ROCOF of an HPP is a constant for all t ≥ 0. The RP is a direct generalization, since the i.i.d. interarrival times can be distributed according to any nonnegative distribution.
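The HPP definition above can be sketched as a short simulation: draw i.i.d. exponential interarrival times, accumulate them into arrival times, and compare the observed number of failures per unit time with the constant ROCOF. The rate, horizon, seed, and function names are hypothetical values chosen for illustration only.

```python
import random

def simulate_hpp(rate, horizon, rng):
    """Arrival times T_1 < T_2 < ... in (0, horizon] built from i.i.d.
    exponential interarrival times with the given constant rate."""
    arrivals = []
    t = rng.expovariate(rate)
    while t <= horizon:
        arrivals.append(t)
        t += rng.expovariate(rate)
    return arrivals

def count_failures(arrivals, t):
    """N(t): number of failures observed in (0, t]."""
    return sum(1 for a in arrivals if a <= t)

rng = random.Random(12345)          # fixed seed so the run is repeatable
rate, horizon = 0.5, 10_000.0       # hypothetical ROCOF and observation window
arrivals = simulate_hpp(rate, horizon, rng)
# Recover X_i = T_i - T_{i-1}, with T_0 = 0.
interarrivals = [b - a for a, b in zip([0.0] + arrivals, arrivals)]
empirical_rocof = count_failures(arrivals, horizon) / horizon
print(len(arrivals), empirical_rocof)
```

Swapping `rng.expovariate` for any other nonnegative sampler would turn the same loop into a simulation of a general RP.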
Consider two or more independent RPs, each of which represents the pattern of failures in a socket. Then the union of all failures, including the instants at which they occurred, is an SRP. In general, the SRP is not an RP; in fact, if the superposition of two independent RPs is an RP, then all three are HPPs [1].
For a formal definition of an NHPP, see Intensity Functions for Nonhomogeneous Poisson Processes. Here, an NHPP can be considered to be any process where the ROCOF is a nonnegative deterministic function of time, independent of the history of the process.
Most repairs involve replacement of only a minute fraction of a system's parts. It is very implausible, therefore, to assume the RP as a model, i.e., to assume that a system's effective age is reduced to zero by every repair. In fact, under the universal engineer's and layman's usage of the term "repair," a well-performed repair returns a failed system to satisfactory performance, without even intending to reduce its age to zero. Nevertheless, the RP is often assumed to be the model for a system; see, e.g., [2, 3].
A plausible system model can be developed by assuming renewal in each of the system's sockets. After all, the failed part in the socket is replaced with a new one. Given an RP for each socket, and a series system, the resulting model is the SRP. There may be cases where one can work with an SRP directly, but usually an approximation is needed. The model for a finite number of superimposed renewal processes is unknown, but there are limit theorems (see [4]) indicating that when the number n of sockets in a series system increases without bound, the SRP converges to an NHPP. (If, in addition, t → ∞, then the SRP converges to an HPP [5].)
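The socket-level construction can be sketched by simulating one RP per socket and pooling the failure instants into an SRP. This is only an illustration under stated assumptions: the uniform part lifetimes, the number of sockets, the horizon, and the function names are all hypothetical, and a finite simulation does not demonstrate the limit theorems themselves.

```python
import random

def simulate_renewal_process(draw_lifetime, horizon):
    """Failure instants in (0, horizon] for one socket, where each failed
    part is replaced by a new one with an i.i.d. lifetime."""
    arrivals, t = [], draw_lifetime()
    while t <= horizon:
        arrivals.append(t)
        t += draw_lifetime()
    return arrivals

def superimpose(processes):
    """SRP: the union of all socket failure instants, in time order."""
    return sorted(t for arrivals in processes for t in arrivals)

rng = random.Random(2024)
n_sockets, horizon = 20, 100.0
# Hypothetical part lifetimes: uniform on (0, 2), so the mean life is 1.
sockets = [
    simulate_renewal_process(lambda: rng.uniform(0.0, 2.0), horizon)
    for _ in range(n_sockets)
]
system_failures = superimpose(sockets)
# Failures per unit time over the second half of the window; with mean
# life 1 per socket this should settle near n_sockets.
late_rate = sum(1 for t in system_failures if t > 50.0) / 50.0
print(len(system_failures), late_rate)
```

Each socket is an RP with non-exponential lifetimes, yet the pooled system-level stream is what a black-box analysis would actually observe.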
Many systems have some redundant paths, so it might appear that the NHPP approximation would not apply to them. However, the series parts often dominate the system's reliability, so that redundant paths often can be ignored in developing the SRP → NHPP model.
A more pragmatic way of deriving the NHPP as
the first-order system model is to imagine someone
trying to sell you a used car. The first things you
want to know are the total miles/kilometers on the
odometer and the year the car was manufactured.
Since each of these measures is independent of its
failure/repair history, the universal first-order model
for a car is the NHPP. You also want to know about
major repairs and overhauls, so this model is not
exact. However, the RP, which often is claimed
to be the system model, is absurd in this scenario; if
a salesman tried to convince you that a 10-year-old
car was only 2 days old because it had been repaired
2 days earlier, you would seek a car elsewhere!
The HPP (usually and erroneously called "exponentiality" by both practitioners and theorists) is often portrayed as the system model, based on Drenick's [6] asymptotic theorem. However, systems often do not operate long enough to have such asymptotic results hold, even to a reasonable approximation. For example, a 10-year-old car modeled by an HPP would have no effective age, not even the 2 days since the last repair. The salesman, therefore, could claim that it was brand new!
Fortunately, the NHPP is a very tractable model as well. For example, the superposition of independent NHPPs (SNHPP) is an NHPP, so, unlike for the RP, the model for the SNHPP is known. In addition, the NHPP is much more tractable than the RP or the SRP.
The NHPP is not an exact model for a system, but it often is a good working approximation. Even far more complicated models ignore most or all of the 18 shortcomings of probabilistic modeling of systems listed in [7].


Statistical Analysis
The interarrival times between failures of a system appear in natural order on a time line. The first step in an analysis, therefore, is to test for trend. If a trend toward larger interarrival times (reliability growth or improvement) or toward smaller interarrival times (deterioration) exists, the interarrival times are not identically distributed. Hence, little or nothing in many reliability and statistics books is applicable, since many books concentrate solely on i.i.d. data. If an assumption is dropped, it usually is that of independence, rather than the even more important assumption of identically distributed interarrival times.
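The text does not name a specific trend test. As one commonly used choice (an assumption here, not something the article prescribes), the Laplace test compares the mean arrival time with the midpoint of the observation interval. The sketch below covers the case where observation stops at a fixed time `t_star`; the data and function name are hypothetical.

```python
import math

def laplace_trend_statistic(arrival_times, t_star):
    """Laplace statistic for failures observed over (0, t_star].

    Under an HPP (no trend) the arrival times behave like uniform draws on
    (0, t_star], so the statistic is approximately standard normal. A large
    positive value suggests failures bunching late (deterioration); a large
    negative value suggests improvement.
    """
    m = len(arrival_times)
    mean_arrival = sum(arrival_times) / m
    return (mean_arrival - t_star / 2.0) / (t_star * math.sqrt(1.0 / (12.0 * m)))

# Hypothetical arrival times (hours), for illustration only.
evenly_spread = [10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0, 80.0, 90.0]
bunched_late = [60.0, 70.0, 80.0, 90.0, 95.0]
u_even = laplace_trend_statistic(evenly_spread, 100.0)
u_late = laplace_trend_statistic(bunched_late, 100.0)
print(u_even, u_late)
```

With so few failures the normal approximation is rough, so a result like this is better read as a screening indication than as a precise significance level.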

Parametric Models
Given a trend, the NHPP is the first choice as a model. Consider the power-law process where

$$v(t) = \lambda \beta t^{\beta - 1}.$$

Then if failures occurred at arrival times $T_1 = t_1, T_2 = t_2, \ldots, T_m = t_m$, over an observation interval $(0, t^*]$, the maximum-likelihood estimators (MLEs) of $\beta$ and $\lambda$ are

$$\hat{\beta} = \frac{m}{\sum_{i=1}^{m} \ln(t^*/t_i)} \tag{4}$$

$$\hat{\lambda} = \frac{m}{(t^*)^{\hat{\beta}}} \tag{5}$$

If observation is till time $T_m$, then $T_m$ is substituted for $t^*$ in both equations. Many other techniques for the power-law process and other NHPPs are provided in [8] and the references therein. Lawless and Thiagarajah [9, 10] show how to include explanatory factors or covariates, e.g., different environmental stresses experienced by different system copies, in an analysis.
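The two estimators of equations (4) and (5) take only a few lines to compute. The sketch below is a minimal illustration for a single system copy observed over (0, t*]; the function name and the failure times are hypothetical.

```python
import math

def power_law_mle(arrival_times, t_star):
    """MLEs for the power-law process observed over (0, t_star].

    Returns (lambda_hat, beta_hat) following equations (4) and (5).
    If observation ends at the last failure, pass t_star = arrival_times[-1];
    that failure then contributes ln(1) = 0 to the sum.
    """
    m = len(arrival_times)
    beta_hat = m / sum(math.log(t_star / t) for t in arrival_times)
    lambda_hat = m / t_star ** beta_hat
    return lambda_hat, beta_hat

# Hypothetical arrival times (hours) for one system copy.
times = [5.0, 40.0, 43.0, 175.0, 389.0, 712.0, 747.0, 795.0, 1299.0, 1478.0]
t_star = 2000.0
lambda_hat, beta_hat = power_law_mle(times, t_star)
print(lambda_hat, beta_hat)
```

For this made-up dataset the estimated shape parameter is below 1, i.e., a decreasing ROCOF, which is consistent with the interarrival times tending to lengthen.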
If there is no evidence of trend, the data can be considered to be identically distributed, but the interarrival times of a copy may still be dependent. In practice, however, one seldom has enough failures to test the independence assumption [11].
If there is no evidence that the interarrival times are not i.i.d., one fits an RP to them. If an exponential distribution provides adequate fit, the resulting model is the HPP; otherwise, a more general RP must be selected. The techniques (a) for fitting an exponential distribution to times to failure of parts and (b) for fitting an HPP to system interarrival times are interchangeable. The interpretations of the results, however, often are drastically different. This is because a failed part is discarded at failure, whatever the magnitude of its constant FOM, whereas a failed system modeled by an HPP is repaired to the ROCOF it had when it was new, which may be very large. For further information on the major differences in interpretations, see [12, 13].

Nonparametric Models
Cumulative Plots. If we plot cumulative failures versus cumulative operating time, t, for a single system copy, we can perform trend testing visually. An increasing, constant, or decreasing slope indicates deteriorating, trend-free, or improving reliability, respectively, of that system copy. Deteriorating (improving) systems have become known as "sad" ("happy") systems to help distinguish these concepts from wear out (burn-in) of parts.
Mean Cumulative Function. In most applications,
data are available from two or more copies and
in many cases the number of copies is far greater
than two. The mean cumulative function (MCF) is
constructed incrementally at each instant a failure
occurs by considering the number of copies at risk
at that instant. Copies may not be in the risk set
because of left or right censoring. Left censoring
occurs when a copy is operated for some time without
knowledge of its failure history and then comes under
observation. Right censoring occurs when a copy is
no longer observed after a time t = t* for any reason,
ranging from the copy being totaled to simply not
accumulating more than t* hours at a time when the
number at risk is being calculated. The key reference
to this approach is [14].
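The incremental construction described above can be sketched as follows. This is a simplified illustration that handles right censoring only (every copy is assumed to be observed from time zero), with hypothetical data and names; it is not presented as the full procedure of [14].

```python
def mean_cumulative_function(histories):
    """Nonparametric MCF estimate.

    histories: list of (failure_times, end_of_observation), one per copy.
    At each distinct failure time, the MCF increases by
    (failures at that time) / (copies still under observation).
    Returns a list of (time, MCF value) pairs.
    """
    event_times = sorted({t for times, _ in histories for t in times})
    mcf, total = [], 0.0
    for t in event_times:
        at_risk = sum(1 for _, end in histories if end >= t)
        failures = sum(times.count(t) for times, _ in histories)
        total += failures / at_risk
        mcf.append((t, total))
    return mcf

# Hypothetical fleet of three copies (operating hours).
fleet = [
    ([100.0, 300.0], 500.0),  # copy A: two failures, observed to 500 h
    ([200.0], 250.0),         # copy B: one failure, right censored at 250 h
    ([], 400.0),              # copy C: no failures, observed to 400 h
]
mcf = mean_cumulative_function(fleet)
print(mcf)
```

Copy B drops out of the risk set before the failure at 300 h, so that failure is averaged over two copies rather than three.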

Misconceptions and Their Causes


When parts are tested to failure, the i.i.d. assumptions are plausible and greatly simplifying. When the times between successive failures in a socket are analyzed, the i.i.d. assumptions, leading to the RP, are plausible but no longer tractable. For a system, the RP is neither plausible nor tractable nor even desirable; one hopes that successive interarrival times tend to become larger, e.g., through reliability growth testing or from improved operating/maintenance procedures. It is amazing, therefore, that some reliability texts still totally ignore repairable systems or assume that the RP is the only model for such a system. Many texts do not provide techniques for testing for nonstationarity and do not even hint that such trend tests are essential.
Even when the RP is not assumed to be the universal system model, sometimes it seems to be assumed when it is not. For example, [15] does not restrict itself to the RP, but also uses the word "renewal" as a synonym for "repair." Hence a "renewal" may return a system to the same-as-new condition of an RP, but it may also leave the system in some other condition after the repair. As discussed in [13], renewal sometimes is infinitely better described as "bad as new," rather than the virtually universally used term, "good as new."
It is widely believed, even among theorists, that it is too difficult to apply nonstationary models. However, the NHPP is much more tractable than the RP. For example, the renewal function, i.e., the expected number of failures over (0, t], is unavailable in closed form, except for a few special cases; the corresponding quantity under an NHPP is $\int_0^t v(z)\,dz$, and one selects a function for $v$ for which the integral is known. Or, compare the simple MLE, equation (4), for the shape parameter of the power-law process with the messy transcendental MLE equation for the shape parameter of the Weibull distribution, which must be solved by trial and error.
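As a worked instance of such a closed-form integral, take the power-law ROCOF from the Parametric Models section, $v(t) = \lambda\beta t^{\beta-1}$ with $\lambda, \beta > 0$. The expected number of failures over (0, t] follows directly:

```latex
V(t) \;=\; \int_0^t v(z)\,dz
     \;=\; \int_0^t \lambda \beta z^{\beta - 1}\,dz
     \;=\; \lambda t^{\beta}.
```

Evaluating this at the end of the observation interval with the estimator of equation (5), $\hat{\lambda} = m/(t^*)^{\hat{\beta}}$, gives $\hat{\lambda}(t^*)^{\hat{\beta}} = m$, so the fitted expected number of failures over $(0, t^*]$ equals the observed count.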
It is also widely believed that an RP whose distribution has increasing (decreasing) FOM can model deterioration (improvement) of a system. The main reason for this misconception is that the FOM, $h_X(x)$, is usually called "failure rate," especially by theorists. A natural interpretation of "increasing failure rate" is an increasing number of failures per unit time, see, e.g., [16], but increasing FOM does not imply that. Consider, for example, the model for a system where $X_i \sim U(0, i]$, $i = 1, 2, 3, \ldots$. Then each $X_i$ has an FOM which strictly increases from $i^{-1}$ to infinity, but the number of system failures per unit time decreases asymptotically to zero as $i \to \infty$. Even for parts that are tested to failure, increasing FOM does not imply an increasing number of part failures per unit time. (Consider a group of a million people all born in, say, 1907. By now the number of deaths per year is stochastically decreasing rapidly, since very few from the original million are still at risk of dying.)
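The uniform counterexample can be checked numerically. For a uniform lifetime on (0, i] the FOM is 1/(i − x), and the expected interarrival time is i/2. The sketch below uses the ratio k / E[T_k] as a rough stand-in for "failures per unit time up to the kth failure"; that ratio and the function names are illustrative assumptions, not quantities defined in the article.

```python
def uniform_fom(x, i):
    """FOM of a lifetime uniform on (0, i]: (1/i) / (1 - x/i) = 1 / (i - x),
    valid for 0 <= x < i."""
    return 1.0 / (i - x)

def expected_arrival_time(k):
    """E[T_k]: sum of the expected interarrival times i/2 for i = 1..k."""
    return sum(i / 2.0 for i in range(1, k + 1))

# FOM of the 5th interarrival time at increasing ages within (0, 5).
fom_i5 = [uniform_fom(x, 5) for x in (0.0, 2.5, 4.9)]
# Starting value of each FOM is 1/i.
fom_start = [uniform_fom(0.0, i) for i in (1, 2, 5)]
# Rough failures-per-unit-time measure: k failures in an expected time E[T_k].
failures_per_unit_time = [k / expected_arrival_time(k) for k in (1, 10, 100, 1000)]
print(fom_i5, fom_start, failures_per_unit_time)
```

Every interarrival distribution has a strictly increasing FOM, yet the ratio equals 4/(k + 1) and shrinks toward zero, so the system is improving rather than deteriorating.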

The FOM should not be called "failure rate," to avoid assigning flagrantly counterintuitive meanings to the misnomer "failure rate."
There are other reasons for confusion between FOM and ROCOF. For example, under an NHPP the ROCOF of the process is equal to, but not equivalent to, the FOM of the distribution of time to first failure of the process. Therefore, under an HPP both FOM and ROCOF are equal to the same constant. However, when the area under any FOM increases to infinity, one failure will occur with probability one. In infinite contrast, when the area under any ROCOF increases to infinity, the expected number of failures increases to infinity. Hence, it is essential to distinguish between FOM and ROCOF, rather than to emphasize their superficially striking but relatively unimportant similarities.
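The contrast between the two areas can be written out explicitly. The first line is the standard relation between a continuous distribution and its FOM; the second is the NHPP expected count. The constant c is a hypothetical value used only to make the comparison concrete.

```latex
% Area under an FOM: probability that the single failure has occurred by x
\Pr\{X \le x\} \;=\; 1 - \exp\!\left(-\int_0^x h_X(u)\,du\right) \;\longrightarrow\; 1
\quad \text{as} \quad \int_0^x h_X(u)\,du \to \infty .

% Area under a ROCOF: expected number of failures by t
E[N(t)] \;=\; \int_0^t v(z)\,dz \;\longrightarrow\; \infty
\quad \text{as} \quad \int_0^t v(z)\,dz \to \infty .

% Same constant c in both roles
h_X(x) = c \;\Rightarrow\; \Pr\{X \le x\} = 1 - e^{-cx} \le 1,
\qquad
v(t) = c \;\Rightarrow\; E[N(t)] = ct .
```

With the same numerical constant, the FOM can never yield more than one failure, while the ROCOF yields an expected count that grows without bound.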
Confusion about the distinction between FOM and ROCOF has led to the erroneous belief that there is only one bathtub curve. In the bathtub curve for parts, FOM is plotted against part age x. In the curve for a system, ROCOF is plotted against the total operating time t of the system, regardless of whether one or more failures have occurred. In practice, both curves usually are depicted as "λ(t) = failure rate" plotted against t. This makes it impossible to tell which bathtub curve is portrayed just from the plot. Some authors refer to only one of these interpretations, and some refer to both as if they were equivalent, see, e.g., [17]. In many cases, however, widespread poor and incorrect terminology and notation make it impossible to ascertain which bathtub curve is being discussed, even from the text accompanying the plot.
One further point about the system bathtub curve: the increasing ROCOF for large t implies that v(t) → ∞ as t → ∞. On the other hand, asymptotic theorems such as Drenick's imply that v(t) approaches a finite constant as t → ∞. Theorists use the asymptotic theorems to justify the HPP, whereas practitioners erroneously assume that the bottom of the system bathtub curve implies an HPP. Neither group is aware of the other's rationale.
For a system, the interarrival times are the times between failures. For parts, the spacings between order statistics also can be interpreted as times between failures. Moreover, when parts are put on test (or into operation) simultaneously and run continuously, their times between failures appear exactly in real time; because of nonzero repair times, this is never exactly true for the interarrival times of systems. It is essential to clearly distinguish between the two kinds of times between failures.
Most of the listed misconceptions and inherent ambiguities, but especially the use of the misnomer "failure rate" for both FOM and ROCOF, caused great confusion in [18], as acknowledged in [19]. For example, two obviously different plots obtained from the same "failure numbers," one actually estimating FOM and the other actually estimating ROCOF, were claimed to be plots of the same "failure rate." Ascher [20] summarized the problem and its causes and showed how Juran and Gryna [21] also confused themselves by using the term "failure rate" interchangeably for both FOM and ROCOF.
The many inherent ambiguities in analyzing repairable systems failure data make it essential to avoid avoidable ambiguities, such as (a) using "t" for both the age of a part and for the total operating time of a system and (b) using "failure rate" for both FOM and ROCOF. Unfortunately, up to the present writing, most practitioners have followed the lead of most theorists in embracing both types of ambiguities.

References
[1] Cinlar, E. (1975). Introduction to Stochastic Processes, Prentice Hall, Englewood Cliffs, p. 88.
[2] Barlow, R. & Proschan, F. (1975). Statistical Theory of Reliability and Life Testing: Probability Models, Holt, Rinehart and Winston, New York, pp. 161–162.
[3] Zacks, S. (1992). Introduction to Reliability Analysis: Probability Models and Statistical Methods, Springer-Verlag, New York, pp. 67–83.
[4] Barlow, R. & Proschan, F. (1975). Statistical Theory of Reliability and Life Testing: Probability Models, Holt, Rinehart and Winston, New York, pp. 245–253.
[5] Barlow, R. & Proschan, F. (1975). Statistical Theory of Reliability and Life Testing: Probability Models, Holt, Rinehart and Winston, New York, pp. 246–251.
[6] Drenick, R. (1960). The failure law of complex equipment, Journal of the Society for Industrial Applications of Mathematics 8, 680–690.
[7] Ascher, H. & Feingold, H. (1984). Repairable Systems Reliability: Modeling, Inference, Misconceptions and their Causes, Marcel Dekker, New York, pp. 63–69.
[8] Rigdon, S. & Basu, A. (2000). Statistical Methods for the Reliability of Repairable Systems, John Wiley & Sons, New York.
[9] Lawless, J. (2003). Statistical Models and Methods for Lifetime Data, 2nd Edition, Wiley-Interscience, Hoboken.
[10] Lawless, J. & Thiagarajah, K. (1996). A point-process model incorporating renewals and time trends, with application to repairable systems, Technometrics 38, 131–138.
[11] Ascher, H. & Feingold, H. (1984). Repairable Systems Reliability: Modeling, Inference, Misconceptions and their Causes, Marcel Dekker, New York, pp. 88–91.
[12] Ascher, H. & Feingold, H. (1984). Repairable Systems Reliability: Modeling, Inference, Misconceptions and their Causes, Marcel Dekker, New York, pp. 144–145, 151, 160.
[13] Ascher, H. (2007). Different insights for improving part and system reliability obtained from exactly same DFOM "failure numbers", Reliability Engineering and System Safety 92, 552–559.
[14] Nelson, W. (2003). Recurrent Events Data Analysis for Product Repairs, Disease Recurrences, and Other Applications, SIAM, Philadelphia; ASA, Alexandria.
[15] Ansell, J. & Phillips, M. (1989). Practical problems in the statistical analysis of reliability data, Applied Statistics: Journal of the Royal Statistical Society, Series C 38, 205–231.
[16] Elsayed, E. (1996). Reliability Engineering, Addison Wesley Longman, Reading, p. 541.
[17] Hoyland, A. & Rausand, M. (1994). System Reliability Theory: Models and Statistical Methods, John Wiley & Sons, New York, pp. 6, 23–24, 487.
[18] Frees, E. (1986). Nonparametric renewal function estimation, Annals of Statistics 14, 1366–1378.
[19] Frees, E. (1988). Correction: nonparametric renewal function estimation, Annals of Statistics 16, 1741.
[20] Ascher, H. (1999). A set-of-numbers is NOT a data-set, IEEE Transactions on Reliability 48, 135–140.
[21] Juran, J. & Gryna, F. (1988). Quality Control Handbook, 4th Edition, McGraw-Hill, New York.

Related Articles
Age-Dependent Minimal Repair and Maintenance; Analysis of Recurrent Events from Repairable Systems; General Minimal Repair Models;
Imperfect Repair, Counting Processes; Intensity
Functions for Nonhomogeneous Poisson Processes;
Multivariate Imperfect Repair Models; Nonparametric Methods for Analysis of Repair Data;
Reliability Growth Modeling; Repairable Systems: Statistical Inference; Repairable Systems:
Bayesian Analysis; Software Failure Data Analysis; System Availability.
HAROLD E. ASCHER
