Fundamental Safety Engineering and Risk Management Concepts, 2012/2013
by M. J. Baker and H. Tan
INTRODUCTION TO RELIABILITY CONCEPTS

1. Introduction

One of the most difficult problems faced by professional engineers is the random nature of many physical phenomena. Corrosion attack, chemical changes in process streams, bearing wear, fatigue cracking and similar processes all make designing and managing physical assets a challenge. Nevertheless, it is the role of the engineer to plan for the future and to ensure that a system can cope, up to a point, with the demands placed on it in service. No technological system can be designed to withstand every possible demand. Attempting to do so would produce aircraft too heavy to fly and ships with massively thick hulls, and it would place a heavy financial burden on the society using such systems. The best engineering systems therefore strike a balance between safety and cost. This balance should take into account the possibility of inspection and maintenance during the anticipated lifetime, together with the associated costs.

As noted above, every engineering system is subject to random variation of some kind. This variation can take the form of differences in material properties, small changes in geometry, and fluctuations in the loads and other physical demands to which the system is subjected. Traditionally, engineers have dealt with random variation by selecting conservative values of material properties, loads, pressures, etc., and applying safety factors to arrive at suitably robust designs. This can, however, result in over-engineered and expensive systems.

A different approach uses all the available information to give a deeper insight into complex engineering problems. The random variation in the physical variables of a problem can be quantified using the theory of probability, and the performance of the system can then be expressed in terms of a probability of failure.
This probabilistic approach is more complex than traditional deterministic approaches, but it provides a more realistic view of system behaviour. This field is known as Reliability Engineering, and the purpose of this and subsequent lectures is to study it in some depth.

2. Failure and Component Reliability

In some situations, failure of an engineering component is easily defined and recognised. Take the fracture of the drive shaft of a centrifugal pump. The shaft can be in one of two states: a normal (operating) state, in which it transmits torque, or a fractured (failed) state, in which it does not. Described in this way, the shaft has two binary states: failed or not failed.

In many other cases, however, failure is not so clear-cut, and all that may be observed is a reduction in performance. Returning to the pump example, the drive shaft bearings may be worn, which prevents the pump from delivering its maximum head pressure. The pump is still operating, but not at an acceptable level of performance. Has the pump failed? In general, failure of a component or system is defined as reaching, or being in, a state in which it fails to fulfil its intended design function.

Example 1: A standby pump must be available to provide uninterrupted operation if the running pump breaks down. Two centrifugal pumps are connected in parallel in a duty/standby configuration, as shown in Figure 1. The pump system is required to maintain a pressure of 10 barg ± 0.5 barg in a flow loop. First, it is necessary to define a number of failure events based on the pressure performance requirements:

- Failure 1: No pressure. Both pumps fail to operate.
- Failure 2: Overpressure. The duty pump delivers more than 10.5 barg to the flow loop.
- Failure 3: Underpressure. The duty pump delivers less than 9.5 barg.

It is also possible to define failure events that are not linked to the pressure requirements, for example:

- Failure 4: Loss of redundancy. The standby pump will not start if required. There is no loss of pressure, as the duty pump is still operating.
- Failure 5: Loss of control. The duty pump is delivering pressure but does not respond to a shutdown request from the control system.

This analysis of a duty/standby pump arrangement shows that very few failure modes are as simple as they might appear at first sight. To undertake a rigorous reliability assessment, the event "failure" must first be defined unambiguously.
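One way to make the failure definitions of Example 1 unambiguous is to write each one as an explicit test on the state of the system. The sketch below uses a hypothetical `PumpSystemState` record. Its fields (pump running flags, whether the standby pump can start, whether a shutdown has been requested, and the loop pressure) are illustrative assumptions and are not taken from any real control system. The sketch also assumes that Failures 2 and 3 apply whenever either pump is running, and that Failure 1 counts only when no shutdown has been requested.

```python
from dataclasses import dataclass

# Pressure requirement from Example 1: 10 barg +/- 0.5 barg.
SETPOINT_BARG = 10.0
TOLERANCE_BARG = 0.5


@dataclass
class PumpSystemState:
    """Hypothetical snapshot of the duty/standby system (illustrative fields only)."""
    duty_running: bool
    standby_running: bool
    standby_can_start: bool    # would the standby pump start if required?
    shutdown_requested: bool   # has the control system asked the duty pump to stop?
    loop_pressure_barg: float


def failure_events(s: PumpSystemState) -> set[str]:
    """Return every failure event (1 to 5) present in state s; events may co-occur."""
    events: set[str] = set()
    pumping = s.duty_running or s.standby_running
    if not pumping:
        # Assumption: both pumps stopped is a failure only if no shutdown was requested.
        if not s.shutdown_requested:
            events.add("F1 no pressure")
    else:
        if s.loop_pressure_barg > SETPOINT_BARG + TOLERANCE_BARG:
            events.add("F2 overpressure")
        if s.loop_pressure_barg < SETPOINT_BARG - TOLERANCE_BARG:
            events.add("F3 underpressure")
    if s.duty_running and not s.standby_can_start:
        events.add("F4 loss of redundancy")
    if s.duty_running and s.shutdown_requested:
        events.add("F5 loss of control")
    return events


if __name__ == "__main__":
    state = PumpSystemState(duty_running=True, standby_running=False,
                            standby_can_start=False, shutdown_requested=False,
                            loop_pressure_barg=10.7)
    print(sorted(failure_events(state)))
```

The function returns a set, which makes it visible that the events are not mutually exclusive. For example, a state with a loop pressure of 10.7 barg and an unavailable standby pump yields both overpressure and loss of redundancy.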
Figure 1: Duty and standby centrifugal pumps.

The non-failure, or reliability, of engineering components and systems is of particular interest. Reliability is defined here as the probability of non-failure (i.e. survival) when the item is subjected to some fixed, or random, demand D. In general, the reliability R is defined as:

$$R = 1 - P_f \qquad (1)$$

where $P_f$ is the probability of failure. The properties of probabilities therefore play a central role in the assessment of component and system reliability.
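Equation (1) treats failure and survival as complementary events, so their probabilities sum to one. The minimal sketch below expresses this directly. The function name `reliability` and the range check are our own illustrative additions, and the value 0.02 used afterwards is hypothetical.

```python
def reliability(p_f: float) -> float:
    """Return R = 1 - P_f (Equation (1)) for a single fixed or random demand.

    Failure and survival are complementary, mutually exclusive events,
    so their probabilities must sum to 1.
    """
    if not 0.0 <= p_f <= 1.0:
        raise ValueError("p_f must be a probability in [0, 1]")
    return 1.0 - p_f
```

For instance, an item with a hypothetical 2% probability of failing under the demand has `reliability(0.02)`, i.e. R = 0.98.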
Example 2: In some situations, failure of an engineering component would seem to be easily defined and recognised. Take, for example, the breakage of the filament of an ordinary tungsten filament lamp. With some degree of idealisation, the filament can be considered to be either in a normal (operating) state, in which it conducts electricity, or in a failure state, in which it does not. It would seem, therefore, that the filament can be in only one of two binary states: failed or not failed.

However, consider the following. Failure of the bulb to light (given that it is supplied with an electric current at the correct voltage) can arise for a number of reasons: mechanical breakage of the filament, disconnection of the wires leading from the base of the bulb to the filament, breakage of the glass followed by oxidation of the filament, etc. As a further complication, the bulb can continue to operate even after mechanical breakage of the filament if the broken ends remain in physical contact, and this situation may lead to transient operation. It must therefore be decided how failure is to be defined. It could be the failure of the bulb to light when required, the mechanical breakage of the filament, the light emission from the bulb falling below a given intensity, or a combination (mathematical union) of some or all of these events. As in the previous example, a rigorous reliability assessment requires failure to be defined without ambiguity, and the probabilities associated with each of these alternative definitions may be quite different.

3. Bernoulli Trial

We first consider the simplest case for reliability analysis. In probability theory and statistics, a Bernoulli trial is an experiment whose outcome is random and takes one of only two values, "success" and "failure". A sequence of independent, identically distributed Bernoulli trials is known as a Bernoulli process.
Independent repeated trials of an experiment with only two outcomes are called Bernoulli trials. Random variables describing Bernoulli trials are often encoded using the convention 1 = "success", 0 = "failure".

4. Reliability Assessment of Components

In the following, and in subsequent lectures, the reliability assessment of components will be addressed by first making a number of simplifying assumptions. These will be progressively relaxed to include more general and practical situations. For example, the failure probability $P_f$ in Equation (1) may depend on the duration of exposure, and the rate at which failures occur may also change with time as a result of, say, mechanical deterioration. This will be discussed in more detail later.

The simplest case to consider is one in which the probability of failure is (i) independent of absolute time, and (ii) the same for each repeated demand. Here the concept of a repeated demand is very general: it could be the starting of an engine, the operation of a relay, etc. These two examples, however, are unlikely to satisfy the requirement that the probability of failure be independent of time (i.e. the requirements of Bernoulli trials), since both are likely to deteriorate with time. Nor are successive demands necessarily identical: engines starting on cold winter mornings are not the same as engines starting on warm summer mornings. Indeed, it is difficult to find practical examples of components that exhibit characteristics (i) and (ii) exactly. However, many systems come close enough to this behaviour to support sound engineering decisions, which, after all, is the purpose of the analysis in the first place.

There is a further issue. The definition of what constitutes a component and what constitutes a system is to some degree arbitrary. An electromechanical relay, for example, can be thought of as a manufactured component used in some larger system, but the relay itself is a sub-system of smaller components. In practice, it is necessary to decide on the scale of modelling to be adopted for any reliability assessment.

5. Basic Case

Consider a single component with a single failure mode. The occurrence of the failure mode is denoted by the event F, which corresponds to failure. Let us assume that the conditions of Bernoulli trials are met, namely that:

- $P_f$ is independent of absolute time (i.e. it depends only on the period of exposure, or the number of exposures, to the situation or demand that can cause failure);
- $P_f$ is the same for each repeated exposure to the demand;
- each trial is statistically independent of the other trials; and
- the failure state is irreversible.

If the probability of failure in a single trial, or single exposure to the demand, is p, then the reliability is the probability of success:

$$R = P(\bar{F}) = 1 - p \qquad (2)$$

where $\bar{F}$ denotes the complement of F (non-failure).

Now consider n trials. Let $R_n$ be the probability of success in n successive trials, and let $\bar{F}_i$ denote non-failure in the i-th trial. The reliability $R_n$ is then given by:

$$R_n = P(\bar{F}_1 \cap \bar{F}_2 \cap \cdots \cap \bar{F}_n) = P(\bar{F}_1)\,P(\bar{F}_2)\cdots P(\bar{F}_n) = \left[P(\bar{F})\right]^n \qquad (3)$$

The product form follows from the statistical independence of the trials, and the final step follows from $P_f$ being the same for every demand. Hence, substituting Equation (2), for statistically independent successive demands:

$$R_n = (1 - p)^n \qquad (4)$$
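Equations (2) to (4) can be checked numerically. The sketch below computes $R_n = (1 - p)^n$ directly and compares it with a Monte Carlo simulation in which each demand is an independent Bernoulli trial that fails with probability p. The function names, the number of runs and the random seed are arbitrary choices made for illustration. Failure is assumed irreversible, so a simulated run counts as a survival only if all n demands succeed.

```python
import random


def reliability_n(p: float, n: int) -> float:
    """Equation (4): probability of surviving n independent demands, R_n = (1 - p)^n."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability in [0, 1]")
    if n < 0:
        raise ValueError("n must be non-negative")
    return (1.0 - p) ** n


def simulate_reliability_n(p: float, n: int, runs: int = 20_000, seed: int = 1) -> float:
    """Monte Carlo estimate of R_n from repeated Bernoulli trials.

    Each demand fails with probability p and succeeds with probability 1 - p.
    Failure is irreversible, so a run survives only if all n demands succeed.
    """
    rng = random.Random(seed)  # seeded so the estimate is reproducible
    survivals = 0
    for _ in range(runs):
        # rng.random() is uniform on [0, 1), so "< p" (a failure) occurs with probability p;
        # all() stops at the first failed demand, mirroring the irreversible failure state
        if all(rng.random() >= p for _ in range(n)):
            survivals += 1
    return survivals / runs


if __name__ == "__main__":
    p, n = 0.01, 50  # hypothetical per-demand failure probability and number of demands
    print(f"Equation (4): R_n = {reliability_n(p, n):.4f}")
    print(f"Simulation:   R_n ~ {simulate_reliability_n(p, n):.4f}")
```

With p = 0.01 and n = 50, Equation (4) gives $R_{50} = 0.99^{50} \approx 0.605$. A component that survives a single demand with probability 0.99 therefore has only about a 60% chance of surviving 50 demands. The simulated estimate should agree with this to within sampling error.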