Reliability CATS 2017

CAT 1
1. (i) Discuss what reliability is and the concept of reliability engineering. (5mks)
Reliability is the probability that an item will perform a required function without failure under
stated conditions for a stated period of time. It is associated with unexpected failures of
products or services, and understanding why these failures occur is key to improving reliability.
Reliability engineering encompasses the principles and practices associated with reliability
requirements (such as prediction of failure time and conditions) and their translation into
specifications that are incorporated in product design and production.
The most important aspect of reliability engineering is to identify the causes of failure and
eliminate them in design if possible; otherwise, to identify ways of accommodating them.
ii) Identify and explain four objectives of reliability engineering. (8mks)
• To apply engineering knowledge to prevent or reduce the likelihood or frequency of
failures.
• To identify and correct the causes of failures that do occur.
• To determine ways of coping with failures that do occur.
• To apply methods for estimating the likely reliability of new designs, and for analysing
reliability data.
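iii) The time-to-failure distribution of a gas turbine system can be represented using a
Weibull distribution with scale parameter η = 1000 hours and shape parameter β =
1.7. Find the hazard rate of the gas turbine at time t = 800 hours and t = 1200 hours.
(7mks)
The hazard rate of the Weibull distribution is
$h(t) = \frac{\beta}{\eta}\left(\frac{t}{\eta}\right)^{\beta - 1}$
At time t = 800 hrs:
$h(800) = \frac{1.7}{1000}\left(\frac{800}{1000}\right)^{1.7 - 1} = 0.0017 \times 0.8^{0.7} \approx 0.001454$ per hour
At time t = 1200 hrs:
$h(1200) = \frac{1.7}{1000}\left(\frac{1200}{1000}\right)^{1.7 - 1} = 0.0017 \times 1.2^{0.7} \approx 0.001931$ per hour
Because β > 1, the hazard rate increases with age. Note that β − 1 = 0.7 is an exponent, not a
multiplier.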
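The Weibull hazard calculation above can be checked with a short script. This is a minimal sketch; the function name `weibull_hazard` is chosen here for illustration and is not part of the original answer. The key point is that β − 1 is applied as a power of t/η.

```python
def weibull_hazard(t, eta, beta):
    """Weibull hazard rate h(t) = (beta / eta) * (t / eta) ** (beta - 1)."""
    if t <= 0 or eta <= 0 or beta <= 0:
        raise ValueError("t, eta and beta must all be positive")
    return (beta / eta) * (t / eta) ** (beta - 1)


# Gas turbine from the question: eta = 1000 hours, beta = 1.7
for t in (800, 1200):
    rate = weibull_hazard(t, eta=1000.0, beta=1.7)
    print(f"h({t}) = {rate:.6f} failures per hour")
```

With these parameters the script gives roughly 0.001454 per hour at 800 hours and 0.001931 per hour at 1200 hours.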
iv) Explain the following types of redundancy
a) Cold Standby System (2Mks)

A cold standby is a redundancy method that involves having one system as a backup for another
identical primary system. The cold standby system is called upon only on failure of the primary
system. In a cold standby system, a redundant item is switched on only when the operating item
fails. That is, initially one item will be operating and when this item fails, one item from the
redundant items will be switched on to maintain the function. In a cold standby, the hazard
function of the item in standby mode is zero.
b) Warm Standby System (2Mks)

Warm standby is a redundancy method in which a secondary system runs in the background of
an identical primary system. Data is mirrored to the secondary system at regular intervals,
so at times the primary and secondary systems contain different data or different data
versions.
c) Hot Standby System (2MKs)

Hot standby is a redundancy method in which one system runs simultaneously with an identical
primary system. Upon failure of the primary system, the hot standby system immediately takes
over, replacing the primary system. Data is mirrored in real time, so both systems hold
identical data.
Hot standby is also known as hot spare, especially at the component level, such as a hard drive
in a disk array.
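To compare the standby types numerically, the sketch below uses a simplified model that is an assumption, not part of the answer above: two identical items with constant failure rate `lam` (exponential lifetimes), perfect switching, and a hot standby unit that ages at the same rate as the primary. Under those assumptions the cold standby formula is R(t) = e^(−λt)(1 + λt) and the hot standby formula is that of two items in active parallel.

```python
import math


def reliability_single(lam, t):
    """One item with constant failure rate lam."""
    return math.exp(-lam * t)


def reliability_hot_standby(lam, t):
    """Both items run from t = 0; the system fails only when both have failed."""
    r = math.exp(-lam * t)
    return 1.0 - (1.0 - r) ** 2


def reliability_cold_standby(lam, t):
    """Spare has zero hazard until the primary fails; switching is assumed perfect."""
    return math.exp(-lam * t) * (1.0 + lam * t)


lam, t = 0.001, 1000.0  # illustrative values only
print("single      :", round(reliability_single(lam, t), 4))
print("hot standby :", round(reliability_hot_standby(lam, t), 4))
print("cold standby:", round(reliability_cold_standby(lam, t), 4))
```

In this model the cold standby arrangement is never less reliable than the hot one, because the spare does not age while it waits. In practice, switching failures and start-up delay reduce that advantage.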
v) Discuss Failure-Based maintenance policy, FBM (4Mks)
Failure-Based maintenance policy, FBM, represents an approach where corrective maintenance
tasks are carried out after a failure has occurred, in order to restore the functionality of the
item/system considered.
Consequently, this approach to maintenance is known as breakdown, post failure, firefighting,
reactive, or unscheduled maintenance. According to this policy, maintenance tasks often take
place in ad hoc manner in response to breakdown of an item following a report from the system
user.
A schematic presentation of the maintenance procedure for the failure based maintenance policy
is presented below:
2. i) Discuss the main reasons why failure occurs? (7mks)

1) The design might be inherently incapable. It might be too weak, consume too much
power, suffer resonance at the wrong frequency, etc. Every design problem presents the
potential for errors, omissions, and oversights. The more complex the design or the more
difficult the problems to be overcome, the greater this potential is.
2) The item might be overstressed in some way. If the stress applied exceeds the strength
then failure will occur. An electronic component will fail if the applied electrical stress
(voltage, current) exceeds the ability to withstand it, and a mechanical strut will buckle
if the compression stress applied exceeds the buckling strength. Overstress failures like
these do happen but not often since designers provide margins of safety.
3) Failures might be caused by variation. The actual strength values of any population of
components will vary: there will be some that are relatively strong, others that are
relatively weak, but most will be of nearly average strength. The loads applied will also
vary. If the distributions of load and strength overlap (interfere), and a load value in the
high tail of the load distribution is applied to an item in the weak tail of the strength
distribution, then failure will occur.
4) Failures can be caused by wear-out. Wear-out includes any mechanism or process that
causes an item that is sufficiently strong at the start of its life to become weaker with
age. Examples of such processes include material fatigue, wear between surfaces in
moving contact, corrosion, and insulation deterioration.
5) Failures can be caused by other time-dependent mechanisms. Battery run-down, creep
caused by simultaneous high temperature and tensile stress, as in turbine discs and fine
solder joints, and progressive drift of electronic component parameter values are
examples of such mechanisms.
6) Failures can be caused by sneaks. A sneak is a condition in which the system does not
work properly even though every part does.
7) Failure can be caused by errors such as incorrect specifications, designs or software
coding, by faulty assembly or test, by inadequate or incorrect maintenance, or by
incorrect use.
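The load-strength interference described in reason 3 can be quantified under a common simplifying assumption that the notes do not state: load and strength are independent and normally distributed. The probability of failure is then the probability that load exceeds strength. The function names and the numbers below are illustrative only.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def interference_failure_probability(mu_strength, sd_strength, mu_load, sd_load):
    """P(load > strength) for independent, normally distributed load and strength."""
    safety_margin = (mu_strength - mu_load) / math.sqrt(sd_strength ** 2 + sd_load ** 2)
    return normal_cdf(-safety_margin)


# Hypothetical strut: strength 500 +/- 40, load 350 +/- 30 (same units)
p_fail = interference_failure_probability(500.0, 40.0, 350.0, 30.0)
print(f"Probability of failure per load application: {p_fail:.5f}")
```

Here the safety margin is (500 − 350) / 50 = 3 standard deviations, so the failure probability is about 0.00135. Widening either distribution, or moving the means closer, enlarges the overlap.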
ii) Explain any six repercussions of poor reliability? (6mks)
Sr. | Type of Application | Consequence of Poor Reliability
1. | Control software for chemical processes that need to run continuously | Exposure to the risk of uncontrolled chemical reactions taking place
2. | Software for military surveillance radar | Risk to a country's defenses
3. | Online systems with worldwide user bases (e.g., eBay, Amazon) | Considerable financial loss and damage to their corporate images
4. | Check-in software for airlines | Delays to passengers and loss of market share
5. | Service Oriented Architectures (SOA) in which web-based business services offer general services for use by other applications | Loss of functionality for any application using this service
Following are the examples where poor software recoverability can pose significant
threats.
Sr. | Type of Application | Consequence of Poor Recoverability
1. | Safety-critical software that must not fail while in operation (e.g., flight control software). | Exposure to safety risk (e.g. aircraft crash).
2. | Business applications that, for example, make use of external systems and must provide at least basic standby functionality despite failure in those external systems. E.g., an online movie ticket reservation system may rely on an external application for credit card validation. If this fails, the system must still be able to accept reservations for later confirmation. | Basic standby functionality cannot be provided. In the example, the online movie ticket reservation system cannot accept unconfirmed reservations and will cause its business owner loss of revenue.
3. | Any application where downtimes must be minimized and could even be regulated by Service Level Agreements. E.g. a system for automatically collecting money from users of a rail network. | System takes too much time to restore to an agreed-upon level of service following failure or planned downtime. In the example, the system may not have recovered after scheduled night time maintenance by the time the rush hour starts. The system owner loses money, the operator may be fined for breach of SLA, and the users pay nothing.
4. | Any application where data backups are considered a necessity. E.g. an application used by a sales force may need to regularly back up its customer database. | Data loss as a result of scheduled or unplanned downtime. In the example, the sales force may actually lose customer data (with a variety of consequences according to what data was lost for which customer).
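iii) A system has two items A and B connected in series. The time-to-failure of item A
follows an exponential distribution with parameter λ = 0.002. The time-to-failure of item B
follows a Weibull distribution with parameters η = 760 and β = 1.7. Find the hazard rate of this
system at time t = 100 and t = 500. (7mks)
For independent items in series, the system hazard rate is the sum of the item hazard rates:
$h_S(t) = h_A(t) + h_B(t) = \lambda + \frac{\beta}{\eta}\left(\frac{t}{\eta}\right)^{\beta - 1}$
At time t = 100 hrs:
Item A: $h_A(100) = \lambda = 0.002$
Item B: $h_B(100) = \frac{1.7}{760}\left(\frac{100}{760}\right)^{1.7 - 1} \approx 0.000541$
System: $h_S(100) \approx 0.002 + 0.000541 = 0.002541$
At time t = 500 hrs:
Item A: $h_A(500) = \lambda = 0.002$
Item B: $h_B(500) = \frac{1.7}{760}\left(\frac{500}{760}\right)^{1.7 - 1} \approx 0.001669$
System: $h_S(500) \approx 0.002 + 0.001669 = 0.003669$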
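For a series system of independent items, the system hazard rate is the sum of the item hazard rates. A minimal sketch of that calculation for this question follows; the function names are chosen for illustration, and independence of items A and B is an assumption.

```python
def weibull_hazard(t, eta, beta):
    """Weibull hazard rate h(t) = (beta / eta) * (t / eta) ** (beta - 1)."""
    return (beta / eta) * (t / eta) ** (beta - 1)


def series_system_hazard(t, lam_a=0.002, eta_b=760.0, beta_b=1.7):
    """Item A is exponential (constant hazard lam_a); item B is Weibull."""
    return lam_a + weibull_hazard(t, eta_b, beta_b)


for t in (100, 500):
    print(f"t = {t}: system hazard = {series_system_hazard(t):.6f}")
```

The script gives about 0.002541 at t = 100 and about 0.003669 at t = 500. Item A contributes a constant 0.002, while item B's contribution grows with age.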
CAT 2
2. i) Draw and describe the life cycle of a system? (10mks)
The life cycle of a system begins at the moment when an idea of a new system is born and
finishes when the system is safely disposed. It begins with the initial identification of the
needs and requirements and extends through planning, research, design, production,
evaluation, operation, maintenance, support and its ultimate phase out.
Manufacturers who specialize in military hardware will often be approached, either directly
or through an advertised “invitation to tender”, to discuss the latest defense requirement. For
most other manufacturers, it is generally up to them to identify a (potential) market need and
decide whether they can meet that need in a profitable way. The UK MoD approached BAE Systems
to bring together a consortium (including representatives of the MoD and RAF) for an air
system that would outperform all existing offensive systems, both friend and foe, and that
would include all of the concepts identified as practical in the URA research project. Airbus
Industries, on the other hand, decided, based on their extensive market research, that there
was a sufficient market need for a very large aircraft that could carry well in excess of 500
passengers, at least across the Pacific from Tokyo to Los Angeles and possibly even non-stop
between London and Sydney. It will be many years before we will know whether either of these
aircraft will get off the ground and very much longer to see if they prove a business success
for their manufacturers.
The first process then is a set of tasks performed to identify the needs and requirements for a
new system and transform them into its technically meaningful definition. The main reason
for the need of a new system could be a new function to be performed (that is there is a new
market demand for a product with the specified function) or a deficiency of the present
system. The deficiencies could be in the form of:
1. Functional deficiencies
2. Inadequate performance
3. Inadequate attributes
4. Poor Reliability
5. High maintenance and support costs
6. Low sales figures and hence low profits.
The first step in the conceptual design phase is to analyse the functional need or deficiency
and translate it into a more specific set of qualitative and quantitative requirements. This
analysis would then lead to conceptual system design alternatives. The flow of the conceptual
system design process is illustrated in Figure 1.3 (D Verma and J Knezevic, 1995).
The output from this stage is fed to the preliminary design stage. The conceptual design stage
is the best time for incorporating reliability, maintainability and supportability considerations.
In the case of FOAS, for example, various integrated project teams with representatives of the
users, suppliers and even academia will draw together to come up with new ideas and set
targets, however impractical. It was largely a result of this activity that the concepts of the
MFOP and the uninhabited combat air vehicle (UCAV) were born.
ii) Explain the concept of failure? (3mks)
Failure has come to mean many things to many people. Essentially, a failure of a system is
any event or collection of events that causes the system to lose its functionability where
functionability is the inherent characteristic of a product related to its ability to perform a
specified function according to the specified requirements under the specified operating
conditions.
Thus a system, or indeed, any component within it, can only be in one of two states: state of
functioning or state of failure. In many cases, the transition between these states is effectively
instantaneous; a windscreen shatters, a tyre punctures, a blade breaks, a transistor blows.
There is insufficient time to detect the onset or prevent the consequences.
iii) Explain the Hazard function? (3mks)

Hazard function (or hazard rate) is used as a parameter for comparison of two different
designs in reliability theory. Hazard function is the indicator of the effect of ageing on the
reliability of the system. It quantifies the risk of failure as the age of the system increases.
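Mathematically, it represents the conditional probability of failure in an interval t to t + dt
given that the system survives up to t, divided by dt, as dt tends to zero, that is,
$h(t) = \lim_{dt \to 0} \frac{P(t < T \le t + dt \mid T > t)}{dt} = \frac{f(t)}{R(t)}$
where T is the time to failure, f(t) is the failure density function and R(t) is the
reliability function.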
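The limit definition can be checked numerically: take a small but finite dt, compute the conditional probability of failure in (t, t + dt] given survival to t, and divide by dt. The sketch below does this for a Weibull distribution, chosen here only as an example, and compares the result with the closed-form hazard. The function names and the step `dt` are illustrative assumptions.

```python
import math


def weibull_reliability(t, eta, beta):
    """R(t) = exp(-(t / eta) ** beta)."""
    return math.exp(-((t / eta) ** beta))


def hazard_from_definition(t, eta, beta, dt=1e-3):
    """Approximate P(t < T <= t + dt | T > t) / dt using a small finite dt."""
    r_now = weibull_reliability(t, eta, beta)
    r_next = weibull_reliability(t + dt, eta, beta)
    return (r_now - r_next) / (r_now * dt)


def hazard_closed_form(t, eta, beta):
    """f(t) / R(t) for the Weibull distribution."""
    return (beta / eta) * (t / eta) ** (beta - 1)


t, eta, beta = 800.0, 1000.0, 1.7
print("from definition:", hazard_from_definition(t, eta, beta))
print("closed form    :", hazard_closed_form(t, eta, beta))
```

The two values agree to several significant figures, which illustrates that the hazard function is a rate (per unit time), not a probability.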
iv) Discuss the Poisson Distribution (3mks)

The theoretical probability distribution which pairs the number of occurrences of an event in
a given time period with its probability is called the Poisson distribution. There are
experiments where it is not possible to observe a finite sequence of trials. Instead,
observations take place over a continuum, such as time.
The probability mass function of the Poisson distribution for a random variable X can be
expressed as follows:
$P(X = x) = \frac{e^{-\lambda}\lambda^{x}}{x!}, \quad x = 0, 1, 2, \ldots$
where λ is the intensity of the process and represents the expected number of occurrences in a
time period of length t.
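As a small worked example of the Poisson probability mass function, the sketch below evaluates P(X = x) for a hypothetical item that is expected to fail twice in the period of interest (λ = 2). The value of λ and the function name are illustrative assumptions.

```python
import math


def poisson_pmf(x, lam):
    """P(X = x) = exp(-lam) * lam ** x / x! for x = 0, 1, 2, ..."""
    if x < 0 or lam < 0:
        raise ValueError("x and lam must be non-negative")
    return math.exp(-lam) * lam ** x / math.factorial(x)


lam = 2.0  # expected number of failures in the period (hypothetical)
for x in range(5):
    print(f"P(X = {x}) = {poisson_pmf(x, lam):.4f}")
```

P(X = 0) is e^(−λ), the probability of no failure in the period, which is the same as the reliability of an item with a constant failure rate over that period.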
4. Explain the following terms:
i) Mean Time To Failure (MTTF) (3mks)
MTTF represents the expected value of a system's time to first failure. It is used as a measure
of reliability for non-repairable items such as light bulbs, microchips and many electronic
circuits. Mathematically, MTTF can be defined as:
$\text{MTTF} = \int_{0}^{\infty} t\, f(t)\, dt = \int_{0}^{\infty} R(t)\, dt$
Thus, MTTF can be considered as the area under the curve represented by the reliability
function, R(t), between zero and infinity. If the item under consideration is repairable, then
the expression above represents mean time to first failure of the item
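The "area under R(t)" interpretation can be illustrated numerically. The sketch below integrates the reliability function of a Weibull distribution with the trapezoidal rule and compares it with the known closed form MTTF = η Γ(1 + 1/β). The Weibull choice, the parameter values, the integration limit and the function names are all assumptions made for illustration.

```python
import math


def weibull_reliability(t, eta, beta):
    """R(t) = exp(-(t / eta) ** beta)."""
    return math.exp(-((t / eta) ** beta))


def mttf_numeric(eta, beta, upper_multiple=20.0, steps=20000):
    """Trapezoidal estimate of the integral of R(t) from 0 to upper_multiple * eta."""
    upper = upper_multiple * eta
    width = upper / steps
    total = 0.5 * (weibull_reliability(0.0, eta, beta) + weibull_reliability(upper, eta, beta))
    for i in range(1, steps):
        total += weibull_reliability(i * width, eta, beta)
    return total * width


def mttf_closed_form(eta, beta):
    """Weibull mean: eta * Gamma(1 + 1 / beta)."""
    return eta * math.gamma(1.0 + 1.0 / beta)


print("numeric    :", mttf_numeric(1000.0, 1.7))
print("closed form:", mttf_closed_form(1000.0, 1.7))
```

The upper limit of 20η stands in for infinity; R(t) is negligible well before that point for these parameters.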
ii) System redundancy (3mks)
In systems, redundancy is a means of maintaining system integrity if critical parts of it fail. In
some cases this means replicating parts of the system; in others, alternatives are used. A
commercial aircraft has to be able to complete a take-off and landing with one of its engines
shut down but, except under very special circumstances, no such aircraft would be allowed to
leave the departure gate if any of its engines are not functioning.
And yet ETOPS (extended twin-engine operations) allows certified twin-engine aircraft (e.g.
Boeing 777 and Airbus A330) to fly up to 180 minutes from a suitable landing site. This is
based on the probability that even if one of the engines fails that far from land, the other is
sufficiently reliable to make the probability of not reaching a landing site an acceptable risk.
It should be noted that in normal flight, i.e. at cruising speed and altitude, the engines are
generally doing very little work and usually are throttled back.
iii) Maintenance and Maintainability (4mks)

Maintenance is the action necessary to sustain and restore the performance, reliability and
safety of the item. The main objective of maintenance is to assure the availability of the
system for use when required.
Maintainability is the scientific discipline that studies complexity, factors and resources
related to the maintenance tasks needed to be performed by the user in order to maintain the
functionality of a system, and works out methods for their quantification, assessment,
prediction and improvement.
iv) Maintenance Elapsed-Time (3mks)
The length of the elapsed time required for the restoration of functionality, called time to
restore, is largely determined at an early stage of the design phase. The maintenance elapsed
time is influenced by the complexity of the maintenance task, accessibility of the items, safety
of the restoration, testability, physical location of the item, as well as the decisions related to
the requirements for the maintenance support resources (facilities, spares, tools, trained
personnel, etc.). It is therefore a function of the maintainability and supportability of a
system.
v) Factors to consider when designing for reliability: (7mks)

Concurrent Engineering
Concurrent engineering is a feature that ensures the design is not completed before reliability
requirements are identified and dealt with.
Configuration Design
The physical configuration is the key characteristic that determines the reliability of
an asset. Depending on the severity of the product's service and the maximum economic
reliability of the available components, it may be necessary to build redundancy into
some locations.
Component Selection
The second important characteristic that determines reliability is the choice of components
that make up the product. Components with better load-bearing ability should be preferred
over cheaper components.
Design and Build
It is possible to create a strong configuration and select robust components, and still produce
a product that is unreliable. Design and assembly practices, such as the use of protective
grommets at points of wear and strain relief at bends or changes in direction, ensure that
the configuration and components deliver the desired reliability.
Verification and Performance Testing
The final assembled product may not always perform as expected. Interactions between
dynamic components can produce unexpected effects. As a result, it is necessary to verify that
the assembled product functions as expected. It is also essential to simulate the wear and tear
that represents an entire life using accelerated testing.
Customer Needs
The product must be designed not only based on functionality but also considering the
customer needs.
5. i) Describe what is meant by availability? (2Mks)

Availability is used to measure the combined effect of reliability, maintenance and logistic
support on the operational effectiveness of the system. A system, which is in a state of failure,
is not beneficial to its owner; in fact, it is probably costing the owner money.
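The answer above does not give a formula for availability. One common steady-state form, assumed here for illustration, is mean uptime divided by mean uptime plus mean downtime, where downtime combines repair time and logistic delay. The function name, parameter names and numbers below are hypothetical.

```python
def steady_state_availability(mean_uptime, mean_repair_time, mean_logistic_delay=0.0):
    """Availability = uptime / (uptime + downtime), all in the same time unit."""
    if mean_uptime <= 0 or mean_repair_time < 0 or mean_logistic_delay < 0:
        raise ValueError("uptime must be positive and downtimes non-negative")
    mean_downtime = mean_repair_time + mean_logistic_delay
    return mean_uptime / (mean_uptime + mean_downtime)


# Hypothetical system: 980 h between failures, 15 h repair, 5 h waiting for spares
print(steady_state_availability(980.0, 15.0, 5.0))
```

With a logistic delay of zero this reduces to the inherent availability MTBF / (MTBF + MTTR), which shows how reliability (uptime), maintenance (repair time) and logistic support (delay) each enter the measure.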
ii) Explain what Fault Tree Analysis is and further discuss the three main steps
involved when carrying out Fault tree analysis. (8MKs)
Fault tree analysis is a deductive approach involving graphical enumeration and analysis of
the different ways in which a particular system failure can occur, and the probability of its
occurrence. It starts with a top-level event (failure) and works backward to identify all the
possible causes and therefore the origins of that failure.
The following are the main steps involved when carrying out FTA:
1. Identify the top-level event - The most important step is to identify and define the top-level
event. It is necessary to be specific in defining the top-level event; a generic, non-specific
definition is likely to result in a broad-based fault tree that lacks focus.
2. Develop the initial fault tree - Once the top-level event has been satisfactorily identified, the
next step is to construct the initial causal hierarchy in the form of a fault tree. While
developing the fault tree all hidden failures must be considered and incorporated. For the sake
of consistency, standard symbols are used to develop fault trees. While constructing a fault
tree it is important to break every branch down to a reasonable and consistent level of detail.
3. Analyse the fault tree - The third step in FTA is to analyse the initial fault tree developed.
The important steps in completing the analysis of a fault tree are:
• Delineate the minimal cut-sets
• Determine the reliability of the top-level event
• Review the analysis output.
[Figure: FTA steps. Identify Top-Level Event → Develop the Initial Fault Tree → Analyse the
Fault Tree → Delineate the Minimal Cut-sets → Determine Top-Level Event Reliability →
Review Analysis Output]
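Once the minimal cut-sets are known, the probability of the top-level event can be computed from the basic-event probabilities. The sketch below uses inclusion-exclusion over the cut-sets and assumes the basic events are independent. The event names, probabilities and function name are hypothetical, and the method is only practical for a small number of cut-sets.

```python
from itertools import combinations


def top_event_probability(cut_sets, p_fail):
    """Probability that at least one minimal cut-set occurs (independent basic events)."""
    total = 0.0
    for k in range(1, len(cut_sets) + 1):
        sign = 1.0 if k % 2 == 1 else -1.0
        for combo in combinations(cut_sets, k):
            # All basic events in the union must occur for these cut-sets to occur together
            events = set().union(*combo)
            prob = 1.0
            for event in events:
                prob *= p_fail[event]
            total += sign * prob
    return total


# Hypothetical tree: top event occurs if both pumps fail, or if the valve fails
cut_sets = [{"pump_A", "pump_B"}, {"valve"}]
p_fail = {"pump_A": 0.1, "pump_B": 0.1, "valve": 0.01}
print(top_event_probability(cut_sets, p_fail))
```

For this example the result is 0.01 + 0.01 − 0.0001 = 0.0199, so the top-level event reliability is 1 − 0.0199 = 0.9801.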
iii) Discuss Inspection-Based maintenance policy, IBM, and give its benefits (5Mks)
Inspection-based maintenance is the suitable maintenance policy for items whose condition is
described by a relevant condition indicator, RCI. Inspections are carried out at fixed
intervals to determine whether the condition of the item is satisfactory or unsatisfactory
according to the RCI.
Before the item/system is introduced into service, the most suitable frequency of inspection,
FMTI, and the critical value of the relevant condition indicator, $RCI_{cr}$, have to be
determined. Once the critical level is reached, $RCI(FMTI) > RCI_{cr}$, the prescribed preventive
maintenance tasks take place. If the item fails between inspections, corrective maintenance
takes place.
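The decision made at each inspection can be written as a simple rule. The sketch below is illustrative only: the function name, the RCI readings and the critical value are hypothetical, and it assumes a larger RCI means a worse condition.

```python
def inspection_outcome(rci_reading, rci_critical, failed_since_last_inspection=False):
    """Return the maintenance action called for at one inspection."""
    if failed_since_last_inspection:
        return "corrective maintenance"
    if rci_reading > rci_critical:
        return "preventive maintenance"
    return "continue to next inspection"


rci_critical = 2.5                 # hypothetical critical value, e.g. wear in mm
readings = [0.8, 1.4, 2.1, 2.7]    # hypothetical RCI values at successive inspections
for number, reading in enumerate(readings, start=1):
    print(f"Inspection {number}: RCI = {reading} -> {inspection_outcome(reading, rci_critical)}")
```

In this example only the fourth inspection triggers preventive maintenance, because it is the first reading above the critical value.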
BENEFITS OF IBM
1. Reduce unplanned downtime, since maintenance engineers can determine optimal
maintenance intervals through the condition of constituent items in the system. This allows
for better maintenance planning and more efficient use of resources.
2. Improve safety, since monitoring and detection of the deterioration in condition and/or
performance of an item/system will enable the user to stop the system (just) before a failure
occurs.
3. Extend the operating life of each individual item, so that the coefficient of life
utilization is increased compared to time-based maintenance.
4. Improve availability by being able to keep the system running longer and reducing the
repair time.
5. Reduce maintenance resources due to a reduction in unnecessary maintenance activities.
iv) Examination-Based maintenance policy, EBM, where conditional maintenance tasks in
the form of examinations are performed in accordance with the monitored condition of
the item/system, until the execution of a preventive maintenance task is needed or a
failure occurs. (5Mks)
The advantages of the examination-based maintenance policy are:

1. Fuller utilization of the functional life of each individual system than in the case of
time-based maintenance
2. Provision of the required reliability level of each individual system as in case of time-based
maintenance
3. Reduction of the total maintenance cost as a result of extending the realizable operating life
of the system and provision of a plan for maintenance tasks from the point of view of logistic
support
4. Increased availability of the item by a reduction of the number of inspections in
comparison with inspection-based maintenance.
5. Applicability to all engineering systems.
The main difficulties are the selection of a relevant condition predictor (RCP) and the
determination of the mathematical description of the RCP (l).
