
Fault Tree Analysis in Product Reliability Improvement

Milena Krasich, P.E.

Milena Krasich, PE; Bose Corporation; MS 450; The Mountain, Framingham MA 01701-7330 USA e-mail: milena_krasich@bose.com.

2002 Annual RELIABILITY and MAINTAINABILITY Symposium

SUMMARY & PURPOSE This tutorial introduces Fault Tree Analysis, a well-known technique, as a tool for reliability modeling and analysis of an electronic or mechanical design (including software), for identification of potential failure modes that are high contributors to unreliability, and for tradeoffs and mitigation of those failure modes. Applied early in the product design phase, this activity allows for relatively inexpensive and easy design and manufacturing process improvements and, in that manner, achieves considerable improvement of product reliability before the design is completed or the product is manufactured. A real example of this analysis as applied to audio products is discussed, along with the achieved reliability improvement.

Milena Krasich
Milena Krasich is the Senior Technical Lead of Reliability Engineering in Design Assurance Engineering of Bose Corporation. Before joining Bose, she was a Member of Technical Staff in the Reliability Engineering Group of General Dynamics Advanced Technology Systems (formerly Lucent Technologies), and prior to that, she worked for the Jet Propulsion Laboratory in Pasadena, California. While in California, she was a part-time professor at the California State University Dominguez Hills, where she taught graduate courses in System Reliability, Advanced Reliability and Maintainability, and Statistical Process Control. At that time, she was also a part-time professor at the California State Polytechnic University, Pomona, teaching undergraduate courses in Engineering Statistics, Reliability, Environmental Testing, Production Systems Design, Measurements, and Materials Procurement. She holds a BS and MS in Electrical Engineering from the University of Belgrade, Yugoslavia, and is a California registered professional electrical engineer. She is also a member of the IEEE and ASQC Reliability Society, a Fellow and the past president of the Institute of Environmental Sciences and Technology, and a member of the College of Fellows of the Institute for Advancement of Engineering. Currently, she is a US Delegate to the International Electrotechnical Commission (IEC), working on dependability/reliability standards, and is a project leader for the revision of international standards for reliability growth.

Table of Contents
1. INTRODUCTION
1.1 Notation and Acronyms
2. Reliability Improvement
2.1 Reliability Definitions Related to This Tutorial
3. Fault Tree Analysis and Its Use
3.1 Fault Tree Introduction
3.2 System Analysis Methodology
3.3 Building of a Fault Tree
3.4 Contribution of Manufacturing Defects
3.5 Origin of Values for the Basic Events
4. Failure Mode Detection and Mitigation
5. Summary and Conclusions
6. References and Bibliography
7. Attachment - Tutorial Visuals


1. INTRODUCTION

Many methods have been used to estimate product reliability over the decades in which reliability has been applied as a science. Many reasons, such as product criticality (medical devices, defense systems, transportation) or the need for competitiveness in the consumer industry, dictate the need for products with remarkably high reliability. Design alone, regardless of its features and technology, does not guarantee product reliability. A design team that is conscious of good and reliable design methods, such as proper component derating and ESD and EMI protection, may not be completely aware of all aspects of reliability modeling and of potential reliability shortfalls. This is especially the case when a product must be designed to operate in multiple environments, or when the specifics of component reliability (such as its dependency on applied stresses) are not well understood. Therefore, the reliability of a completed design may not be as required or as expected.

In the past, attempts to improve product reliability concentrated on various types of Failure Mode and Effects Analysis (FMEA) and/or on dedicated Reliability Growth test programs. Both methods, applied individually or in conjunction, are useful but may not be cost effective or applicable. The first method, FMEA, is a valuable but very exhaustive attempt to identify the potential failure modes and to assure their mitigation. Working from the bottom up, the analysis addresses each component (electrical or mechanical), the modes in which it might fail, and the effects that those failure modes might have on higher level assemblies and the system. The process is very tedious and is often completed well after the design is finished and production has begun. This might be too late to accomplish any measurable improvement without major expense for redesign, new PC board layouts, and new tooling.
In addition, an FMEA normally does not produce a measure of overall product reliability, so any achieved reliability improvement is also not measurable. One type of FMEA has a Risk Priority Number (RPN) associated with it; however, this number is a product of three numbers (each from 1 to 10) assigned to Severity, Occurrence, and Detection. Regardless of the strict rules applied in estimating these numbers, they are still only estimates, and thus might be subjective. Another FMEA type, which includes a criticality computation (FMECA), requires knowledge of failure rates; therefore, it cannot be applied to systems with components for which failure probabilities, not failure rates, are a far better attribute. Neither type provides reliability estimates. Test methods for reliability improvement are even more costly, keeping in mind that they are performed on preproduction or production runs, meaning that the design is mature. In addition, the test units might be complex and expensive, so that only a limited number might be available for testing. Fault Tree Analysis combines many favorable aspects:

- It is timely, and therefore low cost
- It is fast and easy to use
- It provides realistic reliability estimates at the same time as the failure mode analysis
- It measures the achieved reliability improvement and the final reliability of a product.

1.1 NOTATION AND ACRONYMS
λ(t) - Component failure rate, instantaneous failure rate
λ - Component failure rate if assumed constant
ESD - Electrostatic Discharge
EMI - Electromagnetic Interference
FTA - Fault Tree Analysis
FMEA - Failure Mode and Effects Analysis
FMECA - Failure Mode Effects and Criticality Analysis
RPN - Risk Priority Number
MTTF - Mean Time to Failure
MTBF - Mean Time Between Failures
IEC - International Electrotechnical Commission
Q(t) - Unreliability as a function of time
Q - Unreliability assumed constant or calculated for a predetermined time
Pr - Probability
Pr(c) - Probability of occurrence of a cut set
FET - Field Effect Transistor
IC - Integrated Circuit
R - Reliability
F - Probability of failure (unreliability)
CODEC - Coder/Decoder
PRF - Part Random Failure
PCB - Printed Circuit Board
IEV - International Electrotechnical Vocabulary

2. RELIABILITY IMPROVEMENT

Reliability improvement can be undertaken and achieved in different phases of the product life:
- Design phase
- Product validation phase (test, reliability growth)
- During its fielded life
The first option, the design phase, offers the most cost effective opportunities for product reliability improvement. Before the design is finalized, even considerably involved changes do not pose a great expense other than design time. If the design improvements are not excessively extensive, the necessary changes can often be made painlessly. The rest of product preparation (such as layout of printed circuit boards, tooling, and component procurement) can then proceed without interruption or modification. In the design phase, reliability improvements are achieved by identifying potential design deficiencies or potential manufacturing problems/defects that may compromise the reliability of a design. Some potential design flaws that are likely to be identified are as follows:


- Electrical or mechanical overstress of components
- Components inadequate to be used in that design (unreliable or improperly used)
- Potential relationship between failures, that is, secondary failures caused by the occurrence of another failure or by the presence of an environmental stress
- Parts of inferior quality (reliability) as built by their respective manufacturers.

2.1 RELIABILITY DEFINITIONS RELATED TO THIS TUTORIAL
To assure proper understanding of the terms as they are used in this tutorial, some reliability definitions are included. These are as follows:

Reliability - probability that an item can perform a required function under given conditions for a given time interval (IEV 191-12-01). Here, the required function is defined by the expected performance, which may vary depending on the use of the item and on the expectations. For a high-fidelity stereo audio/video product, the expectations are, for example, no audible noise or distortion. For a mechanical device, such as a pipe or an underwater connector housing, the expected performance would be that there is no bending greater than a predefined angle under some expected force. The measures for reliability or its complement, unreliability, would be the probability of survival past the end of a predetermined period, or the probability of failure before the end of a predetermined period, respectively. The measurement that is best understood by management is the percent of items surviving a time period (life or warranty).

Failure - the termination of the ability of an item to perform a required function (IEV: 191-04-01). A failure can be classified as a failure of the hardware to operate properly due to:
- Design failure - a failure due to the inadequate design of an item to withstand operational and/or environmental stresses, or due to the use of an improper part
- Manufacturing defect - causing time-related failures that compromise design reliability
- Software interactions with hardware
A failure of an item can also be attributed to a fault in the software code, that is, a failure of the software design.

Failure Cause - the circumstances during design, manufacture, or use which have led to a failure (IEV: 191-0401)

Failure Mechanism - the physical, chemical, or other process which led to a failure. An example would be crack propagation through the dielectric of a ceramic capacitor, causing the capacitor to develop a small resistance and ultimately a short circuit.

Failure Mode - manner or state in which an item or a component might fail. Examples of failure modes are:
- Low or no output from an IC
- Separation of the IC packaging material
- Capacitor fails short due to crack propagation
- Resistor fails open due to poor welding of the connections
- FET saturates and overheats
- Seal leaks, etc.
One failure mode can have multiple causes. Examples of those are:
- IC enclosure fails due to one or more of the following: high humidity, high temperature, thermal cycling, IC manufacturing process
- Capacitor short: electrical overstress; high temperature (use or soldering); vehicle vibration
- A seal in an underwater cable connector may leak due to: water pressure causing dilatation of the material, cold temperature, wearout from mating and de-mating of the connector, defect in manufacturing (undersize)

3. FAULT TREE ANALYSIS AND ITS USE

A fault tree is used as a Boolean representation of a product design: a system, its assemblies and functions, failure modes, and their respective causes. Fault tree analysis of a design serves multiple purposes. One of its applications is modeling the product's architecture and functionality in a top down manner, searching for potential failure modes and their causes that might produce an unfavorable outcome defined as a product failure. It also quantitatively estimates the reliability of an item and its assemblies. Based on this information, one can identify the failure modes that are the highest contributors to the product's unreliability and follow the investigation down to identify their respective causes. This allows for tradeoff and mitigation of those potential failure modes and, finally, evaluation of the achieved reliability improvement.

3.1 FAULT TREE INTRODUCTION
A fault tree is a logic diagram that represents functional dependencies of parts of a system. The top gate represents the unfavorable outcome of the system, and all other unfavorable outcomes that contribute to the system failure are represented as gates, logically connected to the top gate. Components of a fault tree are:
- Gates, which are outcomes of one or a combination of input events or other gates
- Cut sets, which are groups of outcomes or events that, if they occurred, would cause a system failure. A minimal cut set contains the minimum number of events that are required for a failure outcome. The removal of one of them would result in the system surviving.
Types of events and gates, along with their definitions and graphical representations, are shown in Table 3.1.

Table 3.1. Graphical Representation and Definitions of Gates and Events

| Symbol Name | Description | Reliability Model | Inputs |
|---|---|---|---|
| BASIC EVENT | Basic event for which reliability information is available | Component failure mode, or a failure mode cause | 0 |
| CONDITIONAL EVENT | Event that is a condition of occurrence of another event when both must occur for the output to occur | Occurrence of an event that must occur for another event to occur. Conditional probability | 0 |
| DORMANT EVENT | A basic event that represents a dormant failure | Dormant component failure mode or dormant failure cause | 0 |
| UNDEVELOPED EVENT | A part of the system that has yet to be developed or defined | A contributor to the probability of failure. Structure of that system part is not yet defined | 0 |
| TRANSFER GATE | Gate indicating that this part of the system is developed in another part or page of the diagram | A partial reliability block diagram that is shown in another location of the overall system | |
| OR GATE | The output event occurs if any of its input events occur | Failure occurs if any of the parts of that system fails - series system | |
| MAJORITY VOTE GATE | The output occurs if m of the inputs occur | Redundancy k out of n, where m = n-k+1 | |
| EXCLUSIVE OR | The output event takes place if one, but not the other, input occurs | A failure of the system occurring only if one, not both, of the two possible failures happens | |
| AND GATE | The output event takes place if all of the input events occur | Parallel redundancy, one out of n equal or different branches | |
| PRIORITY AND | The output event (failure) occurs only if the input events occur in sequence from left to right | Good for representation of secondary failures or for enabling sequence of events | |
| INHIBIT GATE | The output occurs only if both of the input events take place, one of them conditional | Conditional probability of occurrence of the final event | |
| NOT GATE | The outcome is present only if the input event does not occur | Exclusive events, or a preventive measure does not take place | |
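For independent input events, several of the gate definitions in Table 3.1 translate into short probability formulas. The sketch below is one possible reading and is not part of the original tutorial: the function names are illustrative, all inputs are assumed to be statistically independent, and the majority vote gate is written for identical inputs only. The priority AND gate is left out because its value depends on the order in which events occur.

```python
from math import comb, prod


def or_gate(probs):
    """Output occurs if any input occurs (series system)."""
    return 1.0 - prod(1.0 - p for p in probs)


def and_gate(probs):
    """Output occurs only if all inputs occur (parallel redundancy)."""
    return prod(probs)


def majority_vote_gate(p, n, m):
    """Output occurs if at least m of n identical inputs occur."""
    return sum(comb(n, k) * p**k * (1.0 - p) ** (n - k) for k in range(m, n + 1))


def exclusive_or_gate(p_a, p_b):
    """Output occurs if exactly one of the two inputs occurs."""
    return p_a * (1.0 - p_b) + p_b * (1.0 - p_a)


def inhibit_gate(p_event, p_condition):
    """Output occurs if the input event occurs while the condition holds."""
    return p_event * p_condition


def not_gate(p):
    """Output is present only if the input event does not occur."""
    return 1.0 - p
```

The OR and AND gates accept any number of inputs, so nested gates can be evaluated by passing the output of one function as an input of another.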


3.2 SYSTEM ANALYSIS METHODOLOGY
3.2.1 Classical System Reliability Analysis
When a system is complex to model, that is, when it contains many interlocked or common branches, standard modeling can become extremely cumbersome, lengthy, and subject to mathematical (computational) errors. An example of a simple, yet complex to model, bridge circuit is shown in Figure 3.2-1.

Figure 3.2-1. Bridge Circuit

In the bridge circuit above, the signal must flow from input A to output B. It can flow through block 3 in both directions. The analytical solution is to model the system under two conditions: assuming that block 3 is good, in which case the signal flows through blocks 1 or 2 and then 4 or 5, as if each pair were parallel blocks; or assuming that block 3 is bad (the condition that 3 has failed), in which case blocks 1 and 4 in series are parallel to blocks 2 and 5, also in series. This is represented by the following equation:

$$R_s = (R_1 + R_2 - R_1 R_2)(R_4 + R_5 - R_4 R_5)\,R_3 + [R_1 R_4 + R_2 R_5 - R_1 R_2 R_4 R_5]\,(1 - R_3)$$

$$R_s = 0.991$$

(evaluated with the block failure probabilities of the numerical example in Section 3.2.2).

When a system contains a multitude of complex subsystems of different kinds, the algebraic representation rapidly becomes too involved and cumbersome to solve. In addition, these complex equations need to contain a multitude of conditional probabilities to account for environmental effects and secondary failures. This only adds to the already extensive complexity of the calculations.

3.2.2 System Reliability Analysis Using a Fault Tree
The complex system shown in Figure 3.2-1 can be easily modeled using Boolean algebra with a fault tree or success tree representation. Cut sets in this system are made of the following combinations:
- Blocks 1 and 2 (c1 = 1,2)
- Blocks 4 and 5 (c2 = 4,5)
- Blocks 1, 3, and 5 (c3 = 1,3,5)
- Blocks 2, 3, and 4 (c4 = 2,3,4)
Should any of the above combinations fail, the signal flow from A to B will be interrupted. With Boolean algebra, the probability of the system failure would be:

$$F_S = \Pr(c_1 \cup c_2 \cup c_3 \cup c_4)$$

The probability of cut set 1 is:

$$\Pr(c_1) = F_1 F_2 = (1 - R_1)(1 - R_2)$$

The more accurate calculation (Esary-Proschan) is then:

$$F_S = \Pr(c_1 \cup c_2 \cup c_3 \cup c_4) = 1 - [1 - \Pr(c_1)]\,[1 - \Pr(c_2)]\,[1 - \Pr(c_3)]\,[1 - \Pr(c_4)]$$

With the RARE event approximation, this calculation would be:

$$F_S \approx \Pr(c_1) + \Pr(c_2) + \Pr(c_3) + \Pr(c_4)$$

$$F_S \approx F_1 F_2 + F_4 F_5 + F_1 F_3 F_5 + F_2 F_3 F_4$$

While easy to implement, the RARE approximation may introduce sizeable errors into calculations when the failure probabilities are larger numbers. Any failure probability larger than a multiple of $10^{-2}$ will produce an unwanted error. This is shown in the example below:

$$F_1 = 2\times10^{-2},\quad F_2 = 5\times10^{-2},\quad F_3 = 8\times10^{-2},\quad F_4 = 2.5\times10^{-2},\quad F_5 = 3\times10^{-1}$$

Esary-Proschan:

$$F_S = 1 - (1 - F_1 F_2)(1 - F_4 F_5)(1 - F_1 F_3 F_5)(1 - F_2 F_3 F_4) = 9.068\times10^{-3}$$

RARE:

$$F_{SR} = F_1 F_2 + F_4 F_5 + F_1 F_3 F_5 + F_2 F_3 F_4 = 9.08\times10^{-3}$$
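The three ways of evaluating the bridge circuit of Figure 3.2-1 can be compared numerically. The following is a minimal sketch, assuming independent block failures and the failure probabilities of the numerical example; the function names are illustrative, and the brute-force enumeration over all block states is added here only as a cross-check and is not a method described in the tutorial.

```python
from itertools import product
from math import prod

# Block failure probabilities F1..F5 from the numerical example.
F = {1: 2e-2, 2: 5e-2, 3: 8e-2, 4: 2.5e-2, 5: 3e-1}
# Minimal cut sets c1..c4 of the bridge circuit.
CUT_SETS = [{1, 2}, {4, 5}, {1, 3, 5}, {2, 3, 4}]


def cut_set_probability(cut):
    """Probability that every block in the cut set has failed."""
    return prod(F[b] for b in cut)


def rare_event():
    """RARE event approximation: sum of the cut set probabilities."""
    return sum(cut_set_probability(c) for c in CUT_SETS)


def esary_proschan():
    """Esary-Proschan: treats the cut sets as if they were independent."""
    return 1.0 - prod(1.0 - cut_set_probability(c) for c in CUT_SETS)


def exact_by_enumeration():
    """Add up every block state in which at least one cut set has failed."""
    blocks = sorted(F)
    total = 0.0
    for state in product((False, True), repeat=len(blocks)):
        failed = {b for b, down in zip(blocks, state) if down}
        if any(c <= failed for c in CUT_SETS):
            total += prod(F[b] if b in failed else 1.0 - F[b] for b in blocks)
    return total


def reliability_by_decomposition():
    """Classical solution: condition on block 3 being good or bad."""
    R = {b: 1.0 - f for b, f in F.items()}
    good3 = (R[1] + R[2] - R[1] * R[2]) * (R[4] + R[5] - R[4] * R[5])
    bad3 = R[1] * R[4] + R[2] * R[5] - R[1] * R[2] * R[4] * R[5]
    return good3 * R[3] + bad3 * (1.0 - R[3])


if __name__ == "__main__":
    print(f"RARE approximation: {rare_event():.4e}")
    print(f"Esary-Proschan:     {esary_proschan():.4e}")
    print(f"Exact enumeration:  {exact_by_enumeration():.4e}")
    print(f"Rs (decomposition): {reliability_by_decomposition():.4f}")
```

Under these assumptions the enumeration agrees with one minus the decomposition result (about 9.006e-3), which sits slightly below both the Esary-Proschan value and the RARE approximation, because the cut sets share blocks.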

Software packages commercially available for FTA are based on Boolean algebra, and most of them contain the constant failure rate model for unavailability:

$$Q(t) = \frac{\lambda}{\lambda + \mu}\left[1 - e^{-(\lambda + \mu)t}\right]$$

where $\mu = 1/\mathrm{MTTR}$ is the repair rate. If the time to repair (MTTR) is considered infinite (nonrepairable items), then $\mu = 0$, and:

$$Q(t) = F(t)$$

Other information that can be obtained with FTA software is:
- Failure frequency (hazard rate) of all gates
- Number of expected failures during the predetermined time
- Unavailability or probability of failure of the system at any gate
- Gate summaries in various forms
- Confidence intervals
- Sensitivity analysis
- Calculations using distributions other than exponential
The circuit from Figure 3.2-1 represented by FTA is shown in Figure 3.2-2.
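The constant failure rate unavailability model quoted above can be written as a small function. This is a sketch only: it assumes constant failure and repair rates, the function name is illustrative, and it does not reproduce the interface of any particular FTA package.

```python
import math


def unavailability(failure_rate, repair_rate, t):
    """Q(t) for a constant failure rate (lambda) and repair rate (mu).

    With repair_rate = 0 (nonrepairable item) this reduces to
    F(t) = 1 - exp(-lambda * t).
    """
    total = failure_rate + repair_rate
    if total == 0.0:
        return 0.0
    # -expm1(-x) equals 1 - exp(-x) but keeps precision for small x.
    return failure_rate / total * -math.expm1(-total * t)
```

For long times the expression tends to the steady-state value lambda / (lambda + mu), which is one way to sanity-check values entered into a tool.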

Figure 3.2-2. FTA Diagram of the Bridge Circuit

Different gates of a fault tree are used to represent different circuit models, as shown in the following examples:

Example 1: Combination of series and redundant blocks (events). The reliability block diagram of this combination is shown in Figure 3.2-3.

[Diagram: F1 and F2 feed Gate 1; three F3 blocks, 2 out of 3, feed Gate 2; Gate 1 and Gate 2 feed the Top Gate]

Figure 3.2-3. Series Parallel Circuit Configuration

The corresponding equations are as follows:

$$F_1 = 0.002,\quad F_2 = 0.0005,\quad F_3 = 0.0032,\quad n = 3,\quad m = 2$$

$$F_{Gate1} = 1 - (1 - F_1)(1 - F_2)$$

$$F_{Gate2} = \sum_{i=0}^{n-m} \frac{n!}{i!\,(n-i)!}\,(1 - F_3)^{i}\,F_3^{(n-i)}$$

$$F_{TopGate} = 1 - \left[(1 - F_{Gate1})(1 - F_{Gate2})\right]$$

$$F_{TopGate} = 2.53\times10^{-3}$$

The FTA representation of the reliability block diagram in Figure 3.2-3 is shown in Figure 3.2-4.
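The numbers of Example 1 can be re-derived in a few lines. The sketch below assumes independent events and reads the binomial sum for Gate 2 as running over the number of surviving F3 blocks, i = 0 to n - m; that reading is an interpretation of the printed formula, chosen because it reproduces the stated top gate value.

```python
from math import comb

F1, F2, F3 = 0.002, 0.0005, 0.0032
N, M = 3, 2  # Gate 2 fails when at least M of the N redundant blocks fail


def example_1():
    """Return (F_Gate1, F_Gate2, F_TopGate) for the series-parallel example."""
    # Gate 1: series blocks, fails if either F1 or F2 occurs.
    f_gate1 = 1.0 - (1.0 - F1) * (1.0 - F2)
    # Gate 2: i counts the blocks that survive, so at most N - M may survive.
    f_gate2 = sum(
        comb(N, i) * (1.0 - F3) ** i * F3 ** (N - i) for i in range(N - M + 1)
    )
    # Top gate: fails if either gate fails.
    f_top = 1.0 - (1.0 - f_gate1) * (1.0 - f_gate2)
    return f_gate1, f_gate2, f_top


if __name__ == "__main__":
    gate1, gate2, top = example_1()
    print(f"F_Gate1 = {gate1:.4e}, F_Gate2 = {gate2:.4e}, F_TopGate = {top:.3e}")
```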


Figure 3.2-4. FTA Representation of a Series-Parallel Reliability Block Diagram

With different redundant blocks (Figures 3.2-3 and 3.2-4), the redundant gate inputs are different, F3, F4, and F5 instead of the repeated F3, and the calculations are done in a similar way (binomial). The three different redundant blocks are shown in Example 2, on conditional probability, where Gate 2 has three different inputs representing the three redundant blocks (Figure 3.2-5).

Example 2: Use of a priority gate. The event F2 will occur only if the event F1 has occurred (conditional probability). The equivalent fault tree is shown in Figure 3.2-5.

Figure 3.2-5. Example of a Priority Gate (Gate 1)

The associated mathematics is as follows:

$$n = 3,\quad m = 2$$

$$F_1 = 0.002,\quad F_2 = 0.0005,\quad F_3 = 0.00045,\quad F_4 = 0.00053,\quad F_5 = 0.0032$$

$$F_{Gate1} = F_1 F_2$$

$$F_{Gate2} = F_3 F_4 + F_3 F_5 + F_4 F_5$$

$$F_{TopGate} = 1 - (1 - F_{Gate1})(1 - F_{Gate2})$$

$$F_{TopGate} = 4.374\times10^{-6}$$

Example 3: A real life example of a priority gate is the analysis of a switching amplifier, where on all four outputs (+1, -1, +2, and -2) there are four switching FETs, followed by noise and EMI filtering. For the FETs to operate properly (in the switching mode), the Logic Ground (LGround) must be maintained at a certain voltage. This voltage is 5 V, maintained by a voltage regulator and filtered by two ceramic capacitors. Should the LGround voltage decrease below 2 V, the FETs will start operating in linear mode and will then saturate. This condition not only constitutes a failure, but could eventually cause the FET to overheat. The voltage would decrease in the event that one of the voltage filtering capacitors developed a small resistance close to a short. Here, LGround below 2 V is the condition for the FETs to saturate and overheat. In the old design, the voltage filtering capacitors had a dielectric with Y5V characteristics, which has a higher concentration of voids and could develop and propagate a crack more easily than other ceramics (especially in harsher environments such as the one for which this analysis was performed). This characteristic, along with the less than adequate voltage rating, contributed to a relatively high projected probability of failure for the specified lifetime. By replacing both voltage filtering capacitors with ones having a dielectric with X7R characteristics and a higher voltage rating, the 10-year probability of occurrence of FET overheat was reduced from 2.0969E-3 (per FET) to 1.0009E-4, an improvement by a factor of about 21. The original circuit, as modeled with the fault tree, is shown in Figure 3.2-6.
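The arithmetic behind Examples 2 and 3 can be checked with a short script. This is a sketch under stated assumptions: events are treated as independent, the priority gate is evaluated as a simple product (conditional probability), the tree structure for the FET overheat is inferred from the values printed in Figure 3.2-6, and the assignment of the 5.0e-8 and 2.0e-8 values to the solder short and debris events is assumed. Function and variable names are illustrative.

```python
from math import prod


def or_gate(probs):
    """Output occurs if any independent input occurs."""
    return 1.0 - prod(1.0 - p for p in probs)


def example_2():
    """Top gate of Example 2: priority gate plus 2-out-of-3 unequal blocks."""
    f1, f2, f3, f4, f5 = 0.002, 0.0005, 0.00045, 0.00053, 0.0032
    gate1 = f1 * f2                      # F2 occurs only if F1 has occurred
    gate2 = f3 * f4 + f3 * f5 + f4 * f5  # any two of the three blocks fail
    return 1.0 - (1.0 - gate1) * (1.0 - gate2)


def example_3():
    """Gate values for the original (Y5V capacitor) design of Figure 3.2-6."""
    prf_short = 3.0e-3                           # part random failure, per capacitor
    solder_short, debris_short = 5.0e-8, 2.0e-8  # assumed assignment
    ageing, hi_temp, hi_humidity = 1.0e-6, 1.25e-7, 2.0e-6
    debris_on_board = 1.5e-7
    fet_saturation = 3.5e-1

    mfg_short = or_gate([solder_short, debris_short])
    capacitor_short = or_gate([prf_short, mfg_short])  # same for C905 and C906
    el_cap_leak = or_gate([ageing, hi_temp, hi_humidity])
    dendritic_short = el_cap_leak * debris_on_board
    lground = or_gate([capacitor_short, capacitor_short, dendritic_short])
    return {
        "capacitor short": capacitor_short,
        "dendritic short": dendritic_short,
        "LGROUND": lground,
        "FET4 OVERHEAT": lground * fet_saturation,
    }


if __name__ == "__main__":
    print(f"Example 2 top gate: {example_2():.4e}")
    for name, q in example_3().items():
        print(f"{name}: Q = {q:.4e}")
    print(f"Improvement factor: {2.0969e-3 / 1.0009e-4:.2f}")
```

With these inputs the script reproduces the gate values shown in the figure to the printed precision, and the ratio of the old to the new overheat probability comes out at roughly 21.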


[Fault tree diagram; events and unreliability values Q legible in the figure]
FET4 OVERHEAT - Overheat of FET due to LGND < 2 V: Q = 2.0969e-3
  LGROUND - LGND shorted to ground causing improper FET bias and overheat: Q = 5.9911e-3
    Short C905 - Ceramic capacitor shorts to ground: Q = 3.0001e-3
      MFG_SHORT_C905 - Manufacturing defects cause a short: Q = 7.0000e-8
        SOLDER SHORT_C6905 - Excessive solder causing a short between the pins or pads
        DEBRIS_C6905 - Debris on the PCB causing a short
      PRF_SHORT_C6905 - Capacitor shorts due to part random failure: Q = 3.0000e-3
    Short_C906 - Capacitor short brings LGround to ground: Q = 3.0001e-3
      MFG_Short_C906 - Manufacturing defects cause a short: Q = 7.0000e-8
        SOLDER SHORT_C6906 - Excessive solder causing a short between the pins or pads
        DEBRIS_C6906 - Debris on the PCB causing a short
      PRF_C6906 - Capacitor fails due to part random failure: Q = 3.0000e-3
    DANDREIC SHORT (dendritic short) - Electrolyte mixed with debris making a short: Q = 4.6875e-13
      EL. CAP LEAK - Short caused by leaking of the nearby capacitor: Q = 3.1250e-6
        AGEING - Capacitor leaking electrolyte due to ageing: Q = 1.0000e-6
        HI-TEMP - Electrolyte leak due to high temperature: Q = 1.2500e-7
        HI_HUMIDITY - Electrolyte leak due to high humidity: Q = 2.0000e-6
      DEBRIS - Presence of debris on the board: Q = 1.5000e-7
  FET 4 SATURATION - FET saturates due to LGND < 2 V: Q = 3.5000e-1
The solder short and debris events under each manufacturing short carry the values Q = 5.0000e-8 and Q = 2.0000e-8.

Figure 3.2-6. Practical Example of a Priority Gate

Example 4: Use of an inhibit gate is shown in Figure 3.2-7. With the inhibit gate, for the outcome to constitute a failure, all of the input events (in our case three) must take place. A practical example of this modeling would be the connection of three EMI filtering capacitors. If the failure mode is defined as no filtering, all three would have to fail.

Figure 3.2-7. Example of an Inhibit Gate


3.3 BUILDING OF A FAULT TREE
Building a fault tree of a product (a system made of subsystems, assemblies, and components) is a top down process where, as a first step, one must define what constitutes the failure of that product. For a high quality audio amplifier, anything that the end user might hear and qualify as degraded performance constitutes a system failure. The next step is to outline the system architecture and the major functions, such as:
- Power supply
- Video amplifier
- Audio amplifier
Further analysis, going down, determines what phenomena preclude proper operability of those parts or functions, i.e.:
- Shorted line voltage or no VCC supplied
- No video processing
- One or more audio channels not operational
More detailed analyses further determine the causes of those phenomena and their contributing factors, down to the causes of failure modes such as:
- Electrical overstress
- High humidity
- High temperature
A detailed example of how a fault tree analysis is done is shown in another real life example, an analog input to an analog to digital converter of an audio amplifier. The partial circuit of this amplifier is shown in Figure 3.3-1. This part of the amplifier is normally known as the CODEC, as analog input signals are converted into digital form, and then again into a linear output. The signals are directed into an IC that is an analog to digital converter. For the amplifier to be operational, all signals have to be processed by the CODEC, meaning that they have to be coded and decoded.
The input signal 1+ into the left channel of IC U20 is interrupted if:
- R200, R209, or C171 fail open
- C179 shorts to ground, shorting the signal to ground
The input signal 2+ into the right channel of U20 is interrupted if:
- R201, R205, or C172 fail open
- C177 shorts the signal to ground
The entire circuit will not work if no voltage is supplied to the analog input (pin 8):
- R206 or R208 fail open, interrupting the supply of 2.3 V

Figure 3.3-1 Input into CODEC of an Audio Amplifier


The signal will be too noisy if C183 fails open (low frequency noise), or C181 fails open (high frequency noise). Other contributors to the failure are the lack of data inputs, which will not be considered in this example. The top level of the FTA representation of this analysis is shown in Figure 3.3-2.
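The failure logic described for the CODEC analog input can be captured as a small Boolean structure before any probabilities are attached. The sketch below is illustrative only: the component designators and failure modes are taken from the description, while the gate names, the data layout, and the reading of the top level as an OR of the four conditions are assumptions.

```python
OPEN, SHORT = "open", "short"

# Each gate lists the (component, failure mode) events that trigger it.
GATES = {
    "input 1 interrupted": {
        ("R200", OPEN), ("R209", OPEN), ("C171", OPEN), ("C179", SHORT),
    },
    "input 2 interrupted": {
        ("R201", OPEN), ("R205", OPEN), ("C172", OPEN), ("C177", SHORT),
    },
    "no analog voltage": {("R206", OPEN), ("R208", OPEN)},
    "noisy signal": {("C183", OPEN), ("C181", OPEN)},
}


def triggered_gates(faults):
    """Return the names of the gates triggered by a set of component faults."""
    return sorted(name for name, events in GATES.items() if events & faults)


def codec_fails(faults):
    """Top level (assumed OR gate): any triggered gate is a CODEC failure."""
    return bool(triggered_gates(faults))


if __name__ == "__main__":
    print(triggered_gates({("C179", SHORT), ("R208", OPEN)}))
```

A fault that is not listed for a component, such as C179 failing open, triggers nothing in this sketch, which mirrors the fact that the description names only the short as the relevant failure mode of that capacitor.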

Circled in Figure 3.3-2 is the gate that needs to be developed for the analog inputs 1 and 2 described earlier. Figure 3.3-3 shows further development of that gate.

Figure 3.3-2 Top Level FTA of CODEC

Figure 3.3-3 Development of the FTA for inputs 1 and 2


Inputs 1 and 2 are then separately analyzed, and so are the noisy or absent analog voltages. Development of Input 1 is shown in Figure 3.3-4. The circle points out the open components that are to be further developed. The fault tree part in Figure 3.3-4 also contains a gate that points to the possible lack of the 2.3 V supply. Capacitor C179, if failed short, would short the signal to ground. There are two possible reasons for this capacitor to fail short. One is the so-called part random failure. This term takes into consideration the environment to which the capacitor is expected to be exposed (temperature, vibration) as well as the operational stresses that the capacitor will see, such as its operating voltage. Thus, the term random failure does not simply mean a failure that occurs at random; it describes the likelihood that a part having an intrinsic defect will fail under given environmental and operational stresses.

Figure 3.3-4. Development of the FTA Down to Components and Their Failure Cause

3.4 CONTRIBUTION OF MANUFACTURING DEFECTS

Manufacturing defects that cause time-dependent failures are a major contributor to product unreliability. Manufacturing defects that contribute to components failing open are:

Cold or insufficient solder, which after a period of time, due to relaxation and fatigue, causes connections to open. Vibration of a vehicle can also cause a cold-soldered joint to open.
Missing components
Components cracked during insertion
Broken or bent pins or leads

Manufacturing defects that contribute to components failing short are:

Debris (at times uncleaned flux) left on the board
Excessive solder
Bent pins (mostly on ICs and connectors) shorting to another pin

Another reason for the capacitor failure (Figure 3.3-4) would be a short caused by manufacturing defects. If a PC board is not properly cleaned during production, debris left on it will produce so-called dendritic growth, which in turn might cause a short between terminals. A second manufacturing defect that causes an electrical short is inadequate soldering technique, where excessive solder forms a bridge between the terminals. Further development of the fault tree will point to other components failing open or short and causing failure of the analog power supply or interruption of the second signal.

3.5 ORIGIN OF VALUES FOR THE BASIC EVENTS

To estimate the final (top gate) product reliability, each event must have reliability information assigned to it. This information may be attached in the form of a failure rate, MTTF, or probability of failure. For mixtures of mechanical and electrical hardware, perhaps the most straightforward way is to represent all the information as a probability of failure calculated for a predetermined time and a predetermined operational profile. For electrical components, data for event and failure mode probabilities come from:

Manufacturers' life testing, which needs to be recalculated for the proper environmental and electrical stresses
Commercially available software databases
Field failure data, which should be the very last resort because of the many inconsistencies in data reporting and recording

For mechanical components, the probability of failure needs to be calculated based on:

Stresses and loads, and their geometry and distribution
Materials
Construction (design) of parts, such as shape and size
Attachment of parts to other structures (adhesives, fasteners)

Based on all this information, the safety margin is calculated, which in turn produces a reliability value. The probability of occurrence of manufacturing defects can be determined in two ways. It can come from factory or service data (field failure data). Alternatively, it is sometimes advisable to enter the required values into the reliability analysis and then adjust the manufacturing process control to achieve this goal.
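Where a basic event is characterized by a failure rate or an MTTF, it has to be converted to a probability of failure for the predetermined time before it is entered into the fault tree. The sketch below shows one way to do this. It assumes a constant failure rate (exponential distribution), which the text does not prescribe, and the function names and the example part are purely illustrative.

```python
import math


def failure_probability_from_rate(failure_rate_per_hour: float, hours: float) -> float:
    """Probability of failure by `hours`, assuming a constant failure rate.

    F(t) = 1 - exp(-lambda * t); expm1 keeps precision for small lambda * t.
    """
    return -math.expm1(-failure_rate_per_hour * hours)


def failure_probability_from_mttf(mttf_hours: float, hours: float) -> float:
    """Same conversion when the event is characterized by MTTF = 1 / lambda."""
    return failure_probability_from_rate(1.0 / mttf_hours, hours)


if __name__ == "__main__":
    # Hypothetical part: 50 failures per 1e9 hours, used 4 hours a day for one year.
    hours_in_use = 4 * 365
    print(f"F = {failure_probability_from_rate(50e-9, hours_in_use):.3e}")
```

For events described by other distributions, or for mechanical parts assessed through a safety margin, the conversion would differ; only the resulting probability of failure for the stated time and operational profile goes into the tree.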

4. FAILURE MODE DETECTION AND MITIGATION

In a completed or partially completed fault tree analysis of a system, once the probability of failure of the top-level gate has been calculated and it is concluded that reliability improvement is necessary, the next step is to identify the highest contributor to unreliability (a failure mode or a cause) and improve the design. The process then continues with the search for the next highest contributor. An example of such reliability improvement is shown for a complex audio/video amplifier system. The top level of the system (the console) is shown in Figure 4-1. The Tuner is shown as an event because reference designator numbers are repeated between the bill of material of the system and that of the tuner. For that reason, the Tuner was analyzed separately, and its top-level unreliability is depicted as an event.

Figure 4-1 Top Level Fault Tree of Console and its Major Subsystems


For the given warranty period, the original unreliability value is not acceptable, as 7,365 systems out of 100,000 made would need service before the end of their respective warranty periods. The highest contributor to unreliability is the block marked SPDIF. This gate was developed on page 13, as shown in Figure 4-2.
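The arithmetic behind this statement can be sketched in a few lines. The snippet converts the quoted service count into unreliability and reliability and, for comparison, computes how many service cases the console goal of R(1 year) = 0.992 (the goal value printed in Figure 4-4) would allow for the same production volume. The function names are illustrative only.

```python
def unreliability_from_returns(units_needing_service: int, units_built: int) -> float:
    """Fraction of units expected to fail before the end of the warranty period."""
    return units_needing_service / units_built


def allowed_returns(goal_reliability: float, units_built: int) -> int:
    """Number of service cases implied by a reliability goal for a given volume."""
    return round((1.0 - goal_reliability) * units_built)


initial_unreliability = unreliability_from_returns(7_365, 100_000)
initial_reliability = 1.0 - initial_unreliability
returns_at_goal = allowed_returns(0.992, 100_000)  # goal taken from Figure 4-4

print(f"Initial F = {initial_unreliability:.5f}, R = {initial_reliability:.5f}")
print(f"Service cases allowed by the goal: {returns_at_goal} per 100,000")
```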

Figure 4-2 SPDIF Top Level Fault Tree

The search for the highest contributor to the SPDIF circuit unreliability points to the part of the circuit that is an input to or output from the multiplexer. Further investigation leads to the SPDIF multiplexer, where the highest contributor is the IC U501 (Figure 4-3). The high failure probability of this IC is related to its package type (TSSOP); in another package, SOIC, this IC is a reliable part. There were 3 of these units in the console. It was also apparent that the probability of failure of capacitors C513 and C517 was too high for ceramic capacitors, because they used the Y5V dielectric. There were about 116 capacitors of this type in the console.


Figure 4-3 The Components which were the Highest Contributors to the Console Unreliability

Once the design improvements were made, the console reliability was improved to the point of almost meeting its aggressive goal. The resultant improvement is shown in Figure 4-4.

[Figure 4-4 plot. Vertical axis: Console Reliability, 0.91 to 1.00. Horizontal axis: Duration of the design period (days), 0 to 300. Curves: Planned Reliability Growth and Achieved Reliability Growth. Goal line: Console goal R(1 year) = 0.992. Annotated points: initially calculated value; Y5V caps replaced by X7R (116); TSSOPs replaced by SOICs; transistors and FETs from a more reliable vendor.]

Figure 4-4 Console Reliability Goal, Planned Growth Curve, and the Actual Reliability


5. SUMMARY AND CONCLUSIONS

Fault Tree Analysis can be used successfully to identify and mitigate potential failure modes that contribute to the unreliability of a product. The FTA allows pictorial representation of the system, its architecture, and its functionality, and uses Boolean algebra and a multitude of modeling schemes to best represent the system operation and the interdependency of its failure modes. Here the FTA is used to evaluate the contribution of individual failure modes to the system unreliability and to arrive at the most viable solution for reliability improvement. The methodology can be summarized as follows:

Define what constitutes the system failure.
Start at the top level of the system with an unfavorable outcome that defines the system failure.
Construct the fault tree downward, using logic to express reliability modeling techniques.
Follow the analysis down the fault tree to determine what assembly, signal, part, or manufacturing defect will cause a particular failure.
Develop the fault tree all the way down to the causes of the pertinent failure modes.
Determine the respective probability of occurrence of the individual causes. The software, when used for analysis, will roll up all the information to produce the failure probability of the system, subsystems, and assemblies.
Identify the failure modes that are the highest contributors to unreliability and mitigate them.
Update the analysis, and monitor the resultant reliability improvement.

Failure mode analysis with fault trees can be started at the start of a project and updated as more detailed information becomes available. There is no need to derive failure rates as the reliability measure for all components, electrical, mechanical, and software. Fault tree modeling allows a mixture of information (failure probabilities, different failure distributions) and, unlike classical reliability predictions, does not require estimation of failure rates only. Modeling and reliability assessment of a product system with the fault tree analysis allows for timely design improvements while design changes are still possible, feasible, and inexpensive. This methodology is also described in the draft IEC standards IEC 60300-3-1, Dependability management, Part 3: Application guide, Section 1: Analysis techniques for dependability; Guide on methodology, and IEC 61014, Reliability growth methods. The first standard is in its last draft for comments and vote. The second is also a draft, in circulation for comments.

6. REFERENCES AND BIBLIOGRAPHY

1. Joanne Bechta Dugan, Fault-Tree Analysis of Computer-Based Systems, 1999 Tutorial Notes, Reliability and Maintainability Symposium, Washington, DC.
2. Kiran Kumar Vemuri and Joanne Bechta Dugan, Reliability Analysis of Complex Hardware-Software Systems, Proceedings, Annual Reliability and Maintainability Symposium, January 1999, Washington, DC.
3. Géza Szabó and Péter Gáspár, Practical Treatment Methods for Adaptive Components in the Fault-Tree Analysis, Proceedings, Annual Reliability and Maintainability Symposium, January 1999, Washington, DC.
4. Alfredo H-S. Ang and Wilson H. Tang, Probability Concepts in Engineering Planning and Design, Volume II, Decision, Risk and Reliability, 1990.
5. Milena Krasich, Use of Fault Tree Analysis for Evaluation of System Reliability Improvements in Design Phase, Proceedings, Annual Reliability and Maintainability Symposium, January 2000, Los Angeles, California.


7. ATTACHMENT - TUTORIAL VISUALS


Fault Tree Analysis for Product Reliability Improvement
Milena Krasich, Bose Corporation, January 23, 2002

Tutorial Content
General reliability definitions in accordance with IEC 60050 (IEV 191) (1990), International Electrotechnical Vocabulary, Chapter 191: Dependability and quality of service
Description of Fault Tree Analysis methodology
Mathematics (statistics) associated with the Fault Tree Analysis
Reliability modeling of a complex system using Fault Tree Analysis (FTA), in accordance with IEC 60300-3-1, Dependability Analysis Methods
Examples of how the FTA is used for reliability improvement of electronics
Methods for determination of failure probabilities for basic events
Failure mode mitigation and reliability growth/improvement: a real-life example

Reliability Growth - Improvement
Reliability improvement of a product can be achieved in various phases of its life:
  Design phase
  Test, product validation phase: test reliability growth
  Fielded life: by upgrades, derivatives, recalls, etc.
The most cost-effective reliability improvement is done during the product design
Product reliability improvement is achieved by:
  Identification of potential design flaws:
    Component electrical overstress
    Potential mechanical overstress and failure
    Inadequate components or parts used
    Failure of one part caused by the failure of another part
    Use of parts that are of inferior quality/reliability
  Identification of manufacturing problems

Reliability Definition and Considerations
Reliability definition (IEV 191-12-01): Probability that an item can perform a required function under given conditions for a given time interval
Required function: defined by the expected performance, e.g.:
  No audible noise
  No distortion
  No bending past the predetermined angle
Measures
  Reliability: Probability of survival after the end of a predetermined period
  Unreliability: Probability of failure before the end of the period
  Measure as management sees it: Percent of items surviving a predetermined time period, normally the warranty period, mission period, or other time period requiring proper product operation

Definition of Failure
IEV 191-04-01: The termination of the ability of an item to perform a required function
Failure of hardware to operate properly due to:
  Design failure: A failure due to inadequate design of an item (to withstand operational or environmental stresses); improper part or improper use of a part in the design
  Manufacturing defect causing time-related failures: A fault due to non-conformity during manufacture to the design of an item or to specified manufacturing processes
Software failures: Failure of software
Failure cause: The circumstances during design, manufacture, or use which have led to failure
Failure mechanism: The physical, chemical, or other process which led to a failure

Definition of Failure Mode
Failure mode: Manner or state in which an item or a component might fail
Examples:
  Low output of an IC
  Separation of the IC packaging material
  Capacitor fails short due to crack propagation in the dielectric (failure mechanism)
  Resistor fails open; failure cause: poor lead welding
  FET saturation and overheat
  Gain change
  Seal leakage

Cause of a Failure Mode
Failure or failure mode cause
  One failure mode can have multiple causes
Examples:
  Causes of a capacitor short: electrical overstress, high temperature, vehicle vibration, high soldering temperature
  Causes of an IC enclosure failure: moisture, high temperature, IC manufacturing process
  Causes of a component open: poor soldering, manufacturing breakage in insertion
  Causes of a seal leak in a communication application (underwater, ocean bottom): water pressure causing dilatation, cold temperature, wearout during mating and de-mating, material degradation, manufacturing defect (under-size)

Use of a Fault Tree
Fault Tree Analysis (FTA) is a Boolean representation of a system and its assemblies and functions, along with failure modes and their respective causes
FTA is used for multiple purposes:
  To model the item/system architecture and functionality with a fault tree logic diagram, top down, in search of potential failure modes, and their respective causes, that might cause an unfavorable outcome defined as a failure of the system
  To quantitatively estimate the item reliability
  To identify the failure modes and causes that are the highest contributors to the item probability of failure
  To evaluate necessary and possible improvements (trade-offs)
  To assess the item reliability improvement as the potential failure modes are mitigated

Fault Tree - Introduction
Fault tree
  A logic diagram representing functional dependencies of parts of a system, and the arrangement of events causing unfavorable outcomes (system failure) that correspond to a predetermined failure definition
Fault tree components
  Gates: Outcomes of one or a combination of input events
  Cut sets: Groups of events that, if all occur, would cause a system failure. A minimal cut set contains the minimum number of events that are required for failure; removal of any one of them would result in the system not failing.
  Events
    Basic events: Usually a failure cause. Gets an assigned value: failure rate, MTBF, or failure probability

Events
Basic event
  Basic event for which reliability information is available
  Reliability model: Component failure mode, or a failure mode cause
Conditional event
  Event that is a condition of occurrence of another event, when both must occur for the output to occur
  Reliability model: Occurrence of an event that must occur for another event to occur

Events, cont.
Dormant event
  A basic event that represents a dormant failure
  Reliability model: Dormant component failure mode or dormant failure cause
Undeveloped event
  A part of a system not yet developed

Gates
OR gate
  The output event occurs if any of its input events occurs
  Reliability model: Failure occurs if any of the parts of the system fails (series system)
AND gate
  The output event occurs if all of the input events occur
  Reliability model: Parallel redundancy, one out of n equal or different branches
Majority vote gate
  The output occurs if m of the inputs occur
  Reliability model: Redundancy k out of n, where m = n - k + 1
Priority AND gate
  The output event (failure) occurs only if the input events occur in sequence from left to right
  Reliability model: secondary failures or enabling events
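The definition of a minimal cut set given in the fault tree introduction above can be illustrated with a short sketch. It reduces a list of candidate cut sets to the minimal ones by discarding every set that strictly contains another cut set. The candidate list is hypothetical: event numbers are arbitrary labels, and two non-minimal sets are included on purpose to show the reduction.

```python
from typing import FrozenSet, Iterable, List


def minimal_cut_sets(cut_sets: Iterable[Iterable[int]]) -> List[FrozenSet[int]]:
    """Keep only cut sets that do not strictly contain another cut set."""
    unique = {frozenset(c) for c in cut_sets}
    minimal = [c for c in unique if not any(other < c for other in unique)]
    # Sort by size, then by content, so the result is deterministic.
    return sorted(minimal, key=lambda c: (len(c), sorted(c)))


# Hypothetical candidates; {1, 2, 3} and {1, 2, 4, 5} are not minimal because
# the system already fails when events 1 and 2 (or 4 and 5) occur.
candidates = [{1, 2}, {4, 5}, {1, 3, 5}, {2, 3, 4}, {1, 2, 3}, {1, 2, 4, 5}]
reduced = minimal_cut_sets(candidates)

for cut_set in reduced:
    print(sorted(cut_set))
```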

Gates, cont.
Exclusive OR gate
  The output event occurs if one, but not the other, input occurs
  Reliability model: A failure of the system occurs only if one, not both, of the two possible failures happens
Inhibit gate
  The output occurs only if both (or all) of the input events take place, one of them conditional
  Reliability model: Conditional probability of the final event
Transfer gate
  Gate indicating that this part of the system is developed in another part or page of the diagram
  Reliability reference: A partial reliability block diagram that is shown in another location of the overall system block diagram

System Analysis Methods
A complex system: Reliability Block Diagram (RBD)
Example: Failure: No signal flow from A to B
[RBD figure: network of blocks 1 to 5 between A and B]

$R_S = (R_1 + R_2 - R_1 R_2)(R_4 + R_5 - R_4 R_5)\,R_3 + [R_1 R_4 + R_2 R_5 - R_1 R_2 R_4 R_5](1 - R_3)$

Algebraic solution meaning: Reliability of the system provided that R3 is good, plus reliability of the system provided R3 is bad.
When a system is really complex, with a multitude of interrelationships between the assemblies, the algebraic solutions rapidly become too involved.
Environmental factors and manufacturing errors are left out.

Modeling with a Fault Tree: Boolean Algebra
Basis for the fault tree: Boolean algebra, used to produce minimal cut sets (or path sets)
[RBD figure: network of blocks 1 to 5 between A and B]
Cut sets: the system fails if any one of the cut sets occurs:
  $c_1 = \{1, 2\}$, $c_2 = \{4, 5\}$, $c_3 = \{1, 3, 5\}$, $c_4 = \{2, 3, 4\}$
$R_S = 1 - F_S$
$F_S = \Pr(c_1 \cup c_2 \cup c_3 \cup c_4)$
$\Pr(c_1) = F_1 F_2 = (1 - R_1)(1 - R_2)$
Correct calculation (Esary-Proschan):
  $\Pr(c_1 \cup c_2 \cup c_3 \cup c_4) = 1 - [1 - \Pr(c_1)][1 - \Pr(c_2)][1 - \Pr(c_3)][1 - \Pr(c_4)]$
Rare event approximation:
  $F_S = \Pr(c_1) + \Pr(c_2) + \Pr(c_3) + \Pr(c_4)$
  $F_S = F_1 F_2 + F_4 F_5 + F_1 F_3 F_5 + F_2 F_3 F_4$

Comparison of the FTA Calculation Methods
Esary-Proschan (correct calculations):
  $F_S = 1 - (1 - F_1 F_2)(1 - F_4 F_5)(1 - F_1 F_3 F_5)(1 - F_2 F_3 F_4)$
Rare approximation:
  $F_{Sr} = F_1 F_2 + F_4 F_5 + F_1 F_3 F_5 + F_2 F_3 F_4$
Values: $F_1 = 2 \times 10^{-2}$, $F_2 = 5 \times 10^{-2}$, $F_3 = 8 \times 10^{-2}$, $F_4 = 2.5 \times 10^{-2}$, $F_5 = 3 \times 10^{-1}$
Esary-Proschan result:
  $F_S = 1 - (1 - F_1 F_2)(1 - F_4 F_5)(1 - F_1 F_3 F_5)(1 - F_2 F_3 F_4) = 9.068 \times 10^{-3}$
Rare approximation result:
  $F_{Sr} = F_1 F_2 + F_4 F_5 + F_1 F_3 F_5 + F_2 F_3 F_4$
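The comparison above can be reproduced numerically. The sketch below evaluates the Esary-Proschan product and the rare event sum over the four cut sets, using the block failure probabilities listed on the slides. For reference it also evaluates the closed-form RBD expression from the System Analysis Methods slide. The function names are illustrative, and treating the block failures as independent is an assumption of this sketch.

```python
from math import prod

# Block failure probabilities and cut sets as listed on the slides.
F = {1: 2e-2, 2: 5e-2, 3: 8e-2, 4: 2.5e-2, 5: 3e-1}
CUT_SETS = [(1, 2), (4, 5), (1, 3, 5), (2, 3, 4)]


def cut_set_probability(cut_set, f):
    """Probability that every block in the cut set fails (independent blocks)."""
    return prod(f[i] for i in cut_set)


def esary_proschan(cut_sets, f):
    """1 minus the product of (1 - Pr(cut set)) over all cut sets."""
    return 1.0 - prod(1.0 - cut_set_probability(c, f) for c in cut_sets)


def rare_event(cut_sets, f):
    """Sum of the cut set probabilities."""
    return sum(cut_set_probability(c, f) for c in cut_sets)


def rbd_closed_form(f):
    """System unreliability from the RBD expression conditioned on block 3."""
    r = {i: 1.0 - p for i, p in f.items()}
    block3_good = (r[1] + r[2] - r[1] * r[2]) * (r[4] + r[5] - r[4] * r[5])
    block3_bad = r[1] * r[4] + r[2] * r[5] - r[1] * r[2] * r[4] * r[5]
    return 1.0 - (block3_good * r[3] + block3_bad * (1.0 - r[3]))


fs_ep = esary_proschan(CUT_SETS, F)
fs_rare = rare_event(CUT_SETS, F)
fs_rbd = rbd_closed_form(F)

print(f"Esary-Proschan:  {fs_ep:.3e}")   # 9.068e-03
print(f"Rare event:      {fs_rare:.3e}")  # 9.080e-03
print(f"RBD closed form: {fs_rbd:.3e}")   # 9.006e-03
```

With these inputs the three results differ only in the third significant digit, which is why either cut set calculation is workable when the individual failure probabilities are small.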

FTA Model with Esary-Proschan Calculation
[Fault tree figure for the A to B network, listed here with indentation showing gate inputs]
Failure: No signal at the output, Q=9.068e-3
  Cross 1: Signal not going through the top first, Q=4.800e-4
    Block 1 fails, Block 3 fails, Block 5 fails
  Top: Signal not passing through the top branch, Q=1.000e-3
    Block 1 fails, Block 2 failure
  Bottom: Signal not passing through the bottom branch, Q=7.500e-3
    Block 4 fails, Block 5 fails
  Cross 2: Signal not passing through the bottom block first, Q=1.000e-4
    Block 2 failure, Block 3 fails, Block 4 fails
Basic event values: block 1 Q=2.000e-2, block 2 Q=5.000e-2, block 3 Q=8.000e-2, block 4 Q=2.500e-2, block 5 Q=3.000e-1

FTA Representation of the RBD: Rare Approximation
[Fault tree figure for the A to B network, listed here with indentation showing gate inputs]
Failure: No signal at the output, Q=9.080e-3
  Cross 1: Signal not going through the top first, Q=4.800e-4
    Block 1 fails, Block 3 fails, Block 5 fails
  Top: Signal not passing through the top branch, Q=1.000e-3
    Block 1 fails, Block 2 failure
  Bottom: Signal not passing through the bottom branch, Q=7.500e-3
    Block 4 fails, Block 5 fails
  Cross 2: Signal not passing through the bottom block first, Q=1.000e-4
    Block 2 failure, Block 3 fails, Block 4 fails
Basic event values: block 1 Q=2.000e-2, block 2 Q=5.000e-2, block 3 Q=8.000e-2, block 4 Q=2.500e-2, block 5 Q=3.000e-1

Example: Combination of Series and Redundant Events
[Fault tree figure, listed here with indentation showing gate inputs]
TOP1: fails if Gate 1 OR Gate 2 fails, Q=2.530e-3
  GATE1: Fails if event 1 OR event 2 occurs, Q=2.499e-3
    EVENT1 Q=0.002, EVENT2 Q=0.0005
  GATE2 (2 out of 3): Fails if 2 of the three events take place, Q=3.072e-5
    EVENT3 Q=0.0032, EVENT4 Q=0.0032, EVENT5 Q=0.0032
Gate 1: $F_{Gate1} = 1 - (1 - F_1)(1 - F_2)$, with $F_1 = 0.002$, $F_2 = 0.0005$
Gate 2: $F_{Gate2} = \sum_{i=m}^{n} \frac{n!}{i!\,(n-i)!}\, F_3^{\,i}\,(1 - F_3)^{(n-i)}$, with $n = 3$, $m = 2$, $F_3 = 0.0032$
Top gate: $F_{TopGate} = 1 - (1 - F_{Gate1})(1 - F_{Gate2}) = 2.53 \times 10^{-3}$

Example: The Redundant Gates are Different
[Fault tree figure, listed here with indentation showing gate inputs]
TOP1: fails if Gate 1 OR Gate 2 fails, Q=2.502e-3
  GATE1: Fails if event 1 OR event 2 takes place, Q=2.499e-3
    EVENT1 Q=0.002, EVENT2 Q=0.0005
  GATE2 (2 out of 3): Fails if any two of the events take place, Q=3.374e-6
    EVENT3 Q=0.00045, EVENT4 Q=0.00053, EVENT5 Q=0.0032
Gate 1: $F_{Gate1} = 1 - (1 - F_1)(1 - F_2)$, with $F_1 = 0.002$, $F_2 = 0.0005$
Gate 2: $F_{Gate2} = 1 - (1 - F_3 F_4)(1 - F_3 F_5)(1 - F_4 F_5)$, with $F_3 = 0.00045$, $F_4 = 0.00053$, $F_5 = 0.0032$
Top gate: $F_{TopGate} = 1 - (1 - F_{Gate1})(1 - F_{Gate2}) = 2.502 \times 10^{-3}$

Priority Gate - Example
[Fault tree figure, listed here with indentation showing gate inputs]
TOP1: fails if Gate 1 OR Gate 2 fails, Q=4.374e-6
  GATE1: Fails only if EVENT1 occurs first, Q=1.000e-6
    EVENT1 Q=0.002, EVENT2 Q=0.0005
  GATE2 (2 out of 3): Fails if any two of the events take place, Q=3.374e-6
    EVENT3 Q=0.00045, EVENT4 Q=0.00053, EVENT5 Q=0.0032
Gate 1, conditional probability:
  Probability of occurrence of EVENT1 = F1
  Probability of occurrence of event 2 if event 1 occurred = F2
  $F_{Gate1} = F(\mathrm{EVENT1}) \cdot F(\mathrm{EVENT2} \mid \mathrm{EVENT1})$
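The three gate examples above can be checked with a few small functions. This is only a sketch: the function names are illustrative, the events are assumed independent, and the 2-out-of-3 gates are evaluated from their pairwise cut sets combined with the Esary-Proschan product. That last choice is an assumption about how the gate values were obtained, made because it reproduces the values printed in the figures to the digits shown.

```python
from itertools import combinations
from math import prod


def or_gate(probs):
    """Output occurs if any input occurs (series system)."""
    return 1.0 - prod(1.0 - p for p in probs)


def and_gate(probs):
    """Output occurs only if all inputs occur."""
    return prod(probs)


def vote_gate(probs, m):
    """Output occurs if at least m inputs occur.

    Assumption: every combination of m inputs is treated as a cut set and
    the cut sets are combined with the Esary-Proschan product.
    """
    return or_gate(and_gate(c) for c in combinations(probs, m))


def priority_and_gate(p_first, p_second_given_first):
    """Output occurs only if the first event occurs and then the second."""
    return p_first * p_second_given_first


# Combination of series and redundant events (equal redundant inputs).
series_top = or_gate([or_gate([0.002, 0.0005]), vote_gate([0.0032] * 3, 2)])

# The redundant inputs are different.
gate2_different = vote_gate([0.00045, 0.00053, 0.0032], 2)
different_top = or_gate([or_gate([0.002, 0.0005]), gate2_different])

# Priority gate example.
priority_top = or_gate([priority_and_gate(0.002, 0.0005), gate2_different])

print(f"{series_top:.3e}  {different_top:.3e}  {priority_top:.3e}")
```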

Example: Partial Schematic of a Switching Amplifier
[Schematic figure]

Example of the Priority and AND Gate: Switching Amp Before Improvement
[Fault tree figure, listed here with indentation showing gate inputs]
FET4 OVERHEAT (Page 1): Overheat of FET due to LGND <2V, Q=2.0969e-3
  LGROUND: LGND shorted to ground causing improper FET bias and overheat, Q=5.9911e-3
    Short C905: Ceramic capacitor shorts to ground, Q=3.0001e-3
      MFG_SHORT_C905: Manufacturing defects cause a short, Q=7.0000e-8
        SOLDER SHORT_C6905: Excessive solder causing a short between the pins or pads, Q=5.0000e-8
        DEBRIS_C6905: Debris on the PCB causing a short, Q=2.0000e-8
      PRF_SHORT_C6905: Capacitor shorts due to part random failure, Q=3.0000e-3
    Short C906: Capacitor short brings LGND to ground, Q=3.0001e-3
      MFG_Short_C906: Manufacturing defects cause a short, Q=7.0000e-8
        SOLDER SHORT_C6906: Excessive solder causing a short between the pins or pads, Q=5.0000e-8
        DEBRIS_C6906: Debris on the PCB causing a short, Q=2.0000e-8
      PRF_C6906: Capacitor fails due to part random failure, Q=3.0000e-3
    DANDREIC SHORT: Electrolyte mixed with debris making a short, Q=4.6875e-13
      EL. CAP LEAK: Short caused by leaking of the nearby capacitor, Q=3.1250e-6
        AGEING: Capacitor leaking electrolyte due to ageing, Q=1.0000e-6
        HI-TEMP: Electrolyte leak due to high temperature, Q=1.2500e-7
        HI_HUMIDITY: Electrolyte leak due to high humidity, Q=2.0000e-6
      DEBRIS: Presence of debris on the board, Q=1.5000e-7
  FET 4 SATURATION: FET saturates due to LGND <2V, Q=3.5000e-1

After Capacitor Improvement (0.033 µF replaced 0.1 µF)
[Same fault tree as before the improvement, with the following values changed]
FET4 OVERHEAT (Page 1): Overheat of FET due to LGND <2V, Q=1.0009e-4
  LGROUND: Q=2.8596e-4
    Short C905: Q=1.4299e-4
      PRF_SHORT_C6905: Q=1.4292e-4
    Short C906: Q=1.4299e-4
      PRF_C6906: Q=1.4292e-4
All other values are unchanged.

Inhibit Gate - Example
[Fault tree figure, listed here with indentation showing gate inputs]
TOP1: fails if Gate 1 OR Gate 2 fails, Q=1.001e-6
  GATE1: Fails only if EVENT1 happens before EVENT2, Q=1.000e-6
    EVENT1 Q=0.002, EVENT2 Q=0.0005
  GATE2: Fails if all of the events take place, Q=7.632e-10
    EVENT3 Q=0.00045, EVENT4 Q=0.00053, EVENT5 Q=0.0032
Gate 1: Conditional probability
Gate 2, Inhibit: Outcome occurs only if all three (or any number) of events or gates take place. Example: Three EMI protection capacitors in parallel. No filtering if all three fail open.
$F_{Gate2} = F_3 F_4 F_5 = 7.632 \times 10^{-10}$

Other Important Information from an FTA Software
Failure frequency (hazard rate of all gates)
Number of expected failures during the preset lifetime
Unavailability (or availability) of the system or any gate (function or assembly), provided the system is assumed repairable
Gate summary in various forms
Confidence intervals on provided information (failure probability or unavailability)
Sensitivity analysis: the most critical component variation in probability of occurrence
Results from failure distributions other than exponential (constant failure rate)
Results calculated with multiple simulations (we normally set the number of simulations to 10,000)
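The roll-up of the switching amplifier fault tree shown earlier, before and after the capacitor change, can be sketched as follows. The gate structure used here (OR gates for the shorts, an AND gate for the electrolyte-and-debris short, and the top priority AND gate evaluated as a product) is inferred from the printed values rather than stated in the source, so it should be read as an assumption. The function names are illustrative.

```python
from math import prod


def or_gate(*probs):
    """Output occurs if any input occurs (independent inputs)."""
    return 1.0 - prod(1.0 - p for p in probs)


def and_gate(*probs):
    """Output occurs only if all inputs occur (independent inputs)."""
    return prod(probs)


def fet_overheat(prf_short: float) -> float:
    """Top event probability for a given capacitor random-failure probability.

    Basic event values are the ones printed in the fault tree figures.
    """
    mfg_short = or_gate(5.0e-8, 2.0e-8)              # solder short OR debris
    short_c905 = or_gate(mfg_short, prf_short)
    short_c906 = or_gate(mfg_short, prf_short)
    el_cap_leak = or_gate(1.0e-6, 1.25e-7, 2.0e-6)   # ageing, high temp, humidity
    electrolyte_debris_short = and_gate(el_cap_leak, 1.5e-7)
    lground = or_gate(short_c905, short_c906, electrolyte_debris_short)
    # Assumption: the priority AND gate is evaluated as the product of LGROUND
    # and the FET saturation probability given that LGND is below 2 V.
    return and_gate(lground, 3.5e-1)


before = fet_overheat(3.0e-3)      # original capacitor
after = fet_overheat(1.4292e-4)    # after the capacitor change

print(f"Before: {before:.4e}")     # 2.0969e-03
print(f"After:  {after:.4e}")      # 1.0009e-04
print(f"Improvement factor: {before / after:.1f}")
```

Rerunning the function with a different capacitor failure probability is a simple form of the sensitivity check listed among the software outputs above.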

Building a Fault Tree
Define the system
Define its major parts or functions, e.g.:
  Power supply
  Video
  Audio channels
Determine what phenomenon precludes proper operability of those parts or functions, e.g.:
  Shorted line voltage or no VCC supplied
  No video
  One or more audio channels not operational
Determine the causes of those phenomena
Determine the contributing factors to the causes, e.g.:
  High temperature
  High humidity
  Electrical overstress

Example: Input to CODEC of an Amplifier
[Schematic figure]

Rationale for Analysis of A to D Input Circuit
For the amplifier to be operational, all signals have to be processed by the CODEC (coded and decoded)
In the CODEC, the analog signal is converted to digital, and then again into analog for the analog output
The input signal 1+ into the left channel of IC U20 is interrupted if:
  Components fail open: R200, R209, C171
  C179 shorts to ground (shorting the signal)
The input signal 2+ into the right channel of IC U20 is interrupted if:
  Components fail open: R201, R205, C172
  C177 shorts to ground (shorting the signal)
Opening of C117 might cause some noise, which will be filtered later in the circuit

Rationale for Analysis of A to D Conversion Input Circuit
The entire circuit will not work if:
  No voltage is supplied to the analog input (pin 8): open R206, or R208 (if open, slight non-audible distortion), or short C174 or C176 (if any of the caps opens, no failure)
  No 5V analog is supplied to pin 7: C181 or C183 fail short
  U20 fails in whichever mode (low, high, or no output)
There will be no output to the D to A conversion and the rest of the amp if failed open: R214, R215, R218, and R219 (if shorted, not too much harm)
Not all failure modes need to be considered if not important to the failure definition: realistic prediction

FTA Representation of CODEC Analysis
Failure: No analog output from CODEC; one of the reasons: no analog inputs 1 or 2 into it
[Fault tree figure, Page 5. Gates and events shown:]
Analog Outputs 1 and 2 (Page 1): No analog output from CODEC available, Q=3.3535e-2
A to D 1 and 2: Failure of A to D conversion for channel 1 and 2, Q=1.7487e-2 (Page 30; go to page 30 for the analog inputs)
Digital from U20: One or more digital outputs from U20 not available, Q=1.4648e-2 (Page 29)
D to A for A_OUT_1&2+: D to A conversion for analog outputs 1 and 2, Q=1.5972e-2
A_OUT_1 and 2: Analog outputs not available, Q=5.5314e-4
D input to U21: No digital input provided for the U21, Q=1.4648e-2 (Page 27)
DAC_1_DAT: No data available from CAD_1, Q=0.0000
Fail_U21: U21 failure, Q=3.2979e-4 (Page 198)
5 V ANA to U21: 5V Analog not delivered or noisy, Q=1.0147e-3 (Page 43)
A_OUT_1: Analog output 1 not available, Q=2.7661e-4 (Page 66)
A_OUT_2: Analog output 2 not available, Q=2.7661e-4 (Page 65)

FTA Representation of CODEC Analysis, cont.
One of the plus inputs (1 or 2) not provided to the converter; no 5V analog supply voltage provided; IC U20 not operational
[Fault tree figure, Page 30. Gates and events shown:]
A to D 1 and 2 (Page 5): Failure of A to D conversion for channel 1 and 2, Q=1.7487e-2
A_IN_1_+: Analog input 1 to CODEC not available, Q=7.2087e-3 (Page 71)
A_IN_2_+: Analog input 2 to CODEC not available, Q=8.1431e-3 (Page 69)
Fail_U20: U20 failure, Q=1.4721e-3 (Page 200)
Analog Inputs 1 & 2: Analog inputs 1 and/or 2 not available, Q=5.9081e-3
5 V ANA: 5V Analog not delivered or noisy, Q=1.0147e-3
Input 1 into A to D: Input 1 to CODEC A to D not available or too noisy, Q=3.1481e-3 (Page 68)
Input 2 into A to D: Input 2 to CODEC A to D not available, Q=4.2788e-3 (Page 125)
No 5V Analog: 5V analog not available, Q=7.7798e-4 (Page 44)
Noise on 5V ANA: High or low frequency noise introduced to the signal, Q=2.3692e-4 (Page 201)

Failure Due to No Analog Voltage Supply

5V analog not available


IE

Page 30

No 5V Analog Q=7.7798e-4

Input 1 Not Available


Capacitor fails short, shorting signal 1 to the ground
IE

Input 1 to CODEC A to D not available or too noisy


IE

Page 30

The 5V analog shorts to ground, no voltage for pin 7 of U20


I E

Capacitor fails short, shorting +5V analog to the ground


I E I E

Voltage not available

Input 1 into A to D

Q=3.1481e-3

Short_EL_C183 Q=1.1136e-4

Short_C181 Q=2.7416e-4

+5V_ANA Q=3.9264e-4 Page 67

2.3 V supply

Open components interrupting the signal or causign noise


IE

IE

Connection short due to the manufacturing defect


I E

Capacitor fails short due to the part random failure


I E

Electrolyte leak due to the capacitor random failure


IE

Connection short due to the manufacturing defect


I E

Capacitor fails short due to the part random failure


I E

Short_C179 Q=2.0730e-4

2.3V Q=2.6595e-3 Page 126

Open Comp Q=4.1504e-4 Page 140

MFG_Short_El_C183

PRF_Short_El_C183

PRF_Leak_El_C183

MFG_Short_C181

Q=7.0000e-8
Q=9.36354e-005 Q=1.76564e-005 Debris on the PCB causing dandreic growth and a short
I E

Q=7.0000e-8

PRF_Short_C181

Q=0.000274094
Debris on the PCB causing dandreic growth and a short
I E

Connection short due to the manufacturing defect


IE

Capacitor fails short due to the part random failure


IE

MFG_Short_C179

Q=7.0000e-8

PRF_Short_C179

PRF Failure of the part random Failure probabilities are assigned to the manufacturing process quality requirement

Excessive solder causing a short between the pins or pads


IE Solder_short_El_C183

Excessive solder causing a short between the pins or pads


I E

Debris_El_C183

Debris_C 181 Q=2e-008

Solder_short_C181

Q=0.000207226
Debris on the PCB causing dandreic growth and a short
IE

Q=2e-008

Q=5e-008

Q=5e-008

Excessive solder causing a short between the pins or pads


IE

1-23-2002

M. Krasich

35

Debris_C179 Q=2e-008

Solder_short_C179

Page 68
1-23-2002
Q=5e-008

M. Krasich

33

Signal Noisy or Interrupted Due to Open Components

[Fault tree figures, including "High or Low Frequency Noise into the CODEC", with transfer references to Pages 30, 68 and 140 of the full tree. The diagrams are not reproduced; the recoverable events and values follow.]

Intermediate events
  Open components interrupting the signal or causing noise: Open Comp, Q=4.1504e-4
  Noise on 5V ANA: Q=2.3692e-4

Effects modeled for open parts
  Open capacitor causes high frequency noise (on the input)
  Open capacitor causes low frequency noise
  Open capacitor interrupts the signal
  Resistor fails open, +2.3 V not available for the analog input
  Resistor fails open, signal interrupted

Causes under each open-component event
  Part fails open due to the part random failure (PRF_Open_...)
  Connections open due to the manufacturing defect (MFG_Open_..., Q=1.3000e-8)
    Connection opens due to insufficient or improper soldering (Cold solder_..., Q=1.2e-008)
    Part not inserted during assembly (Missing_..., Q=1e-009)

Event          Q            PRF_Open       MFG_Open
Open_El_C183   6.1851e-5    6.18377e-005   1.3000e-8
Open_C181      1.7508e-4    0.000175069    1.3000e-8
Open_C179      1.3238e-4    0.000132368    1.3000e-8
Open_R206      5.1649e-5    5.16358e-005   1.3000e-8
Open_R209      5.1649e-5    5.16358e-005   1.3000e-8
Open_El_C171   1.2778e-4    0.000127767    1.3000e-8
Open_R200      5.1649e-5    5.16358e-005   1.3000e-8
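The Q values in this fault tree are consistent with OR gates over independent events: each manufacturing-defect event is the OR of a cold solder joint and a missing part, and each open-component event is the OR of the part random failure and the manufacturing defect. The following is a minimal sketch of that roll-up for R206, assuming independence; the function name `or_gate` and the variable names are illustrative and do not come from the slides.

```python
def or_gate(*probabilities):
    """Probability that at least one of several independent events occurs."""
    survive = 1.0
    for q in probabilities:
        survive *= 1.0 - q
    return 1.0 - survive

# Basic event values read from the fault tree figure
cold_solder_r206 = 1.2e-8   # connection opens due to insufficient or improper soldering
missing_r206 = 1e-9         # part not inserted during assembly
prf_open_r206 = 5.16358e-5  # resistor fails open due to the part random failure

# Manufacturing defect: cold solder OR missing part
mfg_open_r206 = or_gate(cold_solder_r206, missing_r206)

# Open resistor: random failure OR manufacturing defect
open_r206 = or_gate(prf_open_r206, mfg_open_r206)

print(f"MFG_Open_R206 = {mfg_open_r206:.4e}")
print(f"Open_R206     = {open_r206:.4e}")
```

For probabilities this small the OR gate is almost exactly the sum of its inputs, which is why MFG_Open comes out at about 1.3e-8 and Open_R206 at about 5.1649e-5.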

Contribution of Manufacturing Defects


Contribution to components failing open
  Cold or insufficient solder:
    Connection opens over time due to solder fatigue or vibration
  Missing components
    A surprisingly large number of components are not inserted during assembly; this is detected only later, when the function is exercised
  Components cracked during insertion
  Broken or bent pins or leads

Contribution to failing short
  Debris (uncleaned flux) left on the board that, with dendritic growth, causes a short
  Excessive solder
  Bent pins (mostly ICs and connectors) shorting with another pin


Values for the Basic Events

Electrical components
  Information from manufacturers (life test data)
    Needs to be adjusted for the proper environment and stresses
  Software databases
  Field use (last resort)

Mechanical components
  Determine stresses and loads (mechanical, environmental)
  Construct stress/strength equation for multiple loads if required
  Calculate design (safety) margin and reliability (probability of failure) for the required life

Manufacturing defects
  Factory data
  Field failure data

Example of Failure Probability Calculations

Automotive amplifier
Life expectancy: 15 years
Average active time (ON) daily: 2.7 hours
Assumptions:
  Car stereo ON when driving: automotive or Ground Mobile (GM) environment
  Car stereo OFF while car parked: stationary, thermally uncontrolled environment (GF); dormancy applies

Component probability of failure can be calculated as:

$F(15\ \text{years}) = 1 - \exp(-\lambda_{GM}\, t_{ON}) \cdot \exp(-\lambda_{GFD}\, t_{OFF})$

$t_{ON} = 365 \cdot 15 \cdot 2.7$
$t_{OFF} = 365 \cdot 15 \cdot (24 - 2.7)$
$\lambda_{GFD} = \lambda_{GF} \cdot d$, where $d$ is the dormancy factor, 0.1
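The 15-year calculation for the automotive amplifier can be sketched in a few lines. This assumes the formula reads F = 1 - exp(-λ_GM·t_ON)·exp(-λ_GFD·t_OFF), with t_ON = 365·15·2.7 hours, t_OFF = 365·15·(24 - 2.7) hours and λ_GFD = λ_GF·d. The two failure rates in the usage lines are hypothetical placeholders, not values from the tutorial, and the function name is illustrative.

```python
import math

def failure_probability(lambda_gm, lambda_gf, years=15.0,
                        on_hours_per_day=2.7, dormancy_factor=0.1):
    """Probability of failure over the life, with dormancy applied while OFF.

    lambda_gm: failure rate per hour while ON (Ground Mobile environment)
    lambda_gf: failure rate per hour in the parked environment, before dormancy
    """
    t_on = 365 * years * on_hours_per_day           # hours ON over the life
    t_off = 365 * years * (24 - on_hours_per_day)   # hours OFF over the life
    lambda_gfd = lambda_gf * dormancy_factor        # dormant failure rate
    return 1.0 - math.exp(-lambda_gm * t_on) * math.exp(-lambda_gfd * t_off)

# Hypothetical failure rates (per hour), for illustration only
example_lambda_gm = 1.0e-8
example_lambda_gf = 5.0e-9
f_15_years = failure_probability(example_lambda_gm, example_lambda_gf)
print(f"F(15 years) = {f_15_years:.4e}")
```

With 2.7 hours of use per day the component spends about 14,800 hours ON and about 116,600 hours OFF, so the dormant term can contribute noticeably even at one tenth of the failure rate.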

Part of the Failure Mode Probability Worksheet

pn / desc / ref
rem                 Failure mode ratio  Failure rate  Dormant FR   R(Ta)     F0          F1

191470-332 / CAP,0603,X7R,50V,3300PF / C540
PRF_C540            0.0089              8.937E-09     8.937E-10    0.999922  7.8285E-05  7.8285E-06
PRF_Short_C540      0.75                6.7028E-09    6.7028E-10   0.999941  5.8714E-05  5.8714E-06
PRF_ChValue_C540    0.1                 8.937E-10     8.937E-11    0.999992  7.8288E-06  7.8288E-07
PRF_Open_C540       0.15                1.3406E-09    1.3406E-10   0.999988  1.1743E-05  1.1743E-06

191470-473 / CAP,0603,X7R,50V,.047UF / C541
PRF_C541            0.0114              1.1351E-08    1.1351E-09   0.999901  9.943E-05   9.943E-06
PRF_Short_C541      0.75                8.5133E-09    8.5133E-10   0.999925  7.4573E-05  7.4573E-06
PRF_ChValue_C541    0.1                 1.1351E-09    1.1351E-10   0.99999   9.9434E-06  9.9434E-07
PRF_Open_C541       0.15                1.7027E-09    1.7027E-10   0.999985  1.4915E-05  1.4915E-06

254110 / DIODE,SCHOTTKY,40V,3A,S / D803
PRF_D803            0.01                9.95E-09      9.95E-10     0.999913  8.7158E-05  8.7158E-06
PRF_Short_D803      0.2                 1.99E-09      1.99E-10     0.999983  1.7432E-05  1.7432E-06
PRF_Open_D803       0.45                8.955E-10     8.955E-11    0.999992  7.8445E-06  7.8445E-07
PRF_ParamCh_D803    0.35                6.965E-10     6.965E-11    0.999994  6.1013E-06  6.1013E-07

135247-5232 / DIODE,ZEN,5.6V,225MW,5% / D306
PRF_D306            0.003               3E-09         3E-10        0.999974  2.628E-05   2.628E-06
PRF_Short_D306      0.2                 6E-10         6E-11        0.999995  5.256E-06   5.256E-07
PRF_Open_D306       0.45                2.7E-10       2.7E-11      0.999998  2.3652E-06  2.3652E-07
PRF_ParamCh_D306    0.35                2.1E-10       2.1E-11      0.999998  1.8396E-06  1.8396E-07

147239 / DIODE,DUAL,SOT-23,BAW56 / D206
PRF_D206            0.0101              1.0146E-08    1.0146E-09   0.999911  8.8875E-05  8.8875E-06
PRF_Short_D206      0.51                5.1745E-09    5.1745E-10   0.999955  4.5327E-05  4.5327E-06
PRF_Open_D206       0.29                1.5006E-09    1.5006E-10   0.999987  1.3145E-05  1.3145E-06
PRF_ParamCh_D206    0.2                 1.0349E-09    1.0349E-10   0.999991  9.0656E-06  9.0656E-07

147239 / DIODE,DUAL,SOT-23,BAW56 / D707
PRF_D707            0.0101              1.0146E-08    1.0146E-09   0.999911  8.8875E-05  8.8875E-06
PRF_Short_D707      0.51                5.1745E-09    5.1745E-10   0.999955  4.5327E-05  4.5327E-06
PRF_Open_D707       0.29                1.5006E-09    1.5006E-10   0.999987  1.3145E-05  1.3145E-06
PRF_ParamCh_D707    0.2                 1.0349E-09    1.0349E-10   0.999991  9.0656E-06  9.0656E-07

147239 / DIODE,SWITCHING,75V,200 / D702
PRF_D702            0.0172              1.72E-08      1.72E-09     0.999849  0.00015066  1.5066E-05
PRF_Short_D702      0.92                1.5824E-08    1.5824E-09   0.999861  0.00013861  1.3861E-05
PRF_Open_D702       0.08                1.2659E-09    1.2659E-10   0.999989  1.1089E-05  1.1089E-06

147239 / DIODE,SOT-23,BAV 99 / D100
PRF_D100            0.0101              1.0146E-08    1.0146E-09   0.999911  8.8875E-05  8.8875E-06
PRF_Short_D100      0.51                5.1745E-09    5.1745E-10   0.999955  4.5327E-05  4.5327E-06
PRF_Open_D100       0.29                1.5006E-09    1.5006E-10   0.999987  1.3145E-05  1.3145E-06
PRF_ParamCh_D100    0.2                 1.0349E-09    1.0349E-10   0.999991  9.0656E-06  9.0656E-07

147239 / DIODE,SOT-23,BAV 99 / D101
PRF_D101            0.0101              1.0146E-08    1.0146E-09   0.999911  8.8875E-05  8.8875E-06
PRF_Short_D101      0.51                5.1745E-09    5.1745E-10   0.999955  4.5327E-05  4.5327E-06
PRF_Open_D101       0.29                1.5006E-09    1.5006E-10   0.999987  1.3145E-05  1.3145E-06

Probability of the Seal Wear

The wear or spiral fracture of the Parker Fluorocarbon seals is noticed when the squeeze was 0.017 per side (the failure definition) for a 0.210 cross section
Abrasion resistance of Fluorocarbon is determined (Parker Handbook) to be good with the properly determined seal compression (squeeze)
Radius of the above seal is found from: [equation missing in the extracted text]
Ratios of the one sided compression and the respective radii are:
$r_1 = 0.017 / 0.21$; $r_2 = 0.004 / 0.2585$
The probability of the actual seal failure in ten years of life is:
$F(10\ \text{years}) = \ldots = 1.464 \times 10^{-6}$, where the elided expression involves $r_1$, $r_2$, $(0.3\, r_1)$ and $(0.1\, r_2)$
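The capacitor rows of the worksheet can be reproduced with a short calculation. This is a sketch under assumptions inferred from the numbers rather than stated on the slide: the mode failure rate is the part failure rate times the failure mode ratio, the dormant rate is one tenth of it, F0 = 1 - exp(-λ·t) with t = 8760 hours (one year), R(Ta) = 1 - F0, and F1 = 0.1·F0. The function and constant names are illustrative.

```python
import math

HOURS_PER_YEAR = 8760.0  # assumed exposure time, inferred from the worksheet values

def worksheet_row(part_failure_rate, mode_ratio=1.0, dormancy_factor=0.1,
                  hours=HOURS_PER_YEAR):
    """Return (failure rate, dormant FR, R, F0, F1) for one failure mode."""
    mode_rate = part_failure_rate * mode_ratio    # failures per hour for this mode
    dormant_rate = mode_rate * dormancy_factor    # dormant failure rate
    f0 = -math.expm1(-mode_rate * hours)          # 1 - exp(-rate * hours)
    reliability = 1.0 - f0
    f1 = f0 * dormancy_factor                     # assumed: F1 scales F0 by the dormancy factor
    return mode_rate, dormant_rate, reliability, f0, f1

# PRF_Short_C540: part failure rate 8.937E-09 per hour, failure mode ratio 0.75
rate, dormant, rel, f0, f1 = worksheet_row(8.937e-9, 0.75)
print(f"{rate:.4E} {dormant:.4E} {rel:.6f} {f0:.4E} {f1:.4E}")
```

The same call with the ratios 0.1 and 0.15 gives the PRF_ChValue_C540 and PRF_Open_C540 rows.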

FTA Top Level Audio/Video Console example

Start from the system top level
Include only those failure modes that affect the system performance
Represent system architecture: functional, hardware, or mix
When work completed, look for the highest contributor to unreliability

[Fault tree figure. Top event: System failure or improper operation (Postman System Console, Q=7.365e-2). Inputs:]
  Analog signal not available (ANALOG SIGNAL, Q=1.162e-2, Page 2)
  No or improper power delivered to the system (Power Supply, Q=3.280e-3, Page 12)
  Tuner failure (Tuner, Q=4.423e-3)
  No video (Video, Q=3.221e-3, Page 8)
  No SPDIF both zones (SPDIF, Q=4.946e-2, Page 13)
  Failure of these functions causes noticeable difference (Functions, Q=3.464e-2, Page 11)
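Looking for the highest contributor amounts to ranking the branches under the top event by their probability. A minimal sketch using the branch Q values from the top-level figure; the dictionary and variable names are illustrative.

```python
# Branch probabilities under the top event, read from the top-level fault tree
branch_q = {
    "ANALOG SIGNAL": 1.162e-2,
    "Power Supply": 3.280e-3,
    "Tuner": 4.423e-3,
    "Video": 3.221e-3,
    "SPDIF": 4.946e-2,
    "Functions": 3.464e-2,
}

# Rank branches from the highest to the lowest contributor to unreliability
ranked = sorted(branch_q.items(), key=lambda item: item[1], reverse=True)
top_name, top_q = ranked[0]

for name, q in ranked:
    print(f"{name:<14} Q={q:.3e}")
print(f"Highest contributor: {top_name}")
```

The branch that ranks first is the one to follow down to its subassemblies on its own page of the tree.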


The Highest Contributor to Unreliability - Example

Follow the highest hitter down to its subassemblies
Look for the highest contributor to its unreliability

[Fault tree figure: Page 13]

Audio/Video Console Reliability Growth Monitoring

[Plot: console reliability (y axis, 0.91 to 1) versus duration of the design period in days (x axis, 0 to 300). Curves: planned reliability growth and achieved reliability growth. Console goal R(1 year) = 0.992. Annotated points: initially calculated; Y5V caps replaced by X7R (116); TSSOPs replaced by SOICs; transistors and FETs from a more reliable vendor.]

Fault Tree Analysis for Reliability Growth - Summary


Define what constitutes a system failure
Start with the unfavorable outcome that defines the system failure
Construct the fault tree down, using logic to express reliability modeling techniques
Follow the analysis: failure of what assembly, signal, or part will cause the particular failure
Develop down to the causes of the pertinent failure modes
Determine probabilities of occurrence of individual causes
Identify the highest unreliability contributor or safety related failure modes and mitigate
Improve reliability as necessary and possible
Update the analysis, monitor reliability until the goal is met

Detailed Failure Modes and Causes


Cause 1: TSSOPs
Cause 2: Caps with Y5V dielectric

The Benefit of FTA for the Design Reliability Growth


[Plot: reliability (y axis, 0.8 to 1.02) versus design time in days (x axis, 0 to 300) for the System, Console and Subwoofer.]

If 100,000 systems are produced in one year, 9,250 fewer will be returned for repair within the warranty period as a result of the reliability improvement
