
First Quarter - 2006

The Journal of Alion's System Reliability Center

General Information: 315.337.0900
General Information: 888.722.8737
Facsimile: 315.337.9932
Technical Inquiries: 315.337.9933
via E-mail: src@alionscience.com
Visit SRC on the Web: http://src.alionscience.com

RMSQ Headlines
A Brief Tutorial on Impact, Spalling, Wear, Brinelling, Thermal Shock, and Radiation Damage
SRC Consulting Services (page 6)
Independent Reliability Maturity Assessment (page 12)
System and Part Integrated Data Resource (SPIDR™) Released April 2006 (page 19)
The iFR Method for Early Prediction of Annualized Failure Rates in Fielded Products (page 21)
From the Editor (page 27)
Future Events (page 28)

System Reliability Center
201 Mill Street
Rome, NY 13440-6916

The SRC is a Center of Excellence for Reliability, Maintainability, and Supportability that has served the engineering and acquisition community for more than 37 years. The SRC is wholly owned and operated by Alion Science and Technology. All rights reserved.

Calculating Failure Rates of Series/Parallel Networks

By: Vito Faraci, BAE Systems

Introduction

Recently, the author attempted to calculate the failure rate (FR) of a series/parallel (active redundant, without repair) reliability network using the Reliability Toolkit: Commercial Practices Edition, published by the System Reliability Center, as a guide. The Toolkit's approach for FR calculation for a single branch seemed to be very thorough, so the FR for each individual branch was calculated. Since several branches were in series, the FRs of the branches were then added together. Closer examination revealed that this approach was an oversimplification and failed to account for all possible combinations (ways) that individual components could fail. A closer review of the Reliability Toolkit revealed that it treats FR calculations of single branches with n components in parallel very thoroughly but lacks detail in describing a method for handling multiple branches in series.

A quick review of the software QuART Pro Version 2.0 Release 1 Build 70 was performed. It also seemed to deal with single branches very thoroughly but not multiple branches in series.

Objectives

The objectives of this article are to:
Describe two erroneous approaches commonly performed when calculating FR of Serial/Parallel reliability networks.
Provide an example of a correct approach.
Approximate the percent errors one can expect when FR is calculated erroneously.

Nature of the Problem

System reliability is calculated as a combination of series and parallel paths and can be expressed as a failure rate. Calculating the FR of a series network as shown in Figure 1 is a simple act of just adding all of the FRs in the series string together, and should need no further explanation.

Figure 1. Series Network (the symbol in the figure indicates equivalency)

However, calculating the reliability and/or FR of parallel networks requires a little more work. The Toolkit contains excellent information for doing this. See Reliability Toolkit Table 6.2-2 for calculating reliability, and Table 6.2-3 for calculating FR for parallel networks. For example, consider the network in Figure 2.

Figure 2. Parallel Network (equivalent FR: $2\lambda/3$)

From Table 6.2-2 we get $R(t) = 2e^{-\lambda t} - e^{-2\lambda t}$, where $\lambda$ is the failure rate of each component, and from Table 6.2-3 (equation 4) we get

$$FR = \frac{\lambda}{\frac{1}{1} + \frac{1}{2}} = \frac{2\lambda}{3}$$

For the network in Figure 3, first collect (add) all lambdas in series as shown, and then from the Reliability Toolkit tables get:

$$R(t) = 2e^{-2\lambda t} - e^{-4\lambda t} \quad \text{and} \quad FR = \frac{2\lambda}{\frac{1}{1} + \frac{1}{2}} = \frac{4\lambda}{3}$$
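The Toolkit results quoted for the networks in Figures 2 and 3 can be reproduced with a few lines of Python. This is only a minimal sketch, not the Toolkit's own procedure: it assumes identical, independent components with constant failure rates, active redundancy without repair, and the standard result that a k-of-n branch of such components has MTTF = (1/λ)(1/k + ... + 1/n). The function names `series_fr` and `parallel_fr` are illustrative, and failure rates are expressed in units of λ.

```python
from fractions import Fraction


def series_fr(rates):
    """FR of a series string: the component failure rates simply add."""
    return sum(rates, Fraction(0))


def parallel_fr(rate, n, k=1):
    """FR (= 1/MTTF) of a k-of-n active-redundant branch without repair.

    Assumes n identical, independent units, each with constant failure
    rate `rate`; then MTTF = (1/rate) * sum(1/i for i = k..n).
    """
    mttf = sum(Fraction(1, i) for i in range(k, n + 1)) / rate
    return 1 / mttf


lam = Fraction(1)  # work in units of lambda

# Figure 2: two components in parallel.
fig2 = parallel_fr(lam, n=2)

# Figure 3: add the lambdas of each series string first, then treat the
# two strings as a parallel pair.
fig3 = parallel_fr(series_fr([lam, lam]), n=2)

print("Figure 2 FR =", fig2, "x lambda")
print("Figure 3 FR =", fig3, "x lambda")
```

With `k=2, n=3` the same function gives the FR of a 2-of-3 branch in units of its component failure rate.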

The Journal of the System Reliability Center

Figure 3. Network of Series Elements in Parallel (equivalent FR: $4\lambda/3$)

The tables provide correct solutions for the networks in Figures 1, 2, and 3. However, a potential problem occurs when calculating the FR of a series/parallel network as shown in Figure 4. Analysts commit a very common error by intuitively calculating the FR of each parallel branch first and then adding the branch FRs together, since the branches are in series, which erroneously yields FR = $4\lambda/3$ in this example. This FR actually corresponds to the network in Figure 3. It is very important to understand that the network in Figure 3 and the network in Figure 4 are not equivalent.

Incorrect Approach A (Network in Figure 4)

From the Reliability Toolkit Table 6.2-3, the FR of each branch is $2\lambda/3$. It is intuitive to add these failure rates since the two branches are in series. This erroneous approach yields $2\lambda/3 + 2\lambda/3 = 4\lambda/3$, which is obviously not equal to $12\lambda/11$, the true FR of this network. This approach will yield an error of approximately 22%.
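The non-equivalence of the networks in Figures 3 and 4 can also be seen by brute force, without any reliability mathematics. The sketch below assumes the topologies implied by the reliability expressions in the text: Figure 3 as two two-element series strings in parallel, and Figure 4 as two two-element parallel branches in series. The component names `a1`, `a2`, `b1`, `b2` are illustrative labels, not taken from the figures.

```python
from itertools import product


def figure3_up(a1, a2, b1, b2):
    """Two series strings (a1-a2) and (b1-b2) placed in parallel."""
    return (a1 and a2) or (b1 and b2)


def figure4_up(a1, a2, b1, b2):
    """Two parallel branches (a1 | b1) and (a2 | b2) placed in series."""
    return (a1 or b1) and (a2 or b2)


# Enumerate all 16 up/down combinations of the four components.
states = list(product([True, False], repeat=4))

up3 = sum(1 for s in states if figure3_up(*s))
up4 = sum(1 for s in states if figure4_up(*s))

# States in which the two networks disagree.
differ = [s for s in states if figure3_up(*s) != figure4_up(*s)]

print("Figure 3 success states:", up3)
print("Figure 4 success states:", up4)
for a1, a2, b1, b2 in differ:
    print("networks disagree when a1, a2, b1, b2 =", a1, a2, b1, b2)
```

Under these assumptions every state that keeps the Figure 3 network up also keeps the Figure 4 network up, but not the reverse, so a failure rate derived for Figure 3 cannot simply be reused for Figure 4.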

Figure 4. Series-Parallel Network (each parallel branch has FR $2\lambda/3$; the network is not equivalent to one with FR $4\lambda/3$, which is the common error)

Incorrect Approach B (Network in Figure 4)

Another erroneous approach is to try to calculate FR as a function of time. For example, given that t = 10 hours and $\lambda$ = 250 fpmh (failures per million hours), one may be tempted to calculate network FR as follows:

$$R(10) = 4e^{-2\lambda t} - 4e^{-3\lambda t} + e^{-4\lambda t} = 4e^{-2 \cdot 250 \cdot 10/10^6} - 4e^{-3 \cdot 250 \cdot 10/10^6} + e^{-4 \cdot 250 \cdot 10/10^6} = 0.99998753$$

Then using $FR = -\ln(R(t))/t$:

$$FR = -\ln(0.99998753)/10 = 1.246 \times 10^{-6} = 1.246 \text{ fpmh}$$

This does not equal $12\lambda/11 = 12 \cdot 250/11 = 273$ fpmh.

Root Cause of Problem

The correct approach to calculate the FR of the network in Figure 4 (or any other network for that matter) is to calculate the reliability of each branch first, then multiply together the reliabilities of the branches. Assume the reliability of Branch 1 is $R_1(t)$, the reliability of Branch 2 is $R_2(t)$, etc. Then network reliability = $R(t) = R_1(t)\,R_2(t) \cdots R_n(t)$. It is important to note that the reliability of each branch $R_i(t)$ must be kept in terms of the failure rates of the components within the branch, and not in terms of the failure rate of the branch itself. Therein lies the root cause of the problem. The network FR can then be computed using the following definition:

$$FR = \frac{1}{MTTF} = \frac{1}{\int_0^{\infty} R(t)\,dt}$$

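Correct Approach (Network in Figure 4)

From the Reliability Toolkit Table 6.2-2, the reliability of each branch is $2e^{-\lambda t} - e^{-2\lambda t}$, so that

$$R(t) = (2e^{-\lambda t} - e^{-2\lambda t})(2e^{-\lambda t} - e^{-2\lambda t}) = 4e^{-2\lambda t} - 4e^{-3\lambda t} + e^{-4\lambda t}$$

$$MTTF = \int_0^{\infty} R(t)\,dt = \int_0^{\infty} (4e^{-2\lambda t} - 4e^{-3\lambda t} + e^{-4\lambda t})\,dt$$

$$MTTF = \frac{4}{2\lambda} - \frac{4}{3\lambda} + \frac{1}{4\lambda} = \frac{11}{12\lambda} \quad \Rightarrow \quad \text{True FR} = \frac{12\lambda}{11}$$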
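The closed-form result for the Figure 4 network can be cross-checked by integrating R(t) numerically. The sketch below is only an illustration: it uses a hand-written composite Simpson's rule (the helper name `simpson` is an assumption, not part of any library), takes λ = 250 fpmh as in the Approach B example, and truncates the integral at 40/λ hours, beyond which R(t) is negligible.

```python
import math


def reliability(t, lam):
    """R(t) for Figure 4: two identical 1-of-2 parallel branches in series."""
    x = math.exp(-lam * t)
    return (2 * x - x * x) ** 2


def simpson(f, a, b, n):
    """Composite Simpson's rule on [a, b] with n (even) subintervals."""
    if n % 2:
        n += 1
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


lam = 250e-6  # failures per hour (250 fpmh)

# MTTF is the integral of R(t) from 0 to infinity; the upper limit is
# truncated at 40/lam hours.
mttf_hours = simpson(lambda t: reliability(t, lam), 0.0, 40 / lam, 20000)
fr_fpmh = 1e6 / mttf_hours

print("numerical MTTF (hours):", mttf_hours)
print("numerical FR (fpmh):   ", fr_fpmh)
print("12*lambda/11 (fpmh):   ", 12 * 250 / 11)
```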

Note that the 1.246 fpmh obtained with Incorrect Approach B is only an apparent FR measured over a period of 10 hours, not to be confused with the FR as formally defined previously.

Given t = 100 hours, then:

$$R(100) = 4e^{-2 \cdot 250 \cdot 100/10^6} - 4e^{-3 \cdot 250 \cdot 100/10^6} + e^{-4 \cdot 250 \cdot 100/10^6} = 0.99878117$$

$$FR = -\ln(0.99878117)/100 = 12.196 \times 10^{-6} = 12.196 \text{ fpmh}$$

Note that 12.196 fpmh is another apparent FR, measured at 100 hours. Notice also that the value of the apparent FR will vary with t.
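The time dependence of the apparent FR can be made concrete with a short calculation. This is a minimal sketch assuming the Figure 4 reliability function and λ = 250 fpmh from the example; the function name `apparent_fr` and the longer mission times are illustrative choices, not values from the article.

```python
import math

LAMBDA = 250e-6  # failures per hour (250 fpmh)


def reliability(t, lam=LAMBDA):
    """R(t) = 4e^(-2*lam*t) - 4e^(-3*lam*t) + e^(-4*lam*t), written as a square."""
    x = math.exp(-lam * t)
    return (2 * x - x * x) ** 2


def apparent_fr(t, lam=LAMBDA):
    """Apparent FR over a mission of length t: -ln(R(t)) / t."""
    return -math.log(reliability(t, lam)) / t


true_fr_fpmh = 12 * LAMBDA / 11 * 1e6

for hours in (10, 100, 1000, 10000):
    print(f"t = {hours:>6} h: apparent FR = {apparent_fr(hours) * 1e6:.3f} fpmh")
print(f"FR defined as 1/MTTF = {true_fr_fpmh:.1f} fpmh")
```

The apparent FR grows with t because both branches start with all components working, which is why no single mission time reproduces the FR defined as 1/MTTF.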
Networks with all components having the same lambda are not
very common. An example of the correct approach on a more
practical (common) network is shown next.

A Correct Approach for Calculating Network Failure Rate

Consider the network shown in Figure 5 with failure rates a, b, and c. Success for the network is defined as at least 1 of 2 components of the left branch and at least 2 of 3 components of the middle branch being functional, together with the single component of the right branch. From the Reliability Toolkit Table 6.2-2, the reliability of the left branch is $2e^{-at} - e^{-2at}$, and that of the middle branch is $3e^{-2bt} - 2e^{-3bt}$. By definition, the reliability of the right branch is $e^{-ct}$. Network reliability R(t) is calculated by multiplying the three branch reliabilities together.
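The multiplication of branch reliabilities can be sketched numerically by representing each reliability as a list of (coefficient, rate) pairs, one per exponential term, and using the fact that each term c·e^(-rt) integrates to c/r. The values chosen for a, b, and c below are hypothetical, picked only to make the example run; the helper names `multiply` and `mttf` are likewise illustrative.

```python
def multiply(p, q):
    """Product of two sums of exponentials, each given as (coef, rate) pairs."""
    return [(c1 * c2, r1 + r2) for c1, r1 in p for c2, r2 in q]


def mttf(terms):
    """Integral from 0 to infinity of sum(coef * exp(-rate * t)) dt."""
    return sum(coef / rate for coef, rate in terms)


# Hypothetical component failure rates in failures per hour.
a, b, c = 100e-6, 200e-6, 50e-6

left = [(2, a), (-1, 2 * a)]        # 1 of 2: 2e^(-at) - e^(-2at)
middle = [(3, 2 * b), (-2, 3 * b)]  # 2 of 3: 3e^(-2bt) - 2e^(-3bt)
right = [(1, c)]                    # 1 of 1: e^(-ct)

network = multiply(multiply(left, middle), right)

true_fr = 1 / mttf(network)
# Summing the individual branch FRs (the erroneous shortcut) for comparison.
naive_fr = 1 / mttf(left) + 1 / mttf(middle) + 1 / mttf(right)

print(f"FR from 1/MTTF of the network: {true_fr * 1e6:.1f} fpmh")
print(f"Sum of branch FRs:             {naive_fr * 1e6:.1f} fpmh")
```

Keeping each branch as its full set of exponential terms, rather than collapsing it to a single branch FR, is what allows the product to be integrated correctly.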
Figure 5. Example with Multiple Paths (left branch: 1 of 2, failure rate a; middle branch: 2 of 3, failure rate b; right branch: 1 of 1, failure rate c)

Therefore:

$$R(t) = (2e^{-at} - e^{-2at})(3e^{-2bt} - 2e^{-3bt})\,e^{-ct}$$

$$= 6e^{-(a+2b+c)t} - 4e^{-(a+3b+c)t} - 3e^{-(2a+2b+c)t} + 2e^{-(2a+3b+c)t}$$

and

$$MTTF = \int_0^{\infty} R(t)\,dt = \int_0^{\infty} \left(6e^{-(a+2b+c)t} - 4e^{-(a+3b+c)t} - 3e^{-(2a+2b+c)t} + 2e^{-(2a+3b+c)t}\right)dt$$

The rest is algebra. Calculate MTTF using known values of a, b, and c, then take the reciprocal since FR = 1/MTTF.

$$MTTF = \frac{6}{a+2b+c} - \frac{4}{a+3b+c} - \frac{3}{2a+2b+c} + \frac{2}{2a+3b+c}$$

Erroneous Method A

A common error occurs when the analyst calculates the FR of each individual branch first, then adds all calculated branch FRs together. Note that in the previous example, the FR of the left branch is 2a/3, the FR of the middle branch is 6b/5, and the FR of the right branch is c.

Therefore, FR (erroneous) = 2a/3 + 6b/5 + c = (10a + 18b + 15c)/15

Now simple algebra will show that (10a + 18b + 15c)/15 is not equal to the true FR, which is the reciprocal of the MTTF:

$$\frac{1}{MTTF} = \left(\frac{6}{a+2b+c} - \frac{4}{a+3b+c} - \frac{3}{2a+2b+c} + \frac{2}{2a+3b+c}\right)^{-1}$$

Error Magnitude Estimation for Erroneous Approach A

Five sample networks were chosen, starting with a 2 row by 2 column network, as shown in Table 1. For the sake of simplicity, all network components were assigned the same lambda. In each case, the true FR was compared with the FR calculated erroneously by simply adding the FRs of each branch, and the % error was then measured. From the table, it can easily be seen that the larger the network, the larger the error.

Table 1. Error Magnitude Estimation Table for Erroneous Approach A

Network Configuration   True FR        Erroneous FR                        % Error
(Rows x Columns)                       (adding FRs of each branch)
2 x 2                   12λ/11         2λ/3 + 2λ/3 = 4λ/3                  22
2 x 3                   10λ/7          2λ/3 + 2λ/3 + 2λ/3 = 6λ/3           40
2 x 4                   280λ/163       2λ/3 + 2λ/3 + 2λ/3 + 2λ/3 = 8λ/3    55
3 x 2                   60λ/73         6λ/11 + 6λ/11 = 12λ/11              32
3 x 3                   2520λ/2467     6λ/11 + 6λ/11 + 6λ/11 = 18λ/11      60

Error Magnitude Estimation for Erroneous Approach B

The error magnitude for this approach will depend on the chosen value of t, and would be very difficult to express as an equation. Suffice it to say that the FR calculated by this approach may not come close to, or even resemble, the correct result.

Conclusions

Calculating the failure rate (FR) of a series/parallel (active redundant, without repair) reliability network is not as simple as one might believe; an incorrect approach can lead to subtle but substantial errors. Closer examination reveals that one must carefully account for all possible paths of success for networks having multiple branches in series.

In general, the larger the network, the larger the potential error when oversimplified approaches are used in calculating the reliability of these complex networks. The percent error, although not proven here, is a function of network size, network configuration, values of lambdas, and in some cases, time.

Reference

Reliability Engineering, ARINC Research Company, Michael Pecht, Editor, Prentice-Hall Inc., pages 202 to 226.

About the Author

Vito Faraci is a mathematician by education and electrical engineer by trade. He has worked as a Reliability and Maintainability Engineer for an aerospace company for 18 years. Mr. Faraci's aerospace work experience is concentrated on System Failure Analyses and Built-In-Test design.

Mr. Faraci has given seminars to the Federal Aviation Administration on probability, reliability, Fault Tree Analysis (FTA), FMEA, and Markov Analysis. He has given seminars at various engineering symposiums on FTA vs. Markov Analysis (combinatorial vs. non-combinatorial type problems) and written several articles on FTA vs. Markov Analysis.

As a consultant, Mr. Faraci designed various pieces of test equipment for the Long Island Railroad and wrote software for a medical electronics firm.
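As a cross-check on Table 1 (Error Magnitude Estimation Table for Erroneous Approach A), the true and erroneous FRs can be recomputed exactly. The sketch below assumes that "rows" means the number of identical components in parallel within a branch (1-of-n success) and "columns" the number of such branches in series, with every component having the same failure rate λ; reliabilities are stored as polynomials in x = e^(-λt), and all names are illustrative.

```python
from fractions import Fraction
from math import comb


def branch_poly(rows):
    """1-of-n parallel branch: R = 1 - (1 - x)^n, as {power of x: coefficient}."""
    return {i: Fraction((-1) ** (i + 1) * comb(rows, i)) for i in range(1, rows + 1)}


def multiply(p, q):
    """Product of two polynomials in x = exp(-lambda * t)."""
    out = {}
    for m1, c1 in p.items():
        for m2, c2 in q.items():
            out[m1 + m2] = out.get(m1 + m2, Fraction(0)) + c1 * c2
    return out


def mttf(poly):
    """Each term c * x^m = c * exp(-m*lambda*t) integrates to c/m (units of 1/lambda)."""
    return sum(c / m for m, c in poly.items())


def fr_pair(rows, cols):
    """Return (true FR, sum-of-branch-FRs) in units of lambda."""
    branch = branch_poly(rows)
    network = {0: Fraction(1)}
    for _ in range(cols):
        network = multiply(network, branch)
    return 1 / mttf(network), cols / mttf(branch)


CONFIGS = [(2, 2), (2, 3), (2, 4), (3, 2), (3, 3)]

for rows, cols in CONFIGS:
    true_fr, naive_fr = fr_pair(rows, cols)
    error_pct = float(100 * (naive_fr / true_fr - 1))
    print(f"{rows} x {cols}: true FR = {true_fr}, branch sum = {naive_fr}, error = {error_pct:.1f}%")
```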

RMSQ Headlines
"Putting It All Together," UPTIME, NetExpressUSA, Inc., January 2006, page 4. This article discusses how Condition-Based Maintenance (CBM) is more than simply conducting condition monitoring activities and becoming proficient in the use of CBM tools and technology. It provides some guidelines for creating a CBM culture in production plants and other large facilities.

"Recovering from Disaster," UPTIME, NetExpressUSA, Inc., January 2006, page 28. Hurricanes Katrina, Rita, and Wilma left many plants along the Gulf Coast shut down and badly damaged electric motors and generators. In this article, the author describes the creative solutions maintenance professionals used to remove moisture from thousands of motors and restore them to operation.

"Warming Up for Takeoff," Aerospace Engineering, SAE, Jan/Feb 2006, page 17. The article describes how Chromalox and NASA worked to make the shuttle safer following the loss of Columbia in February of 2003. The target of the effort was the design, qualification, and installation of heaters to replace foam previously used to prevent the formation of ice.

"FAA Actions Far from Inert on Fuel Tank Vapors," Aerospace Engineering, SAE, Jan/Feb 2006, page 20. For more than seven years, the FAA and private industry have been conducting research into technologies for making fuel tanks inert, preventing flammable vapor fires. The article describes some of the results of that research and how this safety improvement has been determined to be economically as well as technically feasible.

"Maintaining Reliability," Aerospace Engineering, SAE, Jan/Feb 2006, page 22. Regional airlines and operators of business jets consistently list engine reliability as their top priority. To achieve it, they take very specific maintenance actions intended to ensure that their passengers can depend on safe flights with no engine anomalies.

"After Six Sigma, What Next?," Quality Progress, ASQ, January 2006, page 30. Six Sigma has evolved from Total Quality Management and is widely used in a broad range of industries. Some critics, however, contend that Six Sigma is merely old wine in new bottles. This article discusses the next step in the continuing evolution of Six Sigma in the never-ending quest to improve an organization's competitive position, satisfy customers, and reduce costs.

"The House that Fraud Built," Quality Progress, ASQ, January 2006, page 52. Quality Function Deployment (QFD) has long been used to analyze customer needs and develop product requirements. This article describes a rather unconventional use of QFD. Specifically, QFD was used to identify and prioritize warning signs that an organization may be guilty of financial statement fraud.

"An Index to Measure and Monitor a System of Systems Performance Risk," Defense Acquisition Review, Defense Acquisition University, December 2005-March 2006, page 405. This article presents a method for combining individual system Technical Performance Measures (TPMs) into an overall measure, and extends the approach to a system of systems.

"Using Design of Experiments as a Process Road Map," Quality Digest, QCI International, February 2006, page 29. In this article, the author explains that factorial designs and/or orthogonal arrays may not be the most effective way to apply Design of Experiments.

"The V-22 Program," Defense AT&L, Defense Acquisition University, March-April 2006, page 18. The author discusses how the V-22 Obsolescence Team proactively manages and mitigates obsolescence problems in the V-22 weapon systems.

"Project Management and the Law of Unintended Consequences," Defense AT&L, Defense Acquisition University, March-April 2006, page 29. The article discusses how a strong risk management program can deal with the Law of Unintended Consequences. Although not named, the law was described by Adam Smith in 1776 in The Wealth of Nations. Smith wrote that an individual was led by an invisible hand to promote an end which was no part of his intentions. Program managers today constantly face the possibility of unintended, and often undesirable, consequences. Risk management provides the means for dealing with these consequences.

"Link Satisfaction to Market Share and Profitability," Quality Progress, ASQ, February 2006, page 50. Increased market share and profitability are two objectives common to every company no matter the product or service. Capturing and keeping customers requires a focused, continuing effort to provide good products at a fair price, while still ensuring a reasonable profit. This article discusses how customer satisfaction can lead to profitability and increased market share. The article discusses several methods of linking satisfaction data to financial performance data. Choosing the best method depends on the amount and type of data available.

A Brief Tutorial on Impact, Spalling, Wear, Brinelling, Thermal Shock, and Radiation Damage

Editor's Note: This is the second of a two-part article on failure modes and mechanisms in materials. It is based on a three-part series of articles by Benjamin D. Craig that appeared in issues of the AMPTIAC Quarterly.

Introduction
The purpose of this article is to briefly introduce several material failure modes. A better understanding of these failure mechanisms will enable more appropriate decisions when selecting materials for a particular application. Even a basic knowledge and awareness can help design engineers to be better equipped in delaying or preventing the failure of a material or component. Failure can occur in systems with moving or non-moving parts. In systems with moving parts, friction often leads to material degradation such as wear, and collisions between two components can result in surface or more extensive material damage. Systems with non-moving parts are also prone to material failure, especially when certain types of materials are subjected to extreme temperature changes or to high energy radiation environments. Material failure often manifests itself in the form of cracking but can also appear as material disintegration, mechanical property degradation, or even physical deformation. For instance, impact failure can occur by fracture, deformation, or material disintegration, while radiation damage can cause a severe degradation of a material's properties. These failure modes, and spalling, wear, brinelling, and thermal shock are described throughout this article.

Wear
Wear is a general term describing the deterioration of a material's surface caused by frictional forces generated by contact between two surfaces moving in relation to one another. Temperature has an effect on the wear rate (rate at which a material deteriorates under frictional forces) because friction generates heat, which in turn can affect the microstructure of the material, making it more susceptible to deterioration.

Components such as bearings, cams, and gears are often susceptible to wear. There are several different types of wear, including adhesive wear, abrasive wear, corrosive wear, surface fatigue wear, impact wear, and fretting wear. Most of these will be discussed in some detail in the following sections.

Minimizing wear, or protecting a material's surface from it, can be accomplished through several methods including the use of lubricants and surface treatments (Reference 3). Selecting a material that is resistant to wear, such as one having high hardness (e.g., ceramics), is also a good method to prevent excessive wear. Alternatively, hard coatings such as tungsten-carbide-cobalt can be used to augment the hardness of a component having a relatively soft surface. Surface or heat treatments can also be used to increase the hardness or smoothness of the surface. Examples include carburizing and superfinishing, which is described in Reference 4.

Adhesive Wear
Adhesive wear occurs between two surfaces in relative motion as
the result of high contact stresses, which are generated because of
the inherent roughness of material surfaces. No matter how finely
polished a surface is, two materials in contact with each other do
not mate completely. This allows localized areas on the surface to
sustain a greater percentage of a mechanical load, while the areas
that are not in contact with the opposing surface absorb none of the
mechanical load. In adhesive wear, the peaks on the adjacent surfaces that do come into contact will plastically deform under pressure and form atomic bonds at the interface (in some cases this is
considered solid-phase welding). As the relative motion between
the surfaces continues, the shear stress at the now atomically bonded contact point increases until the shear strength limit of one of the
materials is reached and the contact point is broken bringing with it
a piece of the opposing surface. The broken material can then
either be released as debris or remain bonded to the other material's
surface. This process is demonstrated in Figure 1. Adhesive wear
is also known as scoring, scuffing, galling or seizing (galling and
seizure are described briefly below) (References 3 and 5).
High hardness and low strength are desirable properties for
applications requiring resistance to adhesive wear. However,
these properties are somewhat mutually exclusive, which makes
composite materials desirable for such applications. Examples
of resistant monolithic materials include low strength, high ductility polymers, and high hardness, low density ceramics.
Sintered copper infiltrated with polytetrafluoroethylene
(Teflon) and lead particle reinforced bronze materials are specific examples of composite materials that are highly resistant to
adhesive wear (Reference 3).
Figure 1. Illustration of Adhesive Wear Mechanism (Reference 3). Figure labels: bonded asperities, sheared asperity; horizontal arrows indicate directions of sliding.

(Continued on page 10)


SRC CONSULTING SERVICES


Is your organization faced with the following?
Specifying the reliability of an item you are contracting to an outside vendor.
Developing a reliability program plan that focuses on cost-effective reliability tasks.
Leveraging the lessons learned from historical data on your current system.
Balancing deadlines associated with reliability, maintainability, and supportability (RMS) analysis tasks using an overburdened staff.
Wanting to develop best-in-class RMS practices without fully understanding the steps to reach this goal.
Performing maintenance practices with fewer resources while ensuring that requirements are not compromised.
Quantifying the reliability of your product through the development of effective accelerated life tests.
Identifying the root cause of repeated warranty claims against your product.

Whether you are looking for subject matter expertise that your organization does not inherently contain or your staff is already overburdened, SRC consulting services continue to address your toughest RMS challenges. Beyond the publications, databases, software, and training that you have relied on since 1968, the trained and experienced reliability professionals at SRC provide expert support through:
Reliability Goal/Requirement Development and Analysis: The Alion SRC reliability databases and tools are trusted sources for providing a baseline to set realistic and achievable reliability goals and requirements for any product. The SRC team also assesses a customer's progress in achieving reliability goals and requirements and, when they are not being achieved, identifies ways to mitigate problems.

Reliability, Maintainability, and Supportability Program Planning: The Alion SRC professionals develop, implement, support, and execute RMS program plans tailored to the specific product(s) or environment(s) of our customers. Program plan tasks that SRC recommends include: benchmarking, life cycle planning, market survey, parts management, supplier control, and test strategy development.

Integrated Data Management Systems: SRC works with customers to develop integrated data management systems (web/PC-based) that transform their historical database into a tool that supports decisions throughout a system's life cycle. SRC develops integrated data management systems from maintenance management/data collection systems to provide tailored results that monitor the field performance of systems and their components.

Reliability, Maintainability, and Supportability Analysis Task Facilitation: Alion SRC supports all facets of RMS analysis including: reliability modeling, reliability/maintainability assessment, failure modes and effects analysis (FMEA), fault tree analysis (FTA), thermal analysis, reliability centered maintenance (RCM) analysis, testability analysis, human factors analysis, spares analysis, life cycle cost analysis, and maintenance task analysis. SRC develops on-site RMS training programs to facilitate analysis tasks in a hands-on, team-based environment. SRC engineers also develop industry standards for completing RMS analysis tasks (e.g., PRISM).

Reliability Maturity Assessment: SRC has developed a systematic approach for independently assessing the maturity of an organization's process. In addition to providing a numerical rating of an organization's current reliability maturity, SRC provides an improvement roadmap for moving the organization forward to higher levels of maturity. SRC's Reliability Maturity Assessment typically produces results in less than 30 days.

Maintenance Optimization: The Alion SRC assesses maintenance and reliability data to determine the optimum time for replacement of components before failure. The SRC team uses analytical techniques to determine the optimal mix of corrective and preventive maintenance activities needed to sustain the desired level of operational reliability of systems while ensuring their safe and economical operation and support.

Accelerated Reliability Test Strategies: The Alion SRC staff work with customers to define practical acceleration methodologies to shorten reliability tests and develop stress models tailored to the systems/components that achieve their reliability goals and requirements without exceeding resource constraints. Statistical analysis of test results then provides definitive answers about the long-term reliability of systems/components.

Root Cause and Statistical Analysis: The Alion SRC team rapidly and effectively performs root cause failure analysis on electrical, mechanical, and electromechanical components and utilizes several laboratories when formal laboratory analysis is required. To provide a comprehensive failure analysis solution, the variability of the process is measured using statistical process control and, when improvements are needed, SRC's statisticians apply design of experiments (DOE) principles to effectively improve the parameter of interest.
The Alion SRC team is ready to help you improve the availability, readiness, and total cost of ownership for your product. To get started, contact us today.

System Reliability Center (SRC)


http://src.alionscience.com/consulting (web)
src_consulting@alionscience.com (E-mail)
1.888.722.8737 (phone)
315.337.9932 (fax)

Note: URLs and E-mail addresses in the Journal are hyperlinks. Click right on the hyperlink to visit a web site or send an E-mail.

A Brief Tutorial on Impact, ... (Continued from page 5)


Abrasive Wear
Gouging, grinding and scratching are examples of abrasive wear, which occurs when a solid surface experiences the displacement or removal of material as a result of a forceful interaction with another surface or particle. Particles can become trapped in between the two surfaces in contact, and the relative motion between them results in abrasion (displacement and removal of surface material) of the surface that has a lower hardness. This process is demonstrated in Figure 2. Sources of particles can include foreign contaminants (particles originating outside the system), wear debris, or solid constituents suspended in a fluid. Alternatively, abrasive wear can occur in the absence of loose particles when the roughness of one surface causes abrasion and/or removal of material from the other surface. This wear mechanism differs from adhesive wear in that there is no atomic bonding between the two surfaces. Abrasive erosion occurs when a fluid carrying solid particles is traveling in a direction parallel (as opposed to perpendicular, which is impact wear) to the surface, and the particles gradually deteriorate the material's surface.

Figure 2. Illustration of Abrasive Wear Mechanism (Reference 3). Figure labels: force/pressure, travel, abrasive particle, asperity, debris.

Material hardness is a critical factor in the abrasive wear rate of the surface, as higher hardness results in a lower wear rate. Moreover, if the hardness of the material's surface is higher than the hardness of the abrading particles, then little wear is observed and the particles are likely to be broken into smaller pieces. Materials with high hardness and toughness properties are well-suited to prevent or minimize abrasive wear. Examples of materials that are inherently resistant to abrasive wear include high hardness or surface hardened steels, cobalt alloys and ceramics (Reference 3).

Galling and Seizure
Galling is an extreme form of adhesive wear that involves excessive friction between the two surfaces resulting in localized solid-phase welding and subsequent spalling of the mated parts. This process causes significant damage to the surface of one or both materials. Seizure is even more extreme in that the two surfaces experience a sufficient amount of solid-phase welding such that the two components can no longer move.

Corrosive Wear
When the effects of corrosion and wear are combined, a more rapid degradation of the material's surface may occur. This process is known as corrosive wear. Films or coatings are often used to protect a base metal or alloy from harsh environments that would otherwise cause it to corrode. If such a coating were subjected to abrasive or adhesive wear causing a loss of coating from the material's surface, for instance, the base metal or alloy could be exposed and consequently corroded. Alternatively, a surface that is corroded or oxidized may be mechanically weakened and more likely to wear at an increased rate. Furthermore, corrosion products including oxide particles that are dislodged from the material's surface can subsequently act as abrasive particles.

Surface Fatigue Wear
Surface or contact fatigue occurs when two material surfaces that are in contact with each other in a rolling or combined rolling and sliding motion create an alternating force or stress oriented in a direction normal to the surface. The contact stress initiates the formation of cracks slightly beneath the surface, which then grow back toward the surface causing pits to form, as particles of the material are ejected or worn away. This form of fatigue is common in applications where an object repeatedly rolls across the surface of a material resulting in a high concentration of stress at each point along the surface. For example, rolling-element bearings, gears, and railroad wheels commonly exhibit surface fatigue (References 3 and 6). Figure 3 illustrates an example of the surface fatigue mechanism.

Figure 3. Illustration of Surface Fatigue Mechanism (Reference 3). Figure labels: direction of rotation, crack origin below surface.

Impact Wear
Impact wear is discussed in the section addressing impact failure
modes.

Fretting Wear
Surfaces that are in intimate contact with each other and are subject
to a small amplitude relative motion that is cyclic in nature, such as
vibration, tend to incur wear. Fretting wear is normally accompanied by the corrosion or oxidization of the debris and worn surface.
Unlike normal wear mechanisms only a small amount of the debris
is lost from the system; instead the debris remains within the conjoined surfaces. The mated surfaces essentially exhibit adhesion
through mechanical bonding, and the oscillatory motion causes the
surface to fragment, thereby creating oxidized debris. If the debris
becomes embedded in the surface of the softer metal, the wear rate
may be reduced. If the debris remains free at the interface between
the two materials the wear rate may be increased. Fatigue cracks
also have a tendency to form in the region of wear, resulting in a
further degradation of the material's surface. Liquid or solid lubricants (e.g., surface treatments, coatings, etc.), residual stresses (e.g., through shot or laser peening), surface grooving (e.g., to enable the release of debris), and/or appropriate material selection for the material pair can help to reduce the effects or prevent the occurrence of fretting wear (Reference 7).

Brinelling
Brinelling can be very basically defined as denting. When a
localized area of a material's surface is repeatedly impacted or is
subjected to a static load that overcomes the material's yield
strength, causing it to permanently deform, it is considered to
have undergone brinelling. Bearings are often susceptible to
failure by brinelling since an indentation can cause an increase in
vibration, noise, and heating (Reference 7). Brinelling failures
can be caused by improper handling, such as forcing a bearing
into a housing, by dropping the bearing, or by severe vibrations,
such as those produced during ultrasonic cleaning (Reference 8).
Selecting a material with a high hardness or taking extra care
during handling and cleaning can help prevent brinelling.

Thermal Shock
Thermal shock is a failure mechanism that occurs in materials that
exhibit a significant temperature gradient (indicating a sudden and
dramatic change in temperature has occurred). For instance, if the
temperature gradient is so large that the material experiences thermal stresses (or strains) great enough to overcome its strength, it
may lead to fracture, especially if the material is constrained. An
example of the consequence of thermal shock is shown in Figure
4. Awareness of a system's or component's operating conditions
when selecting materials is important in order to prevent thermal
shock failure from occurring. The designer should choose a material that has an appropriate thermal conductivity and heat capacity
for the intended environmental conditions. In addition, residual
stresses (from shot or laser peening, for example) can help accommodate thermal stresses that are generated during thermal shock,
thereby potentially protecting the material from fracture.

Figure 4. Brittle Fracture of a Ductile Weld Material that Is
Constrained Caused by High Stresses Induced from a Rapid
1000°F Change in Temperature (Photo Courtesy of Sachs,
Salvaterra & Associates, Inc.)

Radiation Damage
The space environment is very unfriendly to most materials due to
an array of harsh conditions that can easily and rapidly degrade the
material and/or its properties. Degradation of an exposed material
often comes as a result of the different types of radiation present in
space. Radiation is not limited to the space environment, however, as there are a number of environments and specific applications
that subject materials to this damaging energy (Figure 5).

Figure 5. CO2 Laser Used to Study the Energy Incident on the Effects of Radiation on Materials (Reference 9)

High-energy radiation, such as neutrons in a nuclear reactor, can
damage almost any material including metals, ceramics, and polymers (Reference 3). Typically, when a material is subjected to high-energy radiation, its properties are altered through structural changes that occur as it absorbs some of the energy that is incident on the
material. For instance, when a metal is exposed to neutron radiation from a nuclear reactor, atoms in the metal are displaced, resulting in the creation of defects. These defects can diffuse and coalesce to create crack initiation sites or can simply leave the metal
brittle and susceptible to failure through another mechanism.
Another portion of the energy is absorbed and converted to heat. Metals
are often better suited to withstand radiation energy than are ceramics. Typically, the ductility, thermal conductivity, and electrical
conductivity are negatively impacted when a metal is exposed to
radiation (Reference 3). Ceramics are affected by radiation to varying extents depending on the type of inherent bonding (i.e., covalent or ionic). Ionically bonded ceramics experience decreases in
ductility, thermal conductivity, and optical properties, but the damage can be reversed with proper heat treatment (similar to metals).
Covalently bonded ceramics experience similar damage; however,
the damage is somewhat permanent (Reference 3).
Polymers are especially susceptible to radiation even at low
energy levels, such as UV radiation. Damage from radiation in
polymers usually manifests itself as cracking. For this reason,
polymers have been known for their cracking problems in outdoor applications, where they are constantly exposed to UV radiation. UV blockers, absorbers, and stabilizers are often added to
polymers used for outdoor applications to augment their ability
to withstand incident radiation energy.

Corrosion
Corrosion is the deterioration of a metal or alloy and its properties due to a chemical or electrochemical reaction with the surrounding environment. The most serious result of corrosion is a
system or component failure. Material failure can occur either
by sufficient material property degradation such that the material is rendered unable to perform its intended function, or by fracture that originates from or is propagated by corrosive effects.

Independent Reliability Maturity Assessment

In today's competitive global marketplace, profit and return-on-investment depend on effective and efficient design and
manufacturing processes. Effective, efficient reliability design processes result in better products, lower production costs,
lower ownership costs, and fewer warranty and liability claims.
Product reliability is an important product characteristic because it:
• Is often cited as the reason customers should prefer one product over another.
• Can be an important part of a comprehensive risk management program.
• Is related to product safety and, hence, company liability.
• Directly affects warranty costs and customer satisfaction.

To make reliability a key product requirement, an organization should first determine where it stands in terms of its
processes for designing and manufacturing for reliability.
An effective way to do this is through a reliability maturity assessment (RMA).
Alion Science and Technology's System Reliability Center (SRC) has developed and implemented a systematic
approach for independently assessing the maturity of an organization's process for designing and manufacturing for
reliability.
An RMA evaluates the processes used to design and manufacture for reliability to identify shortcomings in those
processes and provides a roadmap to improvement.
SRC engineers use documented procedures to ensure that our RMA is systematic, objective, and thorough. Our procedures:
• Identify the specific areas to be examined and how the results of the examination will be evaluated and documented.
• Are based on objective evidence, not on hearsay or casual impressions.
SRC's Reliability Maturity Assessment provides the following benefits to our customers:
• Objective identification of strengths and weaknesses
• Benefit of lessons learned from a wide range of industries
• A roadmap for improvement

For more information, please contact:
System Reliability Center
Alion Science & Technology
201 Mill Street
Rome, NY 13440
Tel: 888.722.8737, Fax: 315.337.9932
E-mail: <src@alionscience.com>

Uniform/General Corrosion
Uniform corrosion is a generalized corrosive attack that occurs
over a large surface area of a material. The result is a thinning
of the material until failure occurs. Uniform corrosion can also
lead to changes in surface properties such as increased surface
roughness and friction, which may cause component failure,
especially in the case of moving parts that require lubricity.
In most cases corrosion is inevitable. Therefore, mitigating the
effects of corrosion or reducing the corrosion rate is essential to
ensuring material longevity. Protecting against uniform corrosion can often be accomplished through selection of a material
that is best suited for the anticipated environment. The selection
of materials for uniform corrosion resistance should simply take
into consideration the susceptibility of the metal to the type of
environment that will be encountered.
Aside from selecting a material that resists uniform corrosion, protection schemes such as barrier coatings can be implemented. Organic
or metallic coatings should be used wherever feasible. When coatings are not used, surface treatments that artificially produce the
metal oxide layer prior to exposure to the environment will result in
a more uniform oxide layer with a controlled thickness. There are
also surface treatments where additional elements are incorporated
for corrosion resistance, such as chromium. Also, vapor phase
inhibitors may be used in such applications as boilers to combat
corrosive elements and adjust the pH level of the environment.
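
The thinning described above lends itself to a simple worked estimate. The sketch below is only a first-order illustration, not a method given in this article: it assumes a constant, known corrosion rate and a minimum allowable thickness, both of which are hypothetical inputs, and the function name is likewise hypothetical. Real corrosion rates vary with time and environment.

```python
def remaining_life_years(initial_thickness_mm, minimum_thickness_mm,
                         corrosion_rate_mm_per_year):
    """Estimate years until a uniformly corroding wall reaches its minimum thickness.

    Assumption (illustrative only): the corrosion rate is constant over time.
    """
    if corrosion_rate_mm_per_year <= 0:
        raise ValueError("corrosion rate must be positive")
    # Material that may be lost before the part no longer meets its requirement.
    allowance_mm = initial_thickness_mm - minimum_thickness_mm
    if allowance_mm <= 0:
        return 0.0  # already at or below the minimum thickness
    return allowance_mm / corrosion_rate_mm_per_year


if __name__ == "__main__":
    # Hypothetical values: 6 mm wall, 4 mm minimum, 0.25 mm/year uniform loss.
    print(remaining_life_years(6.0, 4.0, 0.25))  # 8.0
```

An estimate of this kind can help compare candidate materials or judge how much a coating must slow the corrosion rate, but it should be treated as a rough guide under the stated assumptions.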

Galvanic Corrosion
Galvanic corrosion is a form of corrosive attack that occurs when
two dissimilar metals (e.g., stainless steel and magnesium) are electrically connected, either through physically touching each other or
through an electrically conducting medium, such as an electrolyte.
When this occurs, an electrochemical cell can be established, resulting in an increased rate of oxidation of the more anodic material
(lower electrical potential). The opposing metal, the cathode, will
consequently receive a boost in its resistance to corrosion. Galvanic
corrosion (shown in Figure 6) is usually observed to be greatest near
the surface where the two dissimilar metals are in contact.

There are a number of driving forces that influence the occurrence
of galvanic corrosion and the rate at which it occurs. Among these
influencing factors are the difference in the electrical potentials of
the coupled metals, the relative area of each metal, the system
geometry, and the environment to which the system is exposed.
In most cases, galvanic corrosion can be easily avoided if proper attention is given to the selection of materials during design of
a system. It is often beneficial for performance and operational
reasons for a system to utilize more than one type of metal, but
this may introduce a potential galvanic corrosion problem.
Therefore, sufficient consideration should be given to material
selection with regard to the electrical potential differences of the
metals. Cathodic protection, electrical insulation, or coatings
can also help protect materials from galvanic corrosion.
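
As a rough illustration of how the factors above might be screened during material selection, the sketch below identifies which member of a two-metal couple is the anode and computes the cathode-to-anode area ratio. It is a minimal sketch under stated assumptions: the ordering list is a simplified, illustrative galvanic ordering (not measured data for any specific environment), the function name `assess_couple` is hypothetical, and the threshold of 1.0 used to flag a small anode coupled to a large cathode is an arbitrary choice for illustration.

```python
# Simplified, illustrative galvanic ordering, most anodic (active) first.
# Assumption: a real design should use a galvanic series measured for the
# actual environment rather than this short list.
GALVANIC_ORDER = [
    "magnesium",
    "zinc",
    "aluminum",
    "carbon steel",
    "copper",
    "stainless steel (passive)",
]


def assess_couple(metal_a, area_a, metal_b, area_b):
    """Identify the anode of a two-metal couple and flag a small-anode/large-cathode case."""
    rank_a = GALVANIC_ORDER.index(metal_a)
    rank_b = GALVANIC_ORDER.index(metal_b)
    if rank_a == rank_b:
        # Same metal: no galvanic driving force in this simplified view.
        return {"anode": None, "cathode": None, "separation": 0,
                "cathode_to_anode_area": None, "unfavorable_area": False}
    if rank_a < rank_b:
        anode, anode_area, cathode, cathode_area = metal_a, area_a, metal_b, area_b
    else:
        anode, anode_area, cathode, cathode_area = metal_b, area_b, metal_a, area_a
    ratio = cathode_area / anode_area
    return {
        "anode": anode,
        "cathode": cathode,
        # Larger separation in the ordering suggests a larger potential difference.
        "separation": abs(rank_a - rank_b),
        "cathode_to_anode_area": ratio,
        # Hypothetical screening threshold: cathode area larger than anode area.
        "unfavorable_area": ratio > 1.0,
    }


if __name__ == "__main__":
    # Small stainless steel fastener in a large aluminum panel.
    print(assess_couple("stainless steel (passive)", 1.0, "aluminum", 100.0))
    # Small aluminum fastener in a large stainless steel panel.
    print(assess_couple("aluminum", 1.0, "stainless steel (passive)", 100.0))
```

In the second call the anodic metal is the small part, which this sketch flags as the unfavorable area arrangement. System geometry and environment, which also influence galvanic corrosion, are not modeled here.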

Table 1 provides a brief list of guidelines that can help minimize
galvanic corrosion.
Table 1. Guidelines for Minimizing Galvanic Corrosion (Reference 11)
• Use one material to fabricate electrically isolated systems or components where practical.
• If mixed metal systems are used, select combinations of metals as close together as possible in the galvanic series, or select metals that are galvanically compatible.
• Avoid the unfavorable area effect of a small anode and large cathode. Small parts or critical components such as fasteners should be the more noble metal.
• Apply coatings with caution. Keep the coatings in good repair, particularly the one on the anodic member.
• Insulate dissimilar metals wherever practical (for example, by using a gasket). It is important to insulate completely if possible.
• Add inhibitors, if possible, to decrease the aggressiveness of the environment.
• Avoid threaded joints for materials far apart in the series.
• Design for the use of readily replaceable anodic parts or make them thicker for longer life.
• Install a third metal that is anodic to both metals in the galvanic contact.

Figure 6. Galvanic Corrosion between a Stainless Steel Screw and Aluminum (Reference 10)

Crevice Corrosion
Crevice corrosion occurs as a result of water or other liquids getting trapped in localized stagnant areas, creating an enclosed corrosive environment. This commonly occurs under fasteners, gaskets,
washers, and in joints or other components with small gaps.
Crevice corrosion can also occur under debris built up on surfaces,
sometimes referred to as poultice corrosion. Poultice corrosion
can be quite severe due to an increasing acidity in the crevice area.
Several factors including crevice gap, depth, and the surface
ratios of materials affect the severity or rate of crevice corrosion.
Tighter gaps, for example, have been known to increase the rate
of crevice corrosion of stainless steels in chloride environments.
A larger crevice depth and a greater metal surface area will
generally increase the rate of corrosion.
Materials typically susceptible to crevice corrosion include aluminum alloys and stainless steels. Titanium alloys normally
have good resistance to crevice corrosion. However, they may
become susceptible in elevated temperature and acidic environments containing chlorides. Copper alloys can also experience
crevice corrosion in seawater environments.
To protect against problems with crevice corrosion, systems
should be designed to minimize areas likely to trap moisture,
other liquids, or debris. For example, welded joints can be used
instead of fastened joints to eliminate a possible crevice. Where
crevices are unavoidable, metals with a greater resistance to
crevice corrosion in the intended environment should be selected. Avoid the use of hydrophilic materials (strong affinity for
water) in fastening systems and gaskets. Crevice areas should be
sealed to prevent the ingress of water. Also, a regular cleaning
schedule should be implemented to remove any debris buildup.

Pitting Corrosion
Pitting corrosion, also simply known as pitting, is an extremely
localized form of corrosion that occurs when a corrosive medium attacks a metal at specific points causing small holes or pits
to form (see Figure 7). This usually happens when a protective
coating or oxide film is perforated, due to mechanical damage or
chemical degradation. Pitting can be one of the most dangerous
forms of corrosion because it is difficult to anticipate and prevent, relatively difficult to detect, occurs very rapidly, and penetrates a metal without causing it to lose a significant amount of
weight. Failure of a metal due to the effects of pitting corrosion
can occur very suddenly. Pitting can have side effects too; for
example, cracks may initiate at the edge of a pit due to an
increase in the local stress. In addition, pits can coalesce underneath the surface, which can weaken the material considerably.

Stainless steels tend to be the most susceptible to pitting corrosion
among metals and alloys. Polishing the surface of stainless steels
can increase the resistance to pitting corrosion compared to etching or grinding the surface. Alloying can have a significant impact
on the pitting resistance of stainless steels. Conventional steel has
a greater resistance to pitting corrosion than stainless steels, but is
still susceptible, especially when unprotected. Aluminum in an
environment containing chlorides and aluminum brass (Cu-20Zn-2Al) in contaminated or polluted water are usually susceptible to
pitting. Titanium is strongly resistant to pitting corrosion.
Proper material selection is very effective in preventing the
occurrence of pitting corrosion. Another option for protecting
against pitting is to mitigate aggressive environments and environmental components (e.g., chloride ions, low pH, etc.).
Inhibitors may sometimes stop pitting corrosion completely.
Further efforts during design of the system can aid in preventing
pitting corrosion, for example, by eliminating stagnant solutions
or by the inclusion of cathodic protection. In some cases, protective coatings can provide an effective solution to the problem
of pitting corrosion. However, they can also accelerate the corrosion process at locations where the coating has been breached
and the base metal is left exposed to the corrosive environment.

Intergranular Corrosion
Intergranular corrosion attacks the interior of metals along grain
boundaries. It is associated with impurities which tend to deposit
at grain boundaries and/or a difference in crystallographic phase
precipitated at grain boundaries. Heating of some metals can
cause sensitization, or an increase in the level of inhomogeneity at grain boundaries. Therefore, some heat treatments and
weldments can result in a propensity for intergranular corrosion.
Susceptible materials may also become sensitized if used in
operation at a high enough temperature environment to cause
such changes in internal crystallographic structure.
Intergranular corrosion can occur in many alloys. The most predominant susceptibilities have been observed in stainless steels
and some aluminum and nickel-based alloys. Stainless steels,
especially ferritic stainless steels, have been found to become
sensitized, particularly after welding. Aluminum alloys also suffer intergranular attack as a result of precipitates at grain boundaries that are more active. Exfoliation corrosion (shown in
Figure 8) is considered a type of intergranular corrosion in materials that have been mechanically worked to produce elongated
grains in one direction. High nickel alloys can be susceptible by
precipitation of intermetallic phases at grain boundaries.
Methods to limit intergranular corrosion include:
• Keep impurity levels to a minimum.
• Select heat treatments that reduce precipitation at grain boundaries.
• Specifically for stainless steels, reduce the carbon content and add stabilizing elements (Ti, Nb, Ta), which preferentially form more stable carbides than chromium carbide.

Figure 7. Pitting Corrosion of Stainless Steel Tubing

Figure 8. Exfoliation of an Aluminum Alloy in a Marine Environment

Selective Leaching/Dealloying
Dealloying, also called selective leaching, is a rare form of corrosion where one element is targeted and consequently extracted
from a metal alloy, leaving behind an altered structure. The most
common form of selective leaching is dezincification (shown in
Figure 9), where zinc is extracted from brass alloys or other alloys
containing significant zinc content. Left behind are structures that
have experienced little or no dimensional change, but whose parent material is weakened, porous, and brittle. Dealloying is a dangerous form of corrosion because it reduces a strong, ductile metal
to one that is weak, brittle, and subsequently susceptible to failure.
Since there is little change in the metal's dimensions, dealloying
may go undetected, and failure can occur suddenly. Moreover, the
porous structure is open to the penetration of liquids and gases
deep into the metal, which can result in further degradation.
Selective leaching often occurs in acidic environments.

Figure 9. Dezincification of Brass Containing a High Zinc Content (Reference 12)

Reducing the aggressive nature of the atmosphere by removing
oxygen and avoiding stagnant solutions/debris buildup can prevent dezincification. Cathodic protection can also be used for
prevention. However, the best alternative, economically, may be
to use a more resistant material such as red brass, which contains
only 15% Zn. Adding tin to brass also provides an improvement in the resistance to dezincification. Additionally, inhibiting
elements, such as arsenic, antimony, and phosphorus, can be
added in small amounts to the metal to provide further improvement. Avoiding the use of a copper metal containing a significant amount of zinc altogether may be necessary in systems
exposed to severe dezincification environments.

Erosion Corrosion
Erosion corrosion is a form of attack resulting from the interaction
of an electrolytic solution in motion relative to a metal surface. It
has typically been thought of as involving small solid particles dispersed within a liquid stream. The fluid motion causes wear and
abrasion, increasing rates of corrosion over uniform (non-motion)
corrosion under the same conditions. Erosion corrosion is evident
in pipelines, cooling systems, valves, boiler systems, propellers,
impellers, as well as numerous other components. Specialized
types of erosion corrosion occur as a result of impingement and
cavitation. Impingement refers to a directional change of the solution whereby a greater force is exhibited on a surface such as the
outside curve of an elbow joint. Cavitation is the phenomenon of
collapsing vapor bubbles, which can cause surface damage if they
repeatedly hit one particular location on a metal.
Several factors influence the resistance of a material
to erosion corrosion, including hardness, surface smoothness, fluid
velocity, fluid density, and angle of impact; the general corrosion
resistance of the material to the environment also factors in.
Materials with higher hardness values typically
resist erosion corrosion better than those that have a lower value.
There are some design techniques that can be used to limit erosion corrosion, as follows:
• Avoid turbulent flow.
• Add deflector plates where flow impinges on a wall.
• Add plates to protect welded areas from the fluid stream.
• Put piping of concentrate additions vertically into the center of a vessel.

Hydrogen Damage
There are a number of different ways that hydrogen can damage
metallic materials, resulting from the combined factors of hydrogen and residual or tensile stresses. Hydrogen damage can result
in cracking, embrittlement, loss of ductility, blistering and flaking, and also microperforation.
Hydrogen-induced cracking (HIC) refers to the cracking of a
ductile alloy when under constant stress and where hydrogen gas
is present. Hydrogen is absorbed into areas of high triaxial stress,
producing the observed damage. A related phenomenon, hydrogen embrittlement, is the brittle fracture of a ductile alloy during
plastic deformation in an environment containing hydrogen gas.
In both cases, a loss of tensile ductility occurs with metals
exposed to hydrogen, which results in a significant decrease in
elongation and reduction in area. It is most often observed in
low strength alloys and has been witnessed in steels, stainless
steels, aluminum alloys, nickel alloys, and titanium alloys.
High pressure hydrogen will attack carbon and low-alloy steels at
high temperatures. The hydrogen will diffuse into the metal and
react with carbon, resulting in the formation of methane. This in turn
results in decarburization of the alloy and possibly crack formation.



Methods to deter hydrogen damage are to:
• Limit hydrogen introduced into the metal during processing.
• Limit hydrogen in the operating environment.
• Use structural designs that reduce stresses (below the threshold for subcritical crack growth in a given environment).
• Use barrier coatings.
• Use low hydrogen welding rods.

Biological Corrosion
Microbiological corrosion is the acceleration of corrosion due to
the growth or existence of microorganisms in contact with a
material. This form of corrosion can appear in any environment
capable of supporting the life of microorganisms and is usually a
localized effect on the metal. Microorganisms may accelerate or
impede corrosion, an effect attributed to the oxygen concentration
and pH level of the microenvironment. Two types of bacteria
known to increase corrosion rates are sulfate-reducing bacteria
and sulfur-oxidizing bacteria. Sulfate-reducing bacteria convert
sulfates to sulfides, which in turn create the metal sulfide corrosion product. Sulfur-oxidizing bacteria oxidize sulfur compounds to
produce sulfuric acid, leading to a decrease in pH level. There are
also many other bacteria capable of producing reduction and oxidation type reactions that will affect metals.
Methods to combat microbiological corrosion include:
• Inhibitors/coatings that deter growth of microorganisms.
• Preventive maintenance to remove microorganisms.

Stress Corrosion Cracking


Stress corrosion cracking (SCC) is an environmentally induced
cracking phenomenon that sometimes occurs when a metal is
subjected to a tensile stress and a corrosive environment simultaneously. This is not to be confused with similar phenomena
such as hydrogen embrittlement, in which the metal is embrittled
by hydrogen, often resulting in the formation of cracks.
Moreover, SCC is not defined as the cause of cracking that
occurs when the surface of the metal is corroded, resulting in the
creation of a nucleating point for a crack. Rather, it is a synergistic effect of a corrosive agent and a modest, static stress.
Another form of corrosion similar to SCC, although with a subtle difference, is corrosion fatigue. The key difference is that
SCC occurs with a static stress, while corrosion fatigue requires
a dynamic or cyclic stress.
SCC is a process that takes place within the material, where the
cracks propagate through the internal structure, usually leaving the
surface unharmed. Aside from an applied mechanical stress, a residual, thermal, or welding stress along with the appropriate corrosive
agent may also be sufficient to promote SCC. Pitting corrosion,
especially in notch-sensitive metals, has been found to be one cause
for the initiation of SCC. SCC is a dangerous form of corrosion
because it can be difficult to detect, and it can occur at stress levels
that fall within the range that the metal is designed to handle.
Stress corrosion cracking depends on a number of factors,
including temperature, solution, metallic
structure and composition, and stress (Reference 13). In addition,
certain types of alloys are more susceptible to SCC in particular
environments, while other alloys are more resistant to that same
environment. Increasing the temperature of a system often
works to accelerate the rate of SCC. The presence of chlorides
or oxygen in the environment can also significantly influence the
occurrence and rate of SCC. SCC is a concern in alloys that produce a surface film in certain environments, since the film may
protect the alloy from other forms of corrosion, but not SCC.
There are several methods that may be used to minimize the risk
of SCC. Some of these methods include:
• Choose a material that is resistant to SCC.
• Employ proper design features for the anticipated forms of corrosion (e.g., avoid crevices or include drainage holes).
• Minimize stresses, including thermal stresses.
• Modify the environment (pH, oxygen content).
• Use surface treatments (shot peening, laser shock peening) that increase the surface resistance to SCC.
• Apply barrier coatings, which will deter SCC as long as they remain intact.
• Reduce exposure of end grains (i.e., end grains can act as initiation sites for cracking because of preferential corrosion and/or a local stress concentration).

Corrosion Fatigue
Corrosion fatigue was discussed in the section addressing fatigue
failure modes.

Failure Prevention
In general, the most effective ways to prevent a material from failing are proper and accurate design, routine and appropriate maintenance, and frequent inspection for defects and abnormalities.
Each of these general methods will be described in further detail.
Proper design of a system should include a thorough materials
selection process in order to eliminate materials that could potentially be incompatible with the operating environment and to
select the material that is most appropriate for the operating and
peak conditions of the system. If a material is selected based
only on its ability to meet mechanical property requirements, for
instance, it may fail due to incompatibility with the operating
environment. Therefore, all performance requirements, operating conditions, and potential failure modes must be considered
when selecting an appropriate material for the system.
Routine maintenance will lessen the possibility of a material failure due to extreme operating environments. For example, a
material that is susceptible to corrosion in a marine environment
could be sustained longer if the salt is periodically washed off. It
is generally a good idea to develop a maintenance plan before the
system is in service.
Finally, routine inspections can sometimes help identify if a
material is at the beginning stages of failure. If inspections are
performed routinely, the material is more likely to be kept
from failing while the system is in service.

Conclusion
A number of material failure modes were introduced in this article
including impact, spalling, wear, brinelling, thermal shock, and
radiation damage. These mechanisms can affect metals, polymers,
ceramics, and composites in various applications and in many different environments. Thus, it is important to take these failure
modes into consideration during the design phases of a component
or system in order to make appropriate materials selection decisions.
From a research standpoint, researchers must consider all material failure modes when developing and maturing a new material or
when evolving an old material. However, material failure can
often be the result of inadequate material selection by the design
engineer or their incomplete understanding of the consequences
of placing specific types of materials in certain environments.
Education and understanding of the nature of materials and how
they fail are essential to preventing failure from occurring. Simple
fracture or breaking into two pieces is not all-inclusive in terms of
failure, because materials also fail by being stretched, dented or
worn away. If potential failure modes are understood, then critical systems can be designed with redundancy or with fail-safe features to prevent a catastrophic failure of the system. Furthermore,
if appropriate effort is given to understanding the environment and
operating loads, keeping in mind potential failure modes, then a
system can be designed to be better suited to resist failure.

Acknowledgement
The author would like to thank Neville Sachs and Sachs,
Salvaterra & Associates, Inc. for their contribution of photos
included in this article.

References
1. B.D. Craig, Material Failure Modes, Part I: A Brief Tutorial
on Fracture, Ductile Failure, Elastic Deformation, Creep, and
Fatigue, AMPTIAC Quarterly, Vol. 9, No. 1, AMPTIAC,
2005, pp. 9-16, <http://amptiac.alionscience.com/pdf/2005MaterialEASE29.pdf>.
2. NASA Spur Gear Fatigue Data, NASA Glenn Research Center,
<http://www.grc.nasa.gov/WWW/5900/5950/Fatigue-data.htm>.
3. J.P. Shaffer, A. Saxena, S.D. Antolovich, T.H. Sanders, Jr.,
and S.B. Warner, The Science and Design of Engineering
Materials, 2nd Edition, McGraw-Hill, 1999.
4. P. Niskanen, A. Manesh, and R. Morgan, Reducing Wear With
Superfinish Technology, AMPTIAC Quarterly, Vol. 7, No. 1,
AMPTIAC, 2003, pp. 3-9, <http://amptiac.alionscience.com/pdf/AMP Q7_1ART01.pdf>.
5. Wear Failures, Metals Handbook, 9th Edition, Vol. 11:
Failure Analysis and Prevention, ASM International, 1986,
pp. 145-162.
6. J.R. Davis (editor), ASM Materials Engineering Dictionary,
ASM International, 1992.
7. J.A. Collins and S.R. Daniewicz, Failure Modes:
Performance and Service Requirements for Metals, M. Kutz
(editor), Handbook of Materials Selection, John Wiley and
Sons, 2002, pp. 705-773.
8. Failures of Rolling-Element Bearings, Metals Handbook,
9th Edition, Vol. 11: Failure Analysis and Prevention, ASM
International, 1986, pp. 490-513.
9. Projects Archive, Air Force Research Laboratory,
<http://www.afrl.af.mil/projects.html>.
10. Corrosion Technology Testbed, NASA Kennedy Space
Center, <http://corrosion.ksc.nasa.gov/>.
11. E.B. Bieberich and R.O. Hardies, TRIDENT Corrosion Control
Handbook, David W. Taylor Naval Ship Research and
Development Center, Naval Sea Systems Command,
DTRC/SME-87-99, February 1988; DTIC Doc.: AD-B120 952.
12. Corrosion on Flood Control Gates, U.S. Army Corps of Engineers, <http://www.sam.usace.army.mil/en/cp/CORROSION_EXTRA.ppt>.
13. M.G. Fontana, Corrosion Engineering, 3rd Edition, McGraw-Hill, 1986.

System and Part Integrated Data Resource (SPIDR™) Released April 2006
SPIDR is the NEW Alion SRC comprehensive database of reliability and test data for systems and components. SPIDR is a revolutionary improvement to existing reliability data resources. With
annual updates and twice the volume of data, now is the time to
upgrade from the four world-renowned data sources: Nonelectronic
Part Reliability Data (NPRD-95), Electronic Part Reliability Data
(EPRD-97), Failure Mode and Mechanism Distributions (FMD-97), and Electrostatic Discharge Susceptibility Data 1995 (VZAP).

SPIDR is a Comprehensive Searchable Database of:

• Over 6,000 unique component types
• System and Component Field Experience Data (>200K records of data)
• More than 10^16 hours of field reliability data
• 33 Different Usage Environments
• Environmental and Life Test Data (>82,000 systems and components)
• Electrostatic Discharge (ESD) Susceptibility Data (>50,000 test results)
• Failure Mode and Mechanism Distributions (>27,000 systems and components)
• State-of-the-Art Components


Improvements that you will find in SPIDR:

• More than double the amount of component and system field data. SPIDR contains over 10^16 hours of field reliability data. (Prior products had a total of only 10^12 hours of data.)
• Field failure rate data is presented in operational and calendar hours.
• SPIDR includes the addition of test data for components and systems.
• SPIDR includes all raw data. SPIDR develops data summaries based on a user search. Previous data products provided predefined data results, reducing the flexibility of searches and the ease of product and data updates.
• Separate failure mode data resources exist in SPIDR for test and field data.
• SPIDR provides additional details regarding component usage. For example, SPIDR includes the system application and the usage environment associated with field and failure mode data.
• ESD susceptibility data for parts tested and manufactured after 1995 are included in SPIDR, which more than doubles the amount of ESD susceptibility data contained in VZAP-95.
• Improved software interface. Multiple search capabilities allow users to find reliability data more efficiently and quickly.
• SPIDR will be updated annually. Prior products were updated sporadically.

SPIDR addresses numerous data deficiencies that existed in prior data products. Some examples include:

• Nonelectronic Part Reliability Data (NPRD-95) and Electronic Part Reliability Data (EPRD)
  - Test data misclassified as field data. SPIDR addresses this issue and includes a separate database of test data for components and systems.
  - Corrected and improved naming conventions associated with all data. In some instances similar parts had different or incomplete names.
  - Reviewed part numbers associated with all data and validated that the same name was used for components with like part numbers.
• Failure Mode Data (FMD-97)
  - Failure modes associated with test data were separated out of SPIDR failure mode distributions. Two separate failure mode data categories exist within SPIDR, for test and field data.
  - Where possible, SPIDR associates a usage environment with failure mode data.
  - Failure mode categories were updated. Failure mechanisms have been separated from failure mode information (e.g., prior data products used a failure mechanism as the failure mode for a given component).
• Electrostatic Discharge Susceptibility Data (VZAP-95)
  - VZAP-95 includes no data on components manufactured after 1995. SPIDR includes ESD susceptibility data for components tested through the end of calendar year 2005.

The cost of SPIDR is $1,995 and data updates are provided annually to maintenance subscribers. The SPIDR software includes a user manual and on-line help to assist the user in understanding the software capabilities. Visit the SPIDR web site to order your copy today and take advantage of the complimentary 1 year of maintenance for all SPIDR purchases before June 30, 2006! <http://src.alionscience.com/spidr/>.

If you have additional questions, feel free to contact the SPIDR program manager at 315.339.7055 or by E-mail at <ddylis@alionscience.com>.

The iFR Method for Early Prediction of Annualized Failure Rate in Fielded Products
By: Bill Lycette, Agilent Technologies

Introduction
As with many manufacturers of complex electronic equipment,
Agilent Technologies uses a non-parametric AFR (Annualized
Failure Rate) metric for reporting product reliability. However,
the AFR metric can be very sluggish in responding to changes in
customer-experienced reliability. When investments to improve
reliability culminate in the implementation of an engineering
change, it can take as many as 9 to 12 months before the
improvement is observed in the AFR. Equally important, degradation in reliability may be quickly detected by customers, but it
may take several months before a change is observed in the manufacturer's internal AFR measures.
The instant Failure Rate (iFR) is a parametric-based measure
developed for the express purpose of providing much quicker
feedback of changes in a product's reliability. This paper
explains the iFR Method through analysis of actual field failure
data and demonstrates how a balance is struck between selecting
iFR variables that yield the best possible combination of quick
reliability feedback, effective AFR predictive power, and narrow
confidence bounds.
Note: The term iFR as used in this paper should not be confused
with the reliability hazard rate (Reference 1), given by

$$h(t) = \frac{f(t)}{R(t)}$$

where h(t) is the true instantaneous failure rate, f(t) is the frequency of failures function, and R(t) is the reliability function
representing the probability that the product will survive until
time t. In this paper, iFR is meant to signify a metric that
is much more responsive than the non-parametric AFR in
measuring the reliability of products currently being shipped.
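To make the distinction concrete, the hazard rate h(t) = f(t)/R(t) can be evaluated for a chosen life distribution. The sketch below is only an illustration: the Weibull distribution and the parameter names beta (shape) and eta (scale) are assumptions introduced here, not part of the iFR Method.

```python
import math


def weibull_pdf(t, beta, eta):
    """f(t): failure density of a Weibull distribution (shape beta, scale eta)."""
    return (beta / eta) * (t / eta) ** (beta - 1) * math.exp(-((t / eta) ** beta))


def weibull_reliability(t, beta, eta):
    """R(t): probability that a unit survives until time t."""
    return math.exp(-((t / eta) ** beta))


def hazard_rate(t, beta, eta):
    """h(t) = f(t) / R(t), the true instantaneous failure rate at time t > 0."""
    return weibull_pdf(t, beta, eta) / weibull_reliability(t, beta, eta)


# With beta = 1 the Weibull reduces to the exponential distribution and the
# hazard rate is the constant 1/eta; with beta < 1 it decreases over time.
constant_case = hazard_rate(100.0, 1.0, 1000.0)
early, late = hazard_rate(10.0, 0.5, 1000.0), hazard_rate(100.0, 0.5, 1000.0)
```

The iFR, by contrast, is a single percentage reported per month, as described in the sections that follow.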

Annualized Failure Rate Metrics


A widely-used method for measuring reliability of electronic
equipment is calculating field failure rates using the Annualized
Failure Rate (AFR). There are countless different variations on
such non-parametric methods as explained in Reference 2, but
they generally rely upon simple calculations involving the number of failures and the size of the installed base (Reference 3).
The advantages of such methods are their simplicity and ease of
understanding. No special software or graphing paper is required
to make the calculation. The computation is straightforward and
can be performed by someone unfamiliar with reliability statistics. The calculation is quick, simple and can be easily explained
to the layperson. For these reasons, AFR is widely used in industry to measure the reliability of electronic equipment.
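As a rough illustration of how simple such a calculation can be, the sketch below annualizes the failures observed over a reporting period against the installed base. This is only one of the many variants alluded to above; the function name, its arguments, and the choice of formula are assumptions made for illustration and are not Agilent's definition of AFR.

```python
def annualized_failure_rate(failures, installed_base, months=1):
    """Illustrative non-parametric AFR: failures per installed unit, scaled to 12 months.

    failures       -- number of failures reported during the period
    installed_base -- number of units in service during the period
    months         -- length of the reporting period in months
    """
    if installed_base <= 0 or months <= 0:
        raise ValueError("installed_base and months must be positive")
    return (failures / installed_base) * (12.0 / months)


# Hypothetical numbers: 5 failures in one month against 1,000 installed units
# annualize to roughly 6% per year.
afr = annualized_failure_rate(failures=5, installed_base=1000, months=1)
```

Because every unit in the installed base counts equally regardless of when it shipped, a change affecting only recent shipments moves a metric of this kind slowly.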
The disadvantages of such AFR methods are several. These
methods do not allow for quantification of confidence bounds.
Additionally, many such metrics make the potentially false
assumption that the underlying failure rate is constant over time.
These methods also do not allow for conditional probability calculations.
Finally, a major disadvantage of AFR methods is that they can be
sluggish to respond to changes in product reliability (both degradation and improvement) during the product's manufacturing
life cycle. The iFR Method provides a solution to sluggish
response time.

The iFR Method


The method is based on parametric techniques involving reliability statistics and principles. Reliability statistics are well documented in textbooks and the literature (References 4, 5, and 6).
By using the iFR Method, changes in the reliability of complex
electronic measurement equipment can be detected as much as
four to six months earlier than would otherwise be possible
using some conventional AFR methods.
Keys to Success
The keys to the success of the iFR Method include selecting a
shipment evaluation window that strikes the optimum balance
among the following:
1. Providing timely feedback of reliability changes,
2. Detecting the occurrence of new failure mechanisms,
3. Providing acceptable confidence bounds,
4. Minimizing reliability false alarms, and
5. Providing useful predictive power for anticipating eventual changes in AFR.

Through analysis of shipment and failure data associated with
more than a dozen different products involving complex
microwave/RF measurement equipment, the iFR Method has
effectively predicted future AFR trends by using a shipment
evaluation window in the range of four to six months.
Complex electronic measurement equipment is characterized by having between a few thousand and more than 10,000 electronic components, while at the same time having relatively few mechanical components. With the advancement in design and manufacturing technology over the past 10-20 years, electronic components have the distinguishing feature of typically exhibiting constant failure rates or improving failure rates (so-called infant mortality) over time. Eventually these parts will enter a wear-out phase, which is marked by a rapidly increasing failure rate; however, for electronic components this phase is generally well beyond the normal expected operating life of the end product. Therefore, owners of complex electronic measurement equipment will rarely experience such wear-out failures of electronic components.
Mechanical parts are susceptible to wear-out failure mechanisms. However, their relatively low numbers in electronic measurement equipment and recent advances in their reliability have resulted in products where customer-experienced failure of mechanical parts is fairly rare over the expected operating life.
These observations, coupled with the selection of an optimum iFR data analysis shipment window, mean that it is possible to predict changes in traditional AFR reliability metrics as much as six months earlier than the AFR metric would show the change.
Description of the iFR Method
1. The Shipment Evaluation Window is defined to be the
number of consecutive months containing product shipments that the reliability analyst wishes to consider in
the failure rate prediction.
2. The iFR Reporting Month is defined to be the last month
of the Shipment Evaluation Window.
3. Data processing and metric calculation begins one
month after the end of the Shipment Evaluation Window,
referred to as the Calculation Date (CD).
4. On the CD, qualifying shipment records are collected
from the specified Shipment Evaluation Window.
5. On the CD, failure records (namely failure age) are collected for the qualifying shipment records.
6. Ages of qualifying shipments that have not yet failed as
of the CD are calculated.
7. Ages of failed products and unfailed products are
entered into a parametric reliability data analysis tool.
8. The life data distribution that best fits the data is selected. For distributions that fit the data equally well, selecting
the distribution that yields the lowest failure rate generally
provides the best predictive result.
9. The percent of failed products expected after one year of
operation is calculated. This is the iFR for that
Reporting Month.
10. The iFR by Reporting Month is plotted over time and the
trend line used to predict changes in the AFR.
Refer to Figure 1 for a timeline of iFR Method events.
Figure 1. Timing of 5-Month iFR Calculation Events (timeline from January through July marking the Shipment Evaluation Window of qualifying shipments, the iFR Reporting Month, the Calculation Date, and failures from qualifying shipments)

Figure 2. 12-Month Shipment Evaluation Window
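The steps of the iFR Method can be sketched in a few lines of code. The sketch below is a simplification under stated assumptions: it fits only an exponential distribution (the method as described selects the best-fitting life distribution with a parametric analysis tool), it measures age in calendar days, and the function name, argument names, and record layout are hypothetical.

```python
import math
from datetime import date


def ifr_exponential(shipments, window_start, window_end, calc_date):
    """Fraction of units expected to fail within one year, from an exponential fit.

    shipments -- list of (ship_date, failure_date or None) tuples
    Only shipments dated inside [window_start, window_end] qualify (step 4).
    """
    failures = 0
    exposure_days = 0.0
    for ship_date, failure_date in shipments:
        if not (window_start <= ship_date <= window_end):
            continue  # shipment is outside the Shipment Evaluation Window
        if failure_date is not None and failure_date <= calc_date:
            failures += 1  # step 5: age at failure
            exposure_days += (failure_date - ship_date).days
        else:
            # step 6: age of a unit that has not failed as of the CD
            exposure_days += (calc_date - ship_date).days
    if exposure_days <= 0:
        raise ValueError("no exposure time accumulated in the window")
    # Steps 7-8, simplified: maximum-likelihood exponential failure rate
    # with unfailed units treated as censored observations.
    rate_per_day = failures / exposure_days
    # Step 9: expected fraction failed after one year of operation.
    return 1.0 - math.exp(-rate_per_day * 365.0)


# Hypothetical records: two qualifying shipments, one of which failed,
# and one shipment that falls outside the window.
records = [
    (date(2006, 1, 1), date(2006, 2, 20)),
    (date(2006, 1, 1), None),
    (date(2005, 6, 1), date(2005, 9, 1)),
]
ifr = ifr_exponential(records, date(2006, 1, 1), date(2006, 5, 31), date(2006, 4, 11))
```

Repeating the calculation for each successive Reporting Month and plotting the results gives the trend line of step 10.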

Data Analysis Results


Agilent Technologies designs and manufactures complex
microwave/RF test and measurement equipment. It is common
for such equipment to have 10,000 or more electronic components. Typically the equipment is constructed of several digital
and analog printed circuit assemblies, hybrid microcircuit assemblies, power supply, disk drive, CD-ROM, keypad assembly and
display. While the equipment does contain some mechanical
parts, the vast majority of components are electronic.
The paragraphs that follow illustrate the iFR Method using actual fielded data of a complex measurement system. The data presented here have been disguised, but the results and conclusions
are the same.
Field failure data spanning more than three years was studied
using the iFR Method. Shipment windows of six different sizes were initially analyzed: two months, four months,
six months, eight months, ten months, and twelve months.
Utilizing parametric methods, the iFR was calculated for 30 successive reporting months. These data points were plotted to
evaluate the relationship between the calculated iFR and the
product's eventual AFR.

Impact of Shipment Window Size
In looking at Figure 2, at one extreme we see that using a wide 12-month shipment evaluation window results in an iFR that is an excellent predictor of the eventual AFR. However, the disadvantage of the 12-month shipment window is that one must wait a full year before making the calculation. As a result, such a wide evaluation window makes this choice of little value.
At the other extreme, shown in Figure 3, we see that a narrow 2-month shipment evaluation window yields a very responsive metric that can be calculated with very little delay. Unfortunately, such a short evaluation window can result in some wild month-over-month fluctuations, numerous false alarms and a generally poor predictor of the eventual AFR (as will be discussed later in this article).

Figure 3. 2-Month Shipment Evaluation Window

In examining the other shipment evaluation windows as shown in Figures 4 through 6, we see that an evaluation window between four and six months seems to offer the best balance between predictive power of the iFR, stability and shortest delay in making the iFR calculation. The iFR for a 10-month shipment evaluation window was calculated but is not presented here for brevity; its shape is very similar to that of the 8-month shipment evaluation window.

Figure 4. 4-Month Shipment Evaluation Window

The graphs show a steep decrease in iFR at the end of 2002. Earlier in 2002, the engineering team discovered that one of the components in an attenuator assembly degraded over time. After evaluating alternative components, an improved device was selected and implemented in mid-2002. To the engineering team's relief, the AFR subsequently dropped as predicted by the iFR and the design change was affirmed.

Figure 5. 6-Month Shipment Evaluation Window

Figure 6. 8-Month Shipment Evaluation Window

The initial increase in iFR throughout the first half of 2003 accurately predicted an associated increase in AFR during the second half of 2003. A Pareto analysis of failed assemblies showed increasing failure rates of several printed circuit assemblies (PCAs). Failure analysis of the PCAs revealed a fabrication problem with tantalum capacitors purchased from the same supplier. Other suppliers' components were evaluated and devices with improved reliability were implemented later in 2003. The iFR gave advance notice of the problem and confirmed that the solution would be effective.

To further optimize predictive results, the iFR Method was refined by calculating failure rates using a 5-month shipment evaluation window. The results are shown in Figure 7.

Figure 7. 5-Month Shipment Evaluation Window

Confidence Bounds
Another important aspect when considering what size of evaluation period to select is the width of the iFR confidence bounds. Confidence bounds on failure rates are inversely proportional to the number of field failures observed. Consequently, the narrowest confidence bounds will occur with the largest shipment evaluation windows.

For a failure process that follows the exponential distribution, the two-sided upper limit on the failure rate is:

$$\lambda_U = \frac{\chi^2_{\alpha/2,\,2r+2}}{2T}$$

where χ² is the chi-square distribution, α is the significance level, 2r+2 is the degrees of freedom (r is the number of failures observed) and T is the total product exposure time (Reference 4).

The two-sided lower limit on the failure rate is given by:

$$\lambda_L = \frac{\chi^2_{1-\alpha/2,\,2r}}{2T}$$

Larger shipment evaluation windows will provide greater precision in the metric. Again, we have the tradeoff between using small shipment windows (quick reliability feedback) and larger shipment windows (greater precision). The effect of evaluation window size on confidence bounds can be seen in Figures 2 and 3.
The confidence bounds for shipment evaluation windows of four months, five months and six months were also calculated (not presented here for brevity). These confidence bounds were all roughly the same and therefore were not a significant factor in selecting the optimum shipment evaluation window.

iFR Predictive Power
We want a shipment evaluation period that affords the best possible predictive power (using iFR to predict the eventual AFR) while at the same time minimizing calculation delay time and confidence bound widths. A correlation analysis using linear regression was performed where iFR and eventual AFR were compared using the previously described shipment evaluation windows. For each shipment evaluation window, correlation coefficients were calculated using five different iFR lead times. iFR lead time represents the amount of advance notice that the iFR metric provides with respect to predicting the eventual AFR number. Analysis results are shown in Table 1.
Similar to a long-range weather forecast, iFR predictive accuracy declines as we attempt to predict further into the future what the AFR will eventually be. We also see that the iFR predictive power improves with larger shipment evaluation windows. Larger evaluation windows tend to yield better results because 1) greater customer-use time (i.e., exposure time) provides for more latent failure mechanisms to manifest themselves, and 2) larger data sets drive smaller random variation (confidence bounds) in the calculated iFR.
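The effect of data-set size on the confidence bounds can be illustrated with the chi-square limits for an exponential failure process given in the Confidence Bounds discussion. The sketch below is an illustration only: it reads the chi-square subscripts as upper-tail areas (an assumption about the notation), it computes chi-square quantiles by bisection using the closed-form distribution function that exists for even degrees of freedom, and all function names are hypothetical.

```python
import math


def chi2_cdf_even(x, dof):
    """Chi-square CDF for an even number of degrees of freedom (closed form)."""
    half = x / 2.0
    term, total = 1.0, 0.0
    for i in range(dof // 2):
        total += term
        term *= half / (i + 1)
    return 1.0 - math.exp(-half) * total


def chi2_quantile_even(p, dof):
    """Value x with CDF(x) = p, found by bisection (0 < p < 1, dof even and >= 2)."""
    lo, hi = 0.0, 1.0
    while chi2_cdf_even(hi, dof) < p:
        hi *= 2.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if chi2_cdf_even(mid, dof) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def failure_rate_bounds(r, T, alpha=0.10):
    """Two-sided (lower, upper) limits on an exponential failure rate.

    r -- number of failures observed; T -- total exposure time;
    alpha -- significance level (0.10 gives 90% two-sided bounds).
    """
    upper = chi2_quantile_even(1.0 - alpha / 2.0, 2 * r + 2) / (2.0 * T)
    lower = chi2_quantile_even(alpha / 2.0, 2 * r) / (2.0 * T) if r > 0 else 0.0
    return lower, upper


# Same observed rate (r / T = 0.01 per unit time), ten times more data:
few = failure_rate_bounds(r=10, T=1000.0)
many = failure_rate_bounds(r=100, T=10000.0)
```

With the observed rate held fixed, the interval for the larger data set is markedly narrower, which is the behavior described above for larger shipment evaluation windows.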
Table 1. Correlation Coefficients to Assess the Predictive Power of the iFR
Rows: Shipment Window Size (in months): 2, 4, 5, 6, 8, 10, 12
Columns: AFR Advance Warning Lead Time (in months): 0, 1, 2, 3, 4, 5, 6, 7, 8
- 0.48 0.57 0.59 0.47 0.34
- 0.53 0.67 0.68 0.65 0.53 - 0.55 0.74 0.79 0.70 0.60 - 0.39 0.60 0.71 0.68 0.53 0.45 0.58 0.70 0.64 0.49 0.71 0.69 0.72 0.62 0.71 0.67 0.61 -

It would make little sense to use a shipment evaluation window of 10 or 12 months, because one would have to wait nearly one year in order to make an accurate statement about what the AFR is likely to do in the following month.
In this example, we see that the sweet spot for predicting AFR (highest predictive power, shortest iFR calculation delay and acceptable confidence bounds) is achieved by selecting a five-month shipment evaluation window. This results in an optimum AFR lead time indicator of four months. Slightly inferior, but nevertheless useful, results can be obtained with four- and six-month shipment evaluation windows.
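The lead-time correlation analysis behind Table 1 can be sketched as follows. This is a minimal illustration under assumptions: both series are monthly and aligned on the same calendar months, the correlation coefficient is the ordinary Pearson coefficient, and the function names and sample numbers are hypothetical rather than the data used in the article.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant series has no defined correlation")
    return sxy / math.sqrt(sxx * syy)


def lead_time_correlation(ifr, afr, lead):
    """Correlate the iFR reported in month m with the AFR observed in month m + lead."""
    if lead < 0 or lead >= len(ifr):
        raise ValueError("lead must be between 0 and len(ifr) - 1")
    x = ifr[:len(ifr) - lead]
    y = afr[lead:]
    return pearson(x, y)


# Hypothetical series in which the AFR simply repeats the iFR three months later.
ifr_series = [1.0, 3.0, 2.0, 5.0, 4.0, 6.0, 8.0, 7.0, 9.0, 10.0]
afr_series = [0.0, 0.0, 0.0] + ifr_series[:-3]
by_lead = {lead: lead_time_correlation(ifr_series, afr_series, lead) for lead in range(6)}
best_lead = max(by_lead, key=by_lead.get)
```

Running the same calculation for each shipment window size and each lead time yields a grid of coefficients like Table 1, from which the window and lead time with the highest correlation can be read off.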
Limitations
Methods for predicting field failure rates are based on failure
mechanisms that have already manifested themselves. If a specific failure mechanism, e.g., the wear out of a disk drive bearing, has not already presented itself in the data, then such methods have no way of knowing that the failure mechanism can
occur. As such, predictive methods would not be effective on
products that have failure mechanisms occurring beyond the
edge of the shipment evaluation window.
While complex electronic measurement equipment is typically
constructed of electronic components that exhibit constant or
decreasing failure rates, there are occasions when such components may exhibit an increasing failure rate during the product's
normal expected operating life. Likewise, one of the product's
handful of mechanical parts may enter a wear-out phase unexpectedly early. In either of these two scenarios, the iFR Method
would likely not provide an accurate AFR prediction, because it is based on only a few
months of early field data.

Conclusions
Reliability metrics such as the widely used Annualized Failure Rate can be extremely sluggish to respond to changes in the product's reliability over the course of its manufacturing life cycle. The iFR Method described in this paper has been shown to be effective in providing a more responsive leading indicator of customer-experienced reliability in complex electronic equipment.
Waiting six, nine or even 12 months for a reliability problem to be reflected in traditional AFR metrics represents a huge delay in solving the root cause of the problem. In the meantime, shipments of the affected product continue, thus increasing the installed base and the associated exposure to higher warranty costs, greater customer dissatisfaction and lost future sales. Additionally, it is costly and frustrating to wait long periods of time to determine if a recently implemented fix was actually successful. Metrics such as AFR are slow to reflect the effectiveness of such a fix, and several months of patiently monitoring the AFR may give way to making costly, unnecessary investments in additional reliability improvements.
The iFR Method provides timely, valuable feedback to manufacturers, thus enabling them to 1) quickly take action in response to degradation in product reliability, and 2) avoid costly, unnecessary engineering changes when recent improvements are judged to be effective and adequate.

References
1. Reliability Engineering Handbook, Volume 1, Dimitri
Kececioglu, Prentice-Hall, 1991.
2. AFR: Problems of Definition, Calculation and
Measurement in a Commercial Environment, J.G. Elerath,
Reliability and Maintainability Symposium Annual
Proceedings, January 24-27, 2000, pp. 71-76.
3. IEEE Guide for Selecting and Using Reliability Predictions
Based on IEEE 1413, The Institute of Electrical and
Electronics Engineers Inc, 2003.
4. Applied Reliability, Second Edition, Paul A. Tobias and
David C. Trindade, CRC Press, 1995.
5. Practical Reliability Engineering, Fourth Edition, Patrick
D.T. O'Connor, John Wiley & Sons, Inc., 2002.
6. Practical Considerations in Calculating Reliability of
Fielded Products, Bill Lycette, The Journal of the RAC,
Second Quarter 2005, pp. 1-6.

About the Author


Bill Lycette is a Senior Reliability Engineer with Agilent
Technologies. He has 25 years of engineering experience with
Hewlett-Packard and Agilent Technologies, including positions
in reliability and quality engineering, process engineering, and
manufacturing engineering of microelectronics, printed circuit
assemblies and system-level products. Mr. Lycette is an ASQ
Certified Reliability Engineer and Certified Quality Engineer.


From the Editor


How Did I Get Here!
I've heard it said that life is a funny dog. It takes some twists and
turns that we can neither plan nor expect. So in looking back on
one's career, it is not surprising that we find our careers have
taken a twisty path.
I did not choose to be a reliability engineer; that career path sort
of chose me. While I was a First Lieutenant in the United States
Air Force, stationed at Wright-Patterson AFB and working on
test bed modifications, I was centrally selected for the Systems
Engineering-Reliability Master's Degree program at the Air
Force Institute of Technology. At the time, I had never heard of
reliability and had planned to get my Master's Degree in the
same area as my Bachelor's Degree: Mechanical Engineering.
However, a Master's Degree at Government expense is nothing
to turn down. So I accepted the opportunity.
That was nearly 37 years ago. In those three plus decades, I have
come to be an avid and enthusiastic supporter of the reliability
and related disciplines. Like my colleagues in the field, I know
that understanding the causes of failure and taking action to mitigate or prevent those failures, first through design changes and
later through improvements in production, operation, or maintenance, is vital to the defense of our country and the
vitality of our economy.
Like many of my colleagues, I also have come to realize that reliability is not consistently or even readily embraced by managers
and those holding the purse strings. In part, this reluctance to
accept reliability as a necessary performance parameter is due to
the probabilistic nature of reliability and our inability to forecast
precisely when a failure will occur or how many failures will
occur in a given interval of time.
Another reason reliability is not a top priority with managers is
that the benefits of a robust reliability program are long term and
not seen until after a product is sold or a system is fielded. The
costs, however, are immediate. Training, reliability software
tools, growth testing, redesign, data collection, root cause failure
analysis, and other essential reliability tasks require resources
today, not tomorrow.
Consequently, as a result of these two reasons and a third which I
will touch upon in a minute, much of my reliability career, and
those of many of my colleagues, has been spent in trying to sell
reliability to management. As a proponent, I have expounded on
the long-term benefits of reliability and the costs and risks incurred
by ignoring reliability. I have worked to educate managers and
engineers in the reliability discipline, trying to dispel the many
myths that surround the subject. I have had some success but not
nearly as much as I would have liked or I think is needed.

Those who know me well know my third reason for this unsatisfactory level of success in making reliability a part of every program. They know that I lay the blame on our society's penchant for immediate gratification. Whether it be adult stockholders or teenage children, members of our society seem unable to look much beyond tomorrow. Why save for the future when we can spend today, even if it means living on credit? Why invest in our companies to help ensure their future and that of the employees when we can make huge profits today to keep the stockholders happy?

Ned H. Criscimagna

Reliability requires looking to the future, not dwelling on the present. Reliability requires recognizing that you must pay now
or pay even more later. Many a system has required considerable
investment of resources to achieve a reasonable level of availability
during its operational life, simply because little or no attention was
devoted to reliability during acquisition and production.
Perhaps some of our readers have similar stories and perceptions
regarding their careers in reliability. And despite the times that
we have failed to convince managers to spend money now to
save money, reduce risks, and increase performance later, we
persevere. We persevere because we believe in what we are
doing. We persevere because we have seen the dire consequences of neglecting a basic part of system engineering. These
consequences are, unfortunately, not limited to higher costs.
They include loss of life, injury, unsuccessful military missions,
and gut-wrenching scenes of a space vehicle self-destructing in
the clear blue skies over Florida.
Many may regard the passion that reliability engineers bring to
their job as overdone or excessive. We are too focused, they
complain, and too zealous. Perhaps. But I think that everyone
whose life depends on the reliable operation of a system may see
us in a different light.
Reliability engineers may not receive the recognition or credit
given to those in other branches of engineering. However,
recognition and credit are not the reasons I have stayed in the
profession and I doubt they are what motivates my colleagues.
I may have started down the path of reliability engineering by
accident. But I have stayed the course on purpose. I have taken
and continue to take great pride in the work that my colleagues
and I do. I know how I got here and I know why I stayed.

The appearance of advertising in this publication does not constitute endorsement by SRC
of the products or services advertised.

Future Events in Reliability, Maintainability, and Supportability


2006 DoD Standardization Conference
May 23-25, 2006
Arlington, VA
Contact: Nancy Eiben
SAE International
SAE World Headquarters
400 Commonwealth Drive
Warrendale, PA 15096-0001
Tel: 724.772.8525
Fax: 724.776.1830
E-mail: <naneiben@sae.org>
On the Web: <http://www.sae.org/events/dsp/>

2006 International Applied Reliability Symposium
June 14-17, 2006
Orlando, FL
Contact: Patrick Hetherington
Alion System Reliability Center
201 Mill Street
Rome, NY 13440
Tel: 315.339.7084
Fax: 315.337.9932
E-mail: <Info@ARSymposium.org>
On the Web: <http://www.arsymposium.org>

Safety 2006
June 11-14, 2006
Seattle, WA
Contact: The American Society of Safety Engineers
Customer Service
1800 E Oakton Street
Des Plaines, IL 60018
Tel: 847.699.2929
Fax: 847.768.3434
E-mail: <customerservice@asse.org>
On the Web: <http://www.safety2006.org>

Human Factors in Defence
June 14-15, 2006
Contact: Horun Meah
SMi Group Ltd, Unit 009
Great Guildford Business Square
30 Great Guildford Street
London SE1 0HS
United Kingdom
Tel: 44.0.20.7827.6192
Fax: 44.0.20.7827.6001
E-mail: <hmeah@smi-online.co.uk>
On the Web: <http://www.smi-online.co.uk/events/overview.asp?is=1&ref=2348>

CMMI Technology Conference & Users Group
November 13-16, 2006
Denver, CO
Contact: Emily Brown
NDIA
Arlington, VA 22201-3061
Tel: 703.247.9476
Fax: 703.522.1885
E-mail: <ebrown@ndia.org>
On the Web: <http://www.ndia.org/Template.cfm?Section=7110&Template=/ContentManagement/ContentDisplay.cfm&ContentID=10838>

For a complete listing of upcoming RMS events, click on the following link:
<http://src.alionscience.com/calendar>
