
Machine Learning Model for Predicting Asset Failure

A SpaceTime Insight White Paper


Table of Contents
Introduction
State Transitions
Machines vs. Humans: Does Experience Matter?
Prediction in Two Temporal Dimensions
Seeing Over the Hill
Risk Tolerance and Optimizing the Prediction
Conclusion

© 2017 Space Time Insight, Inc. 2


Introduction
Asset-intensive industries are intrigued by what predictive maintenance software
offers to help them determine when to repair or replace their equipment. Until
recently, the options were to run something until it broke or replace it on a
schedule (i.e., planned maintenance). The former means maximizing the useful
life of the asset, but accepting outages, operating downtime, and operating
inefficiencies from unexpected and emergency-level repairs. The latter means
increased capital expenditures and operating costs.
Condition-Based Maintenance
Today, there are many solutions labeled “predictive maintenance” purporting to predict when
an asset will fail and recommend repair or replacement. Most of these solutions use
condition-based maintenance (“CBM”) models that base their predictions for a given asset
on how its condition compares to statistical models for that class of asset. Some, including
applications from SpaceTime Insight, use machine learning models to analyze the behavior
and condition of each individual unit and predict whether and how it will degrade in the
future, based on the model’s understanding of how that type of asset has degraded in the
past. The distinction may seem subtle, but it has significant consequences, which we will
explore in this white paper. Specifically, we will explain how SpaceTime Insight’s
unsupervised machine learning engine predicts the probability of asset failure at times in the
future and how it overcomes the limitations of most CBM models.



State Transitions
CBM is based on monitoring a set of conditions for which acceptable parameters or
patterns have been set, and alerting operators when conditions exceed those parameters.
This assumes that the conditions that predict failure are well-understood and the parameters
or patterns that may indicate imminent failure are well-established. To the extent some CBM
systems use machine learning, they may use a form of supervised learning – the possible
failure outcomes are considered to be known and the model is trained to monitor for the
specific conditions or parameters determined to predict failure. The conditions are based on
experience and the parameters are based on statistical averages.

Unfortunately, this is insufficient for true predictive modeling. If all the conditions that
lead to failure were well known and followed neat paths to failure, engineers would have
designed these flaws out of the asset long ago. Further, predictions based on supervised
learning apply only to a specific asset model. A different asset from a different vendor or even
a different model from the same vendor would have unique combinations of conditions and
parameters to monitor.

Unsupervised Learning
A type of machine learning algorithm used to draw inferences from datasets consisting of
input data without labeled responses. The most common unsupervised learning method is
cluster analysis, which is used for exploratory data analysis to find hidden patterns or
grouping in data.
Source: MathWorks

Unsupervised machine learning takes, in effect, the opposite approach. Programmers can guide
the model based on domain expertise to get it started looking for machine states that lead to
failure, but the model must learn on its own, from the data, what patterns emerge that
characterize the state of the machine and ultimately lead to a failure state. Figure 1 is a
visual representation of the state transition model created by our machine learning engine for
a specific type of asset. It has discovered fourteen different states, labeled A through N, that
are relevant to the probability of that asset type failing. It does this by analyzing data from
the asset type, including assets that have failed.

Figure 1

The more types of data the model can analyze, and the more likely it is that it will learn the
states that matter along the degradation path, the better the model will be. If it has some such
relevant data streams, however, it can create a model. Historical data is useful, as the model
can learn and observe failures without waiting for assets to fail in the field, but not
necessary. The model continues to learn as it continues to receive current data.
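One way to picture how transition probabilities like those drawn in Figure 1 could be derived is to count how often each state follows another in observed asset histories. The sketch below is a minimal illustration under stated assumptions, not a description of SpaceTime Insight's engine: it assumes the states have already been discovered and labeled (for example, by a clustering step), and the function name `estimate_transitions`, the `FAIL` label, and the sample histories are all hypothetical.

```python
from collections import Counter, defaultdict


def estimate_transitions(sequences):
    """Estimate P(next state | current state) by counting observed transitions."""
    counts = defaultdict(Counter)
    for seq in sequences:
        # Pair each state with the state that immediately follows it.
        for current, nxt in zip(seq, seq[1:]):
            counts[current][nxt] += 1
    return {
        state: {nxt: n / sum(followers.values()) for nxt, n in followers.items()}
        for state, followers in counts.items()
    }


# Hypothetical state histories for three assets of one type; "FAIL" marks a failure.
histories = [
    ["A", "B", "A", "B", "FAIL"],
    ["A", "C", "B", "A", "C"],
    ["A", "B", "C", "FAIL"],
]

transitions = estimate_transitions(histories)
for state in sorted(transitions):
    print(state, transitions[state])
```

Note that the second history never reaches `FAIL` but still contributes transition counts, which mirrors the point that assets still in service add information to the model.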
Machines vs. Humans: Does Experience Matter?
These discovered machine states may or may not be understandable by even the most
experienced operator or engineer. In any case, some of them will very rarely be
observed, and we gain the most by predicting rare events. It is here where one clear point
of difference with CBM arises: if we are only monitoring conditions that engineers and
operators know about and understand, we are likely not monitoring all the conditions
that matter.

Machine learning discerns from the data the significant transitions on the way to failure
and calculates the probabilities of the asset transitioning from one state to another along
a path that leads to failure. These states may be observable by humans or they may not.
Typically, the states are hidden, and some of them are counterintuitive to an engineer and
revealed only by the data.

Looking at Figure 1 again, the lines represent a path of transition from one state to another
state. The thickness of the line represents the probability of that transition occurring. You can
see that the path to failure usually does not follow a simple linear progression. An asset
might fail from any of a variety of prior states. Nor does an asset progress inexorably toward
failure; it may transition from one state to another and then return to a prior state. The
asset in Figure 1 is as likely to fail from state B as it is from state M.

SpaceTime’s machine learning calculates a specific asset’s probability of failure because it
understands these probabilities, having modeled this asset type using all the data at its
disposal. Ultimately, the failure probability is based on the model’s observation of the
specific asset and its actual state transitions.
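To make the idea of a failure probability derived from state transitions concrete, the following sketch treats the transition model as a simple Markov chain with an absorbing failure state and computes the chance of reaching failure within a given number of transitions. This is an illustrative assumption rather than the engine's actual method; the three-state table, its probabilities, and the function name `failure_probability` are hypothetical, chosen so that states B and M are equally likely to fail on the next transition and so that an asset can return to a prior state.

```python
def failure_probability(transitions, start, steps, fail="FAIL"):
    """Probability of reaching the failure state within `steps` transitions."""
    dist = {start: 1.0}   # probability mass over non-failed states
    failed = 0.0          # accumulated probability of having failed
    for _ in range(steps):
        nxt = {}
        for state, p in dist.items():
            for target, q in transitions.get(state, {}).items():
                if target == fail:
                    failed += p * q
                else:
                    nxt[target] = nxt.get(target, 0.0) + p * q
        dist = nxt
    return failed


# Hypothetical transition probabilities; each row sums to 1.
transitions = {
    "A": {"A": 0.8, "B": 0.2},
    "B": {"A": 0.4, "M": 0.3, "FAIL": 0.3},  # B can return to the prior state A
    "M": {"B": 0.7, "FAIL": 0.3},
}

from_b = failure_probability(transitions, "B", steps=1)
from_m = failure_probability(transitions, "M", steps=1)
from_a = failure_probability(transitions, "A", steps=2)
print(from_b, from_m, from_a)
```

Because the calculation starts from the state a specific asset is actually in, two assets of the same type can receive different failure probabilities from the same model.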



Prediction in Two Temporal Dimensions
Another critical advantage over CBM that emerges from this model is the ability to not just
predict the current probability of failure, but to understand what that prediction would be at a
certain time in the future. CBM can say “based on historical averages, assets in this condition
fail, on average, within sixty days.” Our machine learning model, however, can say “based on the
state transitions we’ve seen this particular asset go through, we can predict the probability of
it taking a path to failure within the next sixty days, but we can also tell you that since we
know the probable path it will take over time, if we make the same prediction three months
from now, the probability of it failing within sixty days from then will be different.”

And that, as they say, makes all the difference.

Let’s look at a three-dimensional model of those predictions, shown in Figure 2. The height of the
curve is the probability of failure. There are two time axes: number of days into the future of the
prediction, and the days into the future that a prediction is made. The 0,0 point, therefore, is
“now.” This asset right now does not indicate a probability of failing in the next 60 days. If we
were to make that prediction six weeks from now, we would predict that the probability of
failure going forward 60 days is much higher. If we continue pushing forward in time, we can
see that the probability of failure comes back down. Looking further into the future, the
probability once again spikes. Clearly this asset has a problem, and at some point, it will likely
fail. The trick is knowing the best time to repair or replace it.
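The two time axes can be sketched with the same kind of toy state-transition model: for each offset at which a prediction is made, propagate the asset's probable state forward to that offset (assuming it has survived until then), and then compute the probability of failure within a fixed horizon from there. This is only a sketch under assumptions. The transition table and the names `step`, `failure_within`, and `prediction_surface` are hypothetical, periods stand in for days, and a small time-homogeneous chain like this one will not reproduce the rise, fall, and second spike described for Figure 2.

```python
def step(dist, transitions, fail="FAIL"):
    """Advance a distribution over live states by one period.

    Returns the new distribution and the probability of failing in that period.
    """
    nxt, failed = {}, 0.0
    for state, p in dist.items():
        for target, q in transitions[state].items():
            if target == fail:
                failed += p * q
            else:
                nxt[target] = nxt.get(target, 0.0) + p * q
    return nxt, failed


def failure_within(dist, transitions, horizon):
    """Probability of failing within `horizon` periods, starting from `dist`."""
    total = 0.0
    for _ in range(horizon):
        dist, failed = step(dist, transitions)
        total += failed
    return total


def prediction_surface(transitions, start, offsets, horizon):
    """Map each offset to P(fail within `horizon` | survived until the offset)."""
    surface = {}
    dist, period = {start: 1.0}, 0
    for offset in sorted(offsets):
        while period < offset:
            dist, _ = step(dist, transitions)
            alive = sum(dist.values())
            if alive == 0.0:
                return surface  # failure is certain before this offset
            # Condition on survival: renormalize over the live states.
            dist = {s: p / alive for s, p in dist.items()}
            period += 1
        surface[offset] = failure_within(dist, transitions, horizon)
    return surface


# Hypothetical transition probabilities; each row sums to 1.
transitions = {
    "A": {"A": 0.8, "B": 0.2},
    "B": {"A": 0.4, "M": 0.3, "FAIL": 0.3},
    "M": {"B": 0.7, "FAIL": 0.3},
}

surface = prediction_surface(transitions, "A", offsets=range(0, 6), horizon=2)
for offset, prob in surface.items():
    print(f"prediction made at period {offset}: P(fail within 2) = {prob:.3f}")
```

Each entry in `surface` corresponds to one slice of the kind of surface shown in Figure 2: a fixed look-ahead horizon evaluated at successive prediction times.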
Let’s look at this in a simpler way.

Figure 2



Seeing Over the Hill
CBM and less sophisticated models are
parameterized and suffer reduced statistical
confidence as they try to predict forward in time,
because they are looking at current conditions
rather than at a degradation path.
They can only predict, based on averages, the
likelihood of failure under those conditions. As
they extrapolate the asset’s condition
further into the future, the
failure prediction carries even less confidence.

As a result, they are likely to assume an asset is failing and recommend replacement,
unnecessarily lowering the asset lifespan and increasing the cost of repair or replacement
when parts have to be rush ordered or crew schedules disrupted.

Figure 3

Reinforcement Learning
Reinforcement learning involves learning what to do (how to map situations to actions) to
maximize a numerical reward signal. The three most important distinguishing features of
reinforcement learning are: 1) being closed-loop in an essential way; 2) not having direct
instructions of what actions to take; and 3) playing out the consequences of actions,
including reward signals, over extended time periods.
Source: Reinforcement Learning: An Introduction (Second Edition, in progress, 2016),
Rich Sutton and Andrew Barto, MIT Press.

The machine learning analytics modeled in Figure 2 predicts the probability of asset
failure at multiple times in the future without loss of confidence. As shown in Figure 3, it
“sees over the hill” and predicts the probability of surviving event A but not event B.
Reinforcement learning algorithms then have a time window to work with, so the asset
owners can optimize purchasing, spare parts location, crew schedules, down times, and other
maintenance activities. The result is extended asset life, increased return on invested capital,
and lower maintenance and operating costs.
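The value of seeing over the hill can be illustrated with a deliberately simple threshold rule: given a forecast of failure risk per period and a risk tolerance, find how many periods the asset can run before the forecast first exceeds that tolerance. This is not reinforcement learning and not the method described above. It is a minimal sketch, and the forecast values, the tolerances, and the function name `periods_until_risk_exceeds` are hypothetical.

```python
def periods_until_risk_exceeds(forecast, tolerance):
    """Index of the first period whose forecast risk exceeds `tolerance`.

    Returns len(forecast) if the risk never exceeds the tolerance.
    """
    for period, risk in enumerate(forecast):
        if risk > tolerance:
            return period
    return len(forecast)


# Hypothetical forecast with a small first hill (event A) and a larger second one (event B).
forecast = [0.05, 0.10, 0.30, 0.15, 0.10, 0.40, 0.85, 0.90]

# A tolerance below the first hill forces early replacement at period 2.
early = periods_until_risk_exceeds(forecast, tolerance=0.25)

# A tolerance the first hill stays under lets the asset run until period 6.
window = periods_until_risk_exceeds(forecast, tolerance=0.50)

print(early, window)
```

The gap between the two results is the extra operating time, and the scheduling window, that becomes available only when the forecast remains trustworthy beyond the first hill.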
Risk Tolerance and Optimizing the Prediction
The graph in Figure 4 shows actual failure prediction curves for disk drives in a data center.
The graph compares actual condition-based predictive models from the manufacturers with
an average of the predictions from our machine learning model (the blue line). This graph
shows averages for a set of assets, to create an apples-to-apples comparison with these other
models. In practice, our machine learning model predicts the probability of failure for each
asset individually.

Figure 4
This graph may be recognizable as an illustration of Gini coefficients. Gini coefficients are a
measure of statistical dispersion, often used to describe income inequality in a population.
They can, however, be used to represent a continuous probability distribution and effectively
illustrate how well the various predictive models of asset failure balance the cost of failure
against the cost of prematurely replacing the asset.

In an operating environment, risk tolerance is an important constraint. Any predictive
maintenance plan should minimize risk, but minimizing risk alone is trivial: a plan could
simply predict that every asset will fail one week after it is put into service, and if you
replace assets accordingly, you’ll never have a part failure. However, that’s no way to run a
business, since asset utilization will be very low. You need a model with a better fit, one that
allows you to operate the asset as long as possible before the risk of failure is actually too
high. Risk might be thought of as a function of the cost to the business of the asset failing,
balanced against other costs such as the cost of capital used to deploy that asset.
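One common way to express this balance is a textbook age-replacement calculation: for each candidate replacement time, divide the expected cost of one replacement cycle by the expected operating time in that cycle, then pick the time with the lowest cost rate. The sketch below is an illustration under assumptions, not the optimization used in the product. The cumulative failure probabilities, the two cost figures, and the function name `best_replacement_period` are hypothetical.

```python
def best_replacement_period(cum_fail, planned_cost, failure_cost):
    """Choose the planned replacement period with the lowest expected cost rate.

    cum_fail[t] is the probability that the asset has failed within the first
    t periods, with cum_fail[0] == 0. A cycle ends at failure or at the planned
    replacement, whichever comes first.
    """
    best_period, best_rate = None, float("inf")
    expected_periods = 0.0
    for period in range(1, len(cum_fail)):
        # The asset operates during this period only if it survived the earlier ones.
        expected_periods += 1.0 - cum_fail[period - 1]
        p_fail = cum_fail[period]
        expected_cost = planned_cost * (1.0 - p_fail) + failure_cost * p_fail
        rate = expected_cost / expected_periods
        if rate < best_rate:
            best_period, best_rate = period, rate
    return best_period, best_rate


# Hypothetical inputs: failure risk climbs sharply after the third period,
# and an unexpected failure costs five times as much as a planned replacement.
cum_fail = [0.0, 0.01, 0.02, 0.05, 0.30, 0.70]
period, rate = best_replacement_period(cum_fail, planned_cost=10.0, failure_cost=50.0)
print(period, round(rate, 3))
```

Replacing after one period is the safest choice but spreads the cost over very little operating time, while waiting until the last period lets the failure cost dominate. The minimum lies between those two extremes.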



The graph in Figure 4 illustrates this balancing. Extending the operating hours is, in effect,
lowering the cost of capital and other costs associated with purchasing and deploying that
asset. The ideal curve would be one that predicts all or nearly all the asset failures in a
narrow band as far to the right as possible, representing the longest operating lifetimes while
still predicting most asset failures before they happen. The least desirable curve is one that
equally distributes failure predictions along the operating hours axis.

In Figure 4, the red line comes close to representing the latter, and the blue line,
representing SpaceTime’s machine learning model, comes closest to representing the former.
The other lines are the CBM predictions delivered by the disk manufacturers. All of the
models have access to the same data. Our model best makes the trade-off between risk
tolerance and operating hours.

Conclusion
Models that predict the probability of asset failure are the foundation of a predictive
maintenance system that reduces business costs associated both with unexpected asset
failures and with premature replacement of operating assets. While condition-based
maintenance is an improvement over “wait until it breaks” on one extreme or overly
conservative planned maintenance schedules on the other, for many types of assets, CBM
lacks the analytical strength to create models that optimize business outcomes. Machine
learning can create such models, and as such should be a requirement for choosing any
predictive maintenance solution.



About SpaceTime Insight

SpaceTime Insight enables organizations in asset-intensive industries to
generate more value from their people, processes, and assets. Our award-winning
analytics and industrial internet of things applications optimize
operations in motion, in context and in real time. Teams at some of the
largest organizations in the world, including transportation and energy firms
and some of the world’s largest utilities, use SpaceTime Insight software to
power mission-critical systems. SpaceTime is headquartered in San Mateo,
CA with offices in Canada, UK, India, and Japan.

1850 Gateway Dr., Suite 125
San Mateo, CA 94404 USA

650.513.8550

www.spacetimeinsight.com

@spacetimeinsght

linkedin.com/company/space-time-insight
