
One Meter to Find Them All - Water Network Leak Localization Using a Single Flow Meter

Iyswarya Narayanan∗, Arunchandar Vasan†, Venkatesh Sarangan† and Anand Sivasubramaniam∗

∗ Department of Computer Science and Engineering
The Pennsylvania State University, University Park, PA 16802, USA
Email: {iun106, anand}@cse.psu.edu
† Innovation Labs, Tata Consultancy Services,
IIT-M Research Park, Chennai 600113, India.
Email: {arun.vasan, venkatesh.sarangan}@tcs.com

978-1-4799-3146-0/14/$31.00 ©2014 IEEE

Abstract: Leak localization is a major issue faced by water utilities worldwide. Leaks are ideally detected and localized by a network-wide metering infrastructure. However, in many utilities, in-network metering is minimally present at just the inlets of sub-networks called District Metering Areas (DMAs). We consider the problem of leak localization using data from a single flow meter placed at the inlet of a DMA. We use standard time-series based modeling to detect whether a current meter reading indicates a leak and, if so, to estimate the excess flow. Conventional approaches use an a priori fully calibrated hydraulic model to map the excess flow back to a set of candidate leak locations. However, obtaining an accurate hydraulic model is expensive and hence beyond the reach of many water utilities.

We present an alternative approach that exploits the network structure and static properties in a novel way. Specifically, we extend the use of centrality metrics to infrastructure domains and use these metrics to map from the excess leak flow to the candidate leak location(s). We evaluate our approach on benchmark water utility network topologies as well as on real data obtained from a European water utility. On benchmark topologies, the localization obtained by our method is comparable to that obtained from a complete hydraulic model. On a real-world network, we were able to localize two out of the three leaks whose data we had access to. In these two cases, we find that the actual leak location was in the candidate set identified by our approach; further, the approach pruned as much as 78% of the DMA locations, indicating a high degree of localization.

I. INTRODUCTION

Importance of leaks: Non-revenue Water (NRW), the difference between the volume of water entering a utility network and the volume billed to consumers, is a major issue facing water utilities. The estimated world-wide loss due to NRW is $14.1 billion per annum. NRW levels (as a percentage of water supplied) are 15% in the developed world, and 40-50% in the developing world [1]. Nearly 60% of NRW in the developing world is due to physical losses (as opposed to faulty metering and theft), i.e., treated water is lost in the network due to leaks and overflows at storage tanks. Reducing this loss by half will allow the utilities to serve an additional 100 million people with no further withdrawals from natural sources. This loss reduction will also allow the utilities to save the energy associated with treating and pumping roughly 8 billion cubic metres of water. To reduce NRW levels, utilities need to detect and localize leaks in their distribution networks efficiently.

Leak detection and localization are ideally done using a widely deployed monitoring infrastructure with real-time data logging from high-resolution pressure, flow, and acoustic sensors. If that were the case, all one has to do is successively balance flows across junctions to identify the lossy links for even minor leaks. Leaks can further be localized to specific segments along a link through such infrastructure. Indeed, this is the case for networks with high monetary value such as oil and gas.

Present status of instrumentation: Water, on the other hand, is a relatively inexpensive resource. Consequently, most water networks are metered only to the extent allowed by the budgets of utilities. In the absence of any metering at all, there is little a utility can do about leak detection except respond to customer complaints or abnormal consumption at head-works.

Our interactions with water utilities in Europe and Asia reveal that a minimal metering infrastructure does exist in practice. Many utilities divide their water distribution network into administrative domains called District Metering Areas (DMAs)¹. These DMAs are typically metered at an aggregate level at the inlet and, occasionally, at the outlet. Most flow meters are sampled at a low frequency (e.g., once every 15 minutes). If the sensed flow value goes beyond a prefixed threshold, a leak is flagged.

¹ In some sense, the DMAs can be considered analogous to the autonomous systems (ASes) in the Internet.

Current localization practices: Post leak detection, a utility has to localize the leak so that it can be fixed by field personnel. One approach for leak localization is to use acoustic methods. In these methods, human operators walk along a pipeline network with appropriate instruments, listening for variation in the reflected acoustic signal. Clearly, doing this along the entire length of the network is expensive. Another approach adopted by utilities is to use a calibrated hydraulic model to localize the leak. Calibrated hydraulic models supplement meter data by capturing the dynamics of the network flow, and help gain insight about network operations. For instance, a calibrated model of a single pipe would predict the drop in water pressure across the pipe as a function of the flow using an experimentally determined roughness coefficient.

To localize a leak, the hydraulic model of the DMA under consideration is run with several choices of leak locations. The choice whose simulation results closely match the inlet and
outlet meter data is identified as the leak location. Although this appears plausible, as we explain later in Section II, obtaining a well-calibrated network hydraulic model is quite expensive. Therefore, this too is beyond the reach of many utilities.

Proposed approach: While several leak detection and localization approaches have been discussed in the literature, they require either a widely deployed, high resolution monitoring infrastructure or a calibrated hydraulic model [2], [3]. Our discussions with water utilities across different geographies indicate that neither of these is commonly available. Motivated by this reality, we consider the following question: "Given a water distribution network with no calibrated hydraulic model and low-frequency flow logging at a DMA's inlet, can we detect and localize major leaks?" Note that we consider the case of a flow meter being present only at the inlet because this is the reality in most networks. Further, real world water utilities are constrained by their budget and may not have the means to deploy additional sensors.

We answer this question in the affirmative by proposing methods for detecting and localizing major leaks. We focus on scenarios where a water network has only one major leak at any given time. Real-world data on the duration between reported leak events (presented in Section VII) confirms this. Our discussions with water utilities indicate that this is the most commonly occurring event. Though several sections of a network could be prone to a major leak, usually the weakest segment in the network gives way first. This precludes other weaker sections of the network from caving in. Multiple major leaks rarely occur unless there is a geography-specific event such as an earthquake.

For ease of reading, henceforth, we refer to major water leaks as simply leaks in the rest of this paper; we also refer to them as bursts. Our leak detection method considers the past history of flow meter values over a time window and compares the current sample against a prediction made for the current time step. If the difference is positive and statistically significant, a leak event is flagged with some confidence interval. The additional flow due to the leak is also extracted as the leak signature, which is then used for localization.

Leak localization is more challenging than detection for the following reasons. (i) Relatively speaking, bursts are rare events and hence, it is difficult to obtain a database of burst locations and their signatures from past history². (ii) Even if one manages to obtain such a database, the burst signature for the same location would vary depending on the network operating conditions. (iii) As there is only one measurement point (i.e., the inlet flow meter), it is likely that the signatures of several locations are similar.

² Since hydraulic models may not be available with all utilities, the burst signatures cannot be synthetically generated.

We address these challenges in the following way. Our leak localization approach adopts graph centrality metrics to build an a priori synthetic database of burst signatures of all possible locations within a DMA. To generate the database, we exploit the static structural data of the water network (e.g., records about the network such as connectivity, diameters and lengths of pipes, rated power of pumps, and height and capacity of tanks) as opposed to calibrated information such as roughness coefficients and nodal demands. This database is parametrized by the operating conditions existing at all possible time instants at which a burst can happen. The set of probable leak locations is identified as those whose estimated burst signatures match closely with the actual leak signature.

Specific contributions of our work include:

• A statistical filter based on a standard ARIMA model that extracts the leak signature from the flow values.

• An empirical analysis of the possible leak signatures that can be obtained from a single flow meter placed at a DMA's inlet. We find that the number of distinguishable leak signatures can be considerably less than the number of junctions in the DMA. Specifically, we show that the number of distinguishable leak signatures is dependent on the number of strongly connected components in the network defined in terms of hydraulic resistance. Our analysis identifies the extent to which a leak can be localized with a single flow sensor using any method.

• A current-flow centrality based approach that exploits a water network's structure to build a database of burst signatures of different network locations even in the absence of hydraulic models. We find that the leak signatures obtained through our method are as distinguishable as those obtained by a fully calibrated hydraulic model with accurate and complete information.

We validate our approach on synthetic benchmark water utility networks as well as on real world flow meter/burst data obtained from a European water utility. We find that our localization method performs well on both real-world and benchmark networks. On benchmark topologies, the performance of our method is comparable to that obtained from expensive hydraulic models. On a real world network, we were able to localize two out of the three leaks whose data we had access to. In these two cases, we find that the actual leak location was in the candidate set identified by our approach; further, our approach pruned as much as 78% of the DMA locations, indicating a high degree of localization.

To the best of our knowledge, ours is one of the first few works to localize leaks in a utility scale water network using information from a single flow sensor in the absence of a hydraulic model. We believe that our methodology is generic enough to be applied to any resource-flow infrastructure domain, especially when accurate models are not available to simulate and understand the infrastructure behavior.

The rest of this paper is organized as follows. Section II briefly provides the background for understanding water networks. Section III discusses our approach for leak detection. Section IV describes how the network structure can affect the degree to which a leak can be localized. Section V discusses our localization methodology. Sections VI and VII present results on how the proposed leak localization approach performs on benchmark and real world topologies respectively. Section VIII discusses related work and Section IX concludes the paper.

II. WATER DISTRIBUTION NETWORKS

Symbol        Meaning
$G$           Water network
$E$           Pipes (edges, links)
$V$           Junctions (vertices, nodes)
$A$           Weighted adjacency matrix of $G$
$d_i$         Demand at node $i$
$p_i$         Pressure at node $i$
$H_i$         Total head at node $i$
$h_i$         Elevation at node $i$
$C_{i,j}$     Head gain due to pump along pipe $(i, j)$
$f_{i,j}$     Flow from $i$ to $j$
$f_{burst}$   Excess flow through a meter due to burst
$e$           Edge (pipe) $e = (i, j)$
$L_e$         Length of pipe $e$
$R_e$         Radius of pipe $e$
$\mu_e$       Roughness coefficient of pipe $e$
$\rho$        Density of water (1 gm/cc)
$g$           Acceleration due to gravity (9.8 m/s^2)

TABLE I. NOTATION FREQUENTLY USED IN THE PAPER

Water networks distribute water from reservoirs through pipes (or links) to various locations in a municipality. One or more tanks function as intermediate buffers within the network. Pumps boost the 'head' of water to either push water uphill or increase the delivery pressure of water at end points. Valves regulate the water pressure at network nodes and open or close pipes for water flow. Junctions or nodes are points in the network where different pipes meet. They also represent end consumers and hence can have demands. The demands typically vary with the time of day. For a given demand pattern, the flow values along various pipes in the network are decided by the underlying topology and physical laws.

Typically, the demand values at all nodes in the network, the reservoirs' capacities and pressures, and the rating of all pumps in a water network are assumed to be known. The pressures at all nodes in the network and the flows through all pipes in the network are to be determined from these inputs. Using the notation in Table I, the equations that relate the desired outputs with the inputs can be stated as follows [4]:

• Flow equation: The total flow entering a node $i$ from all its neighbors $N_i$ is equal to the total flow leaving it plus the demand $d_i$ of the node $i$. In other words,

$$\forall i \in V, \quad d_i + \sum_{j \in N_i} f(i, j) = 0 \tag{1}$$

• Head equation: For any link $e = (i, j) \in E$, the change in the head along the link can be written as:

$$p_i + h_i \rho g + C_{i,j} - \frac{L_e f_e^2 \times \rho g}{\mu_e R_e^5} = p_j + h_j \rho g \tag{2}$$

where $\mu_e$ represents the pipe's roughness coefficient. The above equation states that the total pressure gain along any closed loop in the flow network is zero. This is basically a generalized version of Bernoulli's theorem to include frictional losses.

A. Challenges in hydraulic model calibration

From Equations 1 and 2, it can be observed that apart from the inputs (which include the $d_i$'s), the values of the $\mu_e$'s should also be known to determine the pressures and flows. The $d_i$'s and $\mu_e$'s are empirically estimated during the process of hydraulic model calibration.

In most utility water networks across the globe, customers are not precisely metered at end-points. Demands are accurately measured only at aggregated levels. Therefore, node-level demands ($d_i$'s) may not be accurately known at the required spatio-temporal granularity. Further, as pipes age and corrode, the roughness of their internal surfaces changes, which affects the amount of head loss along the pipes. The roughness of a pipe, quantified by the roughness coefficient ($\mu_e$), needs to be periodically estimated. These coefficients can be estimated only when actual values of pressure and flows at various junctions and pipes are available. Finally, pump efficiencies affect the hydraulic model's accuracy. Aging pumps deviate significantly from manufacturer specifications.

Considerable effort is required in terms of field trials and parameter fitting to estimate the $\mu_e$'s and pump efficiencies. For example, calibrating a network with a few hundred nodes may take 40-60 days of effort by an expert team of 2-4 members [5]. Larger networks require even more effort. A utility may not have in-house expertise to do such calibration, thus requiring expensive external consultants. For a network serving a population of 1 million, hydraulic model maintenance over a five year period typically costs about \$4,000,000 [6]. Even approximate models may not be available in developing economies.

In our work, we employ centrality metrics and exploit the readily available network information to allow utilities to expedite burst localization, even in the absence of data on the $\mu_e$'s and $d_i$'s.

Our approach consists of two parts. The first part focuses on detecting a leak and extracting its signature; the second leverages the leak signature and the network structure to localize the leak.

III. LEAK DETECTION

We now address the question: given time-series data from the inlet flow loggers of a DMA, can we (i) detect a leak in the DMA with some confidence, and (ii) estimate the excess flow due to the leak? The estimated burst flow will be used as a signature by the localization module. Note that by detection, we mean identifying that a leak has happened after it has happened.

Let $X_t$ denote the time series of flow measurements seen at a meter at the inlet of a DMA. Let $L_t \in \{0, 1\}$ denote the presence of a leak at time $t$ within this DMA. Our goal is to: (i) design a mechanism to decide, for each time instant $t$, if a leak is present, i.e., if $L_t = 1$, and (ii) quantify the associated false positive rate. The false negative rate may not be obtained unless the distribution of the actual leak flow is known.

Nature of the flow signal: Figure 1(a) shows a sample of the actual flow values observed at the inlet of a DMA in a European water utility network. The data corresponds to the flow observed over a period of 30 days, with the values being sampled every 15 minutes. From the raw flow values, one can readily notice the non-stationarity. The auto-correlation function (ACF) plot of the raw signal in Figure 1(b) indicates the presence of a daily trend since there are sharp peaks at a lag of 24 hrs (96 samples). When the daily trend is subtracted out, the ACF plot of the resulting signal in Figure 1(c) shows

[Figure 1: three panels. (a) Time series of flow (m^3/hr) against sample index: raw flow data; the observed signal is non-stationary. (b) ACF plot showing daily seasonality: the flow values show a daily trend. (c) ACF plot showing weekly seasonality: we see a weekly trend as well.]

Fig. 1. Characteristics of flow values measured by a water utility flow meter.
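The daily and weekly seasonality summarized in Fig. 1 can be checked numerically with a sample autocorrelation function. The following is a minimal sketch on synthetic data, not on the utility's measurements: the generator `synthetic_flow`, its amplitudes, and its weekend offset are hypothetical choices made only to produce a signal with a 96-sample (daily) and a 672-sample (weekly) period at 15-minute sampling. Removing the daily component by differencing at lag 96 is also an assumption of this sketch.

```python
import numpy as np

SAMPLES_PER_DAY = 96                      # 15-minute sampling
SAMPLES_PER_WEEK = 7 * SAMPLES_PER_DAY    # 4 x 24 x 7 = 672

def synthetic_flow(n_days=30, seed=0):
    """Hypothetical inlet flow: daily cycle + weekend offset + small noise."""
    rng = np.random.default_rng(seed)
    t = np.arange(n_days * SAMPLES_PER_DAY)
    daily = 1.5 * np.sin(2 * np.pi * t / SAMPLES_PER_DAY)
    weekend = ((t // SAMPLES_PER_DAY) % 7 >= 5) * 0.8
    return 6.0 + daily + weekend + 0.1 * rng.standard_normal(t.size)

def acf(x, max_lag):
    """Sample autocorrelation of x for lags 0..max_lag."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    denom = np.dot(x, x)
    return np.array([np.dot(x[:x.size - k], x[k:]) / denom
                     for k in range(max_lag + 1)])

def remove_daily(x):
    """Subtract the sample from 24 hours earlier (differencing at lag 96)."""
    x = np.asarray(x, dtype=float)
    return x[SAMPLES_PER_DAY:] - x[:-SAMPLES_PER_DAY]

flow = synthetic_flow()
raw_acf = acf(flow, SAMPLES_PER_WEEK)
residual_acf = acf(remove_daily(flow), SAMPLES_PER_WEEK)
# ACF of the raw signal at the daily lag, and of the remainder at the weekly lag
print(raw_acf[SAMPLES_PER_DAY], residual_acf[SAMPLES_PER_WEEK])
```

On this synthetic signal the raw ACF is large at lag 96, and once the daily component is differenced out the ACF of the remainder is large at lag 672, which mirrors the behavior described for panels (b) and (c).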

what is potentially a weekly trend with peaks at a lag of 672 samples.

A naive approach for leak detection: The longest seasonal trend one could observe in the flow values is over a week, or 4 × 24 × 7 = 672 samples. A naive approach to leak detection could be as follows. One could empirically build distributions of $P[X_t]$, for all $t \bmod 672$. If a flow sample, say $X_k$, shows significant deviation from the mean value according to the distribution of $X_{k \bmod 672}$, it could be flagged as a leak.

The main drawback with this approach is that it does not take into account the local trend variations. For instance, Figure 2(a) gives the distributions of $X_t$ and $X_{t+1}$ obtained from real world flow traces. The probability distribution of $X_{t+1}$ given the fact that $X_t$ is high may not be the same as the original distribution of $X_{t+1}$. Figure 2(a) gives the distribution of $X_{t+1} \mid X_t > \beta$, for some $\beta$. As can be seen, this conditional distribution is quite different from the original distribution. A sample value that has significant deviation as per the original distribution of $X_{t+1}$ would be considered normal under the conditional distribution. Consequently, the naive approach can result in too many false positives.

A. Model Building

We now statistically model the flow values observed at one meter. We use this model to predict the flow value at the next time step. If the model is good, the predicted and actual flow values will be close enough. If there are any abnormalities in the network (such as a leak, meter failure, etc.), the predicted and actual values will significantly differ. If the actual value is significantly higher than the predicted value, we flag a leak. We first difference the signal to remove the daily and weekly trends. We then fit an auto-regressive (AR) model of appropriate order to handle the remaining errors. Once the prediction is made using this auto-regressive model, we estimate the flow sample value through anti-differencing. This model captures the local dependencies better than the naive approach and hence is more accurate.

B. Model Validation

Figure 2(b) shows the time-series generated by our model versus the actual samples from a real-world meter. The average relative error in prediction during normal operation is within 9.95%. This 9.95% error is reasonable considering the fact that it is within the 5% error of the flow meter's resolution. Figure 2(c) shows the Q-Q plot of the various quantiles of our model against the quantiles obtained from actual data. The closeness of the resulting plot to the 45 degree line indicates high accuracy of our model.

C. Leak Detection

Given the AR model for predicting the flow $X_t$ from the past history, we can obtain the a priori conditional distribution $P[X_t \mid X_{t'}, t' \le t]$. Now, if the flow logger has obtained a sample $\hat{X}_t$, we want to estimate whether the sample is due to a leak being present at $t$ or not. Consider a small number $\epsilon$, $0 < \epsilon \le 0.01$. Let $x(\epsilon)$ be the $(1 - \epsilon)$th percentile of the distribution $P[X_t \mid X_{t'}, t' \le t]$, i.e., $P[X_t > x(\epsilon) \mid X_{t'}, t' \le t] = \epsilon$. Then we flag a leak at time $t$ if and only if the sample $\hat{X}_t > x(\epsilon)$. Note that this leak could have started sometime before $t$.

A sample corresponding to normal operation falling in the upper tail of the distribution $P[X_t \mid X_{t'}, t' \le t]$, i.e., between the quantiles $[1 - \epsilon, 1]$, would be flagged as a leak. Therefore, the false positive rate of our approach is precisely $\epsilon$. Thus we have a method that allows the user to tune the detection of leaks using a threshold $x(\epsilon)$ on the basis of an acceptable false-positive rate $\epsilon$. Further, the least-squared error estimate of the excess flow that happens because of a leak observed at time $t$ is given by $\delta(t) = \hat{X}_t - E[X_t]$, where $E[\cdot]$ is the expectation operator.

Consecutive leak alerts: A leak can take several hours to fix. Consequently, the flow logger would see values that deviate from the expected distribution across several samples. Suppose the thresholds that we choose for a sequence of $n$ samples are $\epsilon_1, \epsilon_2, \ldots, \epsilon_n$, for some $n$. Then the probability of a series of $n$ leak alerts being a false positive, $\alpha$, is given by

$$\alpha = P\big[(X_i > x(\epsilon_i)) \cap (X_{i+1} > x(\epsilon_{i+1})) \cap \cdots \cap (X_{i+n} > x(\epsilon_{i+n}))\big] \tag{3}$$

Clearly, if a leak is absent, the events $X_i > x(\epsilon_i)$ for consecutive $i$'s are positively correlated and their joint probability would be higher than the product of their marginals. However, when $X_i$ is flagged as a leak, we do not use the sample to update the ARIMA model; instead we (recursively) use a sample from the past corresponding to this time-of-day 24 hours ago under normal operations. Therefore, the events $X_i > x(\epsilon_i)$ for consecutive $i$'s become independent in the presence of a leak. Consequently, the overall false positive rate of flagging a leak under a sequence of $n$ consecutive individual alerts can

[Figure 2: three panels. (a) Density of flow values for $X_t$, for $X_{t+1}$ (98th percentile = 6.8), and for $X_{t+1} \mid X_t > 6.47$ (98th percentile = 7.7): naive approaches may not model the contextual information. (b) Local history: observed vs. predicted flow (m^3/hr) over time, with the predicted curve offset by +2 units. (c) Q-Q plot of observed against predicted quantiles; the plot lies close to the 45° line, which is an indication of good fit.]

Fig. 2. Characteristics of naive approach and proposed approach
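The detection procedure of Section III can be sketched in a few lines: difference the signal at the daily and weekly lags, fit an AR model to the remainder, undo the differencing to obtain the expected flow, and flag a sample whose excess over that expectation lies in the upper $\epsilon$ tail. This is an illustrative sketch under stated assumptions rather than the authors' implementation: the AR order of 4, the least-squares fit, and the use of the empirical $(1-\epsilon)$ quantile of in-sample residuals as a stand-in for the percentile of the conditional distribution are all choices made here. The function names and the synthetic generator `simulate_flow` are hypothetical.

```python
import numpy as np

DAY, WEEK = 96, 672   # samples per day and per week at 15-minute sampling

def simulate_flow(n_days=29, seed=1):
    """Hypothetical leak-free flow: daily cycle + weekend offset + noise."""
    rng = np.random.default_rng(seed)
    t = np.arange(n_days * DAY)
    weekend = ((t // DAY) % 7 >= 5) * 0.8
    return 6.0 + 1.5 * np.sin(2 * np.pi * t / DAY) + weekend \
        + 0.1 * rng.standard_normal(t.size)

def seasonal_difference(x, lag):
    return x[lag:] - x[:-lag]

def lag_matrix(y, order):
    """Rows [y[t-1], ..., y[t-order]] paired with targets y[t]."""
    rows = np.column_stack([y[order - k - 1: len(y) - k - 1]
                            for k in range(order)])
    return rows, y[order:]

def fit_detector(train, order=4, eps=0.01):
    """Fit AR(order) on the doubly differenced series; return (coef, margin)."""
    z = seasonal_difference(seasonal_difference(train, DAY), WEEK)
    rows, target = lag_matrix(z, order)
    coef, *_ = np.linalg.lstsq(rows, target, rcond=None)
    residuals = target - rows @ coef
    # Assumption: empirical (1 - eps) quantile of residuals sets the threshold.
    margin = float(np.quantile(residuals, 1.0 - eps))
    return coef, margin

def expected_flow(history, coef):
    """One-step-ahead expected flow, obtained by anti-differencing."""
    order = len(coef)
    z = seasonal_difference(seasonal_difference(history, DAY), WEEK)
    z_hat = float(np.dot(coef, z[-1: -order - 1: -1]))  # most recent first
    n = len(history)                                    # index being predicted
    return z_hat + history[n - DAY] + history[n - WEEK] - history[n - DAY - WEEK]

def check_sample(history, sample, coef, margin):
    """Return (leak_flag, excess) with excess = observed - expected flow."""
    excess = float(sample - expected_flow(history, coef))
    return bool(excess > margin), excess
```

A typical use would fit the detector on leak-free history with `fit_detector`, then call `check_sample` on each new reading; the returned excess plays the role of $\delta(t)$. Working with the excess over the expected flow is equivalent to comparing the sample against a threshold $x(\epsilon)$ equal to the expected flow plus the residual quantile.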

be obtained as

$$\alpha = P[X_i > x(\epsilon_i)] \times \cdots \times P[X_{i+n} > x(\epsilon_{i+n})] = \epsilon_i \times \epsilon_{i+1} \times \cdots \times \epsilon_{i+n}. \tag{4}$$

Clearly $\alpha$ decreases with increasing $n$.

While we have used an ARIMA model based deviation approach to detect leak values for simplicity, we note that any standard outlier detection technique for time series can be used [7].

IV. EFFECT OF NETWORK STRUCTURE

A burst is analogous to an electrical "short circuit". Specifically, water under high (internal) pressure tries to escape to a lower external pressure. The amount that escapes is only limited by the hydraulic resistances (analogous to electrical resistances) to the excess flow offered by the network on the path from the reservoir. Formally, the hydraulic resistance of a link is defined as the ratio of the head drop across a link to the flow through the link. The hydraulic resistance of a path is the sum of the individual resistances of the links along the path.

The excess flow due to the burst ultimately gets supplied to a DMA through its inlet and gets recorded. Clearly, the number of distinguishable burst flows that can get recorded is equal to the number of unique short circuit hydraulic resistances offered by different locations within a DMA.

In the following analyses, we assume that bursts happen at nodes (i.e., junctions of pipes) in a DMA. This assumption does not preclude bursts happening at pipes because a pipe can logically be divided into multiple segments by placing dummy nodes along its length. These dummy nodes can act as possible burst locations.

[Figure 3: chain of nodes R, A, B, C, D, annotated with excess flow and pressure.]

Fig. 3. Chain structure

Definition. Let $\delta_i(t)$ be the excess burst flow at time $t$ due to a burst at node $i$. Let $U = \{1, 2, \cdots, m\}$ be a set of $m$ nodes in the network. The flows $\delta_1(t), \delta_2(t), \cdots, \delta_m(t)$ are said to be indistinguishable from each other if $|\delta_i(t) - \delta_j(t)| < \varepsilon$, $\forall i, j \in U$, for some $\varepsilon > 0$. Typically, the value of $\varepsilon$ depends on the resolution of the inlet flow meter and the variations expected in the network operating condition at time $t$.

Consider a simple chain network R-ABCD shown in Figure 3. Let R be the reservoir and the rest be the demand nodes. Assume that the links AB, BC, and CD have a very low hydraulic resistance (or a very high hydraulic admittance). Though the four nodes are spatially separated, in hydraulic terms, they can be considered as a 'single' node. Therefore, if a burst happens at any of A, B, C, or D, then the short circuit hydraulic resistance offered by any of these locations to the reservoir R would be comparable. Consequently, $\delta_A$, $\delta_B$, $\delta_C$, and $\delta_D$ may not be distinguishable for any $t$. In general, two nodes directly connected by a high admittance link are likely to have indistinguishable burst flows.

Guided by this observation, we now systematically analyze nodes within a DMA that are connected by high admittance links. As per equation 2, a link's admittance per unit flow equals $\frac{R_e^5}{L_e} \mu_e$. Given a DMA, we compute the unit flow admittance of all the links. We represent the DMA as a digraph $G = (V, E)$ with the direction of an edge being decided by the flow direction along that edge. Typically, water does not circulate in loops in a network; therefore, $G$ will be a directed acyclic graph (DAG).

We modify this DAG as follows. If the unit flow admittance of a DMA link is higher than some threshold value, we make the corresponding edge in the digraph bidirectional. On this modified digraph, we determine the various strongly connected components, $S_1, S_2, \ldots, S_k$. Nodes in a strongly connected component will be connected via high admittance links; hence their burst flows are likely to be indistinguishable.

Let $\gamma > 0$ be the number of distinguishable excess burst flows possible across the nodes in the DMA. The higher the $\gamma$, the higher will be the extent to which a leak can be localized. On average, the extent of leak localization will be $|V| / \gamma$. Given this, the following relationship can be written:

$$\gamma \le k + |V| - |\{S_1 \cup S_2 \cup \cdots \cup S_k\}| \tag{5}$$

In other words, equation 5 establishes the upper bound for the number of distinguishable burst flows observable in a DMA's inlet flow meter. The $k$ term in the bound arises from the fact that each of the strongly connected components may have

distinguishable burst flows. The other terms account for the distinguishable burst flows from nodes that are not a part of any of the $k$ strongly connected components.

If the admittance of all the links in the DMA is very high, then the entire DMA will be one strongly connected component, making $\gamma = 1$. On the other hand, if the admittance of all the links is quite low, we have the degenerate case in which each node will be a strongly connected component, making $\gamma = |V|$.

V. LEAK LOCALIZATION

The understanding gained thus far can be summarized as follows: (i) the leak detection module can estimate the excess flow, $\delta(t)$, induced during a burst; (ii) the value of $\delta(t)$ depends on the short circuit hydraulic resistance of the burst location at the time of burst.

A well calibrated hydraulic model of a DMA can be used to estimate the excess burst flow induced by various DMA locations at the time of leak alert. The location(s) whose estimated burst flow value(s) match the observed burst flow would be the probable leak locations. As mentioned in Section II, obtaining a well calibrated hydraulic model is expensive and beyond the reach of many utilities in developing economies. The key question is: in the absence of hydraulic models, can the excess burst flow from various locations still be estimated? To answer this, we leverage ideas from centrality analysis of networks. Specifically, we use a customized variant of the current-flow betweenness centrality.

A. Current-flow centrality

Let $G = (V, E)$ be an electrical network represented as an undirected graph. Each edge $e = (i, j) \in E$ in this graph has an electrical resistance $1/a_{ij}$ and admittance $a_{ij}$. For $e = (i, j) \notin E$, $a_{ij} = 0$. Let $A$ be the weighted adjacency matrix of $G$ formed by the $a_{ij}$'s. Let $b : V \to \mathbb{R}$ be a vector that indicates where the current externally gets injected or removed from the network. A node $i$ is a source if $b_i > 0$ and a sink if $b_i < 0$. Clearly, $\sum_i b_i = 0$. The current that gets injected at the sources will split along the various incident edges and travel through various paths before getting drained at the sinks. The current flow induces voltages or potentials at the nodes. Let $p$ and $I$ be the vectors that indicate the vertex potentials and edge currents. Now, by Kirchhoff's laws, $p$, $b$, and $A$ are related as [8]:

$$L_A \, p = b \tag{6}$$

where $L_A$ is the Laplacian of $A$. Once $p$ is known, $I$ can be determined as $I_{ij} = (p_i - p_j) a_{ij}$.

Consider a unit current entering $G$ at node $s$ and leaving it at node $q$, i.e., $b_s = 1$, $b_q = -1$, $b_i = 0$, $\forall i \notin \{s, q\}$. Current-flow centrality of a link quantifies the average amount of current that would pass through a link when all possible $s, q$ pairs in the network are considered. Formally,

$$C(e) = \frac{\sum_{s, q \in V} I_e(s, q)}{(|V| - 1)(|V| - 2)}, \quad \forall e \in E \tag{7}$$

where $I_e(s, q)$ is the fraction of current flow from $s$ to $q$ passing through $e$.

B. Application to water networks

Equation 6 assumes that $p$ and $I$ are linearly related. In the case of a water network, the hydraulic head at a node is equivalent to the electric potential at a node and the water flow through pipes is equivalent to the current flow. Unlike an electrical network, the relation between head and flow is non-linear, as the head loss per unit flow depends on the flow value. We extend this current flow betweenness centrality measure to water networks and use it to estimate burst flows.

Let $t$ be the time instant at which we want to determine the burst signatures of all the network locations. Let $\hat{x}(t)$ be the normal non-burst flow value expected at the DMA inlet at time $t$; this value can be obtained from the AR model developed in Section III. Let $\Gamma$ be the set of all demand nodes in the DMA and $s$ be the DMA inlet node. We now define the vector $b$ such that $b_s = \hat{x}(t)$ and $b_i = -\hat{x}(t) / |\Gamma|$, $\forall i \in \Gamma$.

The entries $a_{ij}$ of matrix $A$ are defined as:

$$a_{ij} = \begin{cases} \dfrac{R_e^5}{L_e} & \text{if } e = (i, j) \in E \\ 0 & \text{if } e = (i, j) \notin E \end{cases} \tag{8}$$

where $R_e$ and $L_e$ are the radius and the length of the pipe $e$ respectively. In other words, we initially populate matrix $A$ assuming unit flows and unit roughness coefficients for all pipes. For these $A$ and $b$, we estimate the head vector $H$ and flow vector $f$ using Equation 6. These flow values are used to revise the admittance matrix $A$, through which newer flow values are obtained. This fixed point iteration continues until the flow values converge.

The flow value obtained thus could be considered as a variant of the current flow centrality metric defined earlier in Equation 7. The variations include modifications to the $b$ vector, a non-linear relationship between $A$ and $b$, and the absence of any averaging.

The fixed point iteration handles the non-linear relationship between $H$ and $f$. However, the assumption about the roughness coefficients of all the pipes being unity remains uncorrected. To address this issue, we derive a heuristic approximation for the roughness coefficient $\mu$, while continuing to assume that it remains identical for all pipes. We believe that this assumption is reasonable since there is homogeneity in the pipes within a DMA in terms of their age.

Water utilities typically ensure that the delivery head at any node is greater than or equal to a minimum acceptable threshold value $\tau$. In most utilities, $\tau = 15$ m (i.e., enough head to support a water column of 15 m). Let $z$ be the node with the highest elevation in the DMA being analyzed. A conservative estimate for the delivery head at $z$ is $\tau$, i.e., $H_z = \tau$. If $H_s$ is the head maintained at the DMA's inlet, from equation 2, we can write the following:

$$H_z = H_s - \left( \frac{1}{\mu} \sum_{e \in P_{sz}} f_e^2 \frac{L_e}{R_e^5} \right) - h_z = \tau \tag{9}$$

where $P_{sz}$ is the path from inlet $s$ to $z$ with the smallest head drop, $h_z$ is the elevation at $z$, and $f_e$ is the flow value across edge $e$ at the end of the fixed point iteration. From the above relation, $\mu$ can be calculated once $H_s$ is specified. Once $\mu$ is known, the estimated head values $H$ can be corrected for the

Relative error: (Δ Flow)/Flow


unity roughness coefficient assumption through a linear scaling 2
of LA in Equation 6.
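The fixed point iteration built on Equations 6 and 8 can be illustrated with a small numerical sketch. It is not the authors' implementation: the function names, the damping factor `alpha`, the choice of pinning the inlet head at zero so that the Laplacian system becomes solvable, and the three-node example network are all assumptions made for illustration. Roughness is taken as unity throughout, so the correction via Equation 9 is left out.

```python
import numpy as np

def solve_heads(n, edges, adm, demand, inlet):
    """Solve L_A H = b (Equation 6) with the inlet head pinned at 0 (an assumption)."""
    lap = np.zeros((n, n))
    for (i, j), a in zip(edges, adm):
        lap[i, i] += a
        lap[j, j] += a
        lap[i, j] -= a
        lap[j, i] -= a
    keep = [v for v in range(n) if v != inlet]
    b = -np.asarray(demand, dtype=float)  # demand nodes are sinks, so b_i < 0
    heads = np.zeros(n)
    heads[keep] = np.linalg.solve(lap[np.ix_(keep, keep)], b[keep])
    return heads

def fixed_point_flows(n, edges, radius, length, demand, inlet,
                      alpha=0.5, tol=1e-10, max_iter=500):
    """Iterate admittance -> heads -> flows until the damped flow estimate settles."""
    # Equation 8: unit-flow, unit-roughness admittance R^5 / L per pipe.
    unit_adm = np.asarray(radius, dtype=float) ** 5 / np.asarray(length, dtype=float)
    flow_est = np.ones(len(edges))  # unit flows as the starting guess
    for _ in range(max_iter):
        # Revise admittances with the current flow estimate (guarded against zero flow).
        adm = unit_adm / np.maximum(np.abs(flow_est), 1e-12)
        heads = solve_heads(n, edges, adm, demand, inlet)
        flow_new = np.array([a * (heads[i] - heads[j])
                             for (i, j), a in zip(edges, adm)])
        correction = alpha * (flow_new - flow_est)  # damped update
        flow_est = flow_est + correction
        if np.max(np.abs(correction)) < tol:
            break
    return heads, flow_new

# Hypothetical three-node loop: node 0 is the inlet, nodes 1 and 2 draw water.
edges = [(0, 1), (0, 2), (1, 2)]
heads, flows = fixed_point_flows(3, edges, radius=[1, 1, 1], length=[1, 1, 1],
                                 demand=[0.0, 1.0, 3.0], inlet=0)
```

Heads returned by this sketch are relative to the inlet; adding the actual inlet head $H_s$ shifts them all by the same amount.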
Fig. 4. Empirical results showing the convergence of the edge flow values under the iterative heuristic on a benchmark topology. Though the error in a particular iteration may slightly increase for one edge, the average error across all the edges falls monotonically. [Panels: (a) Convergence for one edge; (b) Avg. convergence for all edges. X-axis: Iterations; Y-axis: Relative error, (Δ Flow)/Flow.]

C. Burst flow estimation at location v

Assume that the DMA inlet is at a given head $H_s$. When a burst occurs inside a DMA at location $v$, the burst location is exposed to the static head exerted on the pipe by the outside environment, e.g., the weight of the soil above the water pipe. Let this head be $h_0$. The value of $h_0$ can be determined based on a typical soil composition of the DMA service area, the pipe burying depth, and $v$'s elevation from mean sea level.

To find the burst flow at time $t$ due to the burst at location $v$, we set the demand at $v$ as $b_v = \Delta + \hat{x}_t/|\Gamma|$, for some $\Delta > 0$. The demands at other nodes are left unchanged at $\hat{x}_t/|\Gamma|$. For this new demand vector $b$, we recompute $f$ and $H$ as described in Section V-B. If we find that $H_v$ is greater than $h_0$, we increase $b_v$ to a new value by increasing $\Delta$. However, if $H_v < h_0$, we decrease $b_v$ and repeat the process. In other words, we search for that new demand $b_v$ at $v$ such that the $H_v$ for this demand is comparable to $h_0$. Clearly, the value $b_v - \hat{x}_t/|\Gamma|$ is the excess flow that gets induced in the network due to the burst at $v$. Hence, $b_v - \hat{x}_t/|\Gamma|$ is $v$'s burst signature for the inlet head $H_s$.
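The search just described can be sketched as a bracketing step followed by bisection. This is a minimal illustration under stated assumptions, not the authors' code: `head_at` stands in for whatever routine returns $H_v$ for a trial demand $b_v$ (for example, the fixed point iteration of Section V-B), it is assumed to decrease monotonically in $b_v$, and the quadratic toy model used in the usage lines is invented purely to exercise the function.

```python
from typing import Callable

def burst_signature(head_at: Callable[[float], float], normal_demand: float,
                    h0: float, tol: float = 1e-6, max_iter: int = 200) -> float:
    """Return the excess flow b_v - normal_demand at which head_at(b_v) is about h0."""
    if head_at(normal_demand) <= h0:
        return 0.0  # head is already at or below h0, so no excess outflow is induced
    lo = normal_demand
    step = max(normal_demand, 1.0)
    hi = lo + step
    # Phase 1: grow the demand until the head at v drops to h0 or below.
    for _ in range(max_iter):
        if head_at(hi) <= h0:
            break
        lo, step = hi, 2.0 * step
        hi = lo + step
    else:
        raise ValueError("could not bracket the demand at which the head reaches h0")
    # Phase 2: bisect between a demand that is too small (lo) and one that is too large (hi).
    for _ in range(max_iter):
        if hi - lo < tol:
            break
        mid = 0.5 * (lo + hi)
        if head_at(mid) > h0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi) - normal_demand

# Toy head model (hypothetical): the head at v falls quadratically with its demand.
signature = burst_signature(lambda b: 50.0 - 0.1 * b * b, normal_demand=2.0, h0=10.0)
```

Each call to `head_at` would be one full hydraulic solve in practice, so the logarithmic number of evaluations is what makes a per-node signature table affordable.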
Standard techniques such as binary search can be adopted to find this value due to the monotonic relationship between $b_v$ and $H_v$. Figure 5 gives a consolidated view of the above discussions.

Remarks. The iterative method outlined in Figure 5 recursively solves a coupled system of non-linear equations in $H$ and $f$. The $A^{-1}$ matrix in COMPUTE-HEAD-USING-CENTRALITY can be considered as a heuristic approximation of using a correction that includes the Jacobian ($\partial H / \partial f$). This matrix is used to obtain the new head vector $H_{n+1}$, which in turn is used to obtain the new flow vector $f_{n+1}$. The new flow vector is then used to update the $A^{-1}$ matrix. While the convergence rate of the iteration can be increased through the explicit use of the Jacobian, we find that our heuristic seems to perform satisfactorily in practice as evidenced from the convergence rates shown in Figures 4(a) and 4(b). Figure 4(a) shows the convergence for one edge while Figure 4(b) shows the convergence averaged across all edges.

D. Localizing a leak

Recall that the leak detection module will flag a leak when the flow sample observed at time $t$, $\hat{X}_t$, is greater than x(). The excess leak flow estimated at time $t$ from actual measurement is given by $\delta(t) = \hat{X}_t - E[X_t]$.

Let $\Delta_v(t, H_s) = b_v - \hat{x}_t/|\Gamma|$ be the burst signature estimated for node $v$ at time $t$ for an inlet head value of $H_s$. If a pressure logger has not been deployed at the DMA inlet along with the flow meter, then the actual value of $H_s$ at time $t$ may not be known. In this case, $H_s$ is considered as a parameter and $\Delta_v(t, H_s)$ is computed for different values of $H_s$. Since most DMAs are pressure regulated either through a pump or a water tank at the inlet, the range of values for $H_s$ can be gathered from the past DMA operations.

The candidate set of leak locations at time $t$, $\zeta_t$, is defined as

$$\zeta_t = \{\, v \mid \mathrm{ABS}\big(\Delta_v(t, H_s) - \delta(t)\big) \le \lambda, \ \forall H_s \,\} \tag{10}$$

where ABS($x$) is the absolute value of $x$ and $\lambda$ is a proximity threshold. Ideally, the size of $\zeta_t$ should be as small as possible with the guarantee that the actual leak location will always be present in the candidate set. In our studies, we have set $\lambda$ to be 5% of $\delta(t)$.

Typically, a leak will continue over several time instants and hence we will have a series of candidate sets, $\zeta_{t_1}, \zeta_{t_2}, \cdots, \zeta_{t_n}$. Information from these sets is fused and the resultant high likelihood candidate set is chosen as:

$$\zeta = \Big\{\, v \;\Big|\; \max_v \sum_{i=1}^{n} I(v, \zeta_{t_i}) \,\Big\} \tag{11}$$

where $I(v, \zeta_{t_i})$ is an indicator variable that equals 1 if $v \in \zeta_{t_i}$ and equals 0 when $v \notin \zeta_{t_i}$.

Once the high likelihood candidate set is determined, the actual leak location for carrying out the repair work can be determined by manually scanning the pipeline in the vicinity of the candidate set with acoustic sensors. In case the actual leak location is not in the vicinity of the high likelihood candidate set, the entire DMA need not be searched. We can form a series of candidate sets in the decreasing order of the set likelihoods to guide the scanning.

VI. EVALUATION ON BENCHMARK TOPOLOGY

We now evaluate our approach on a benchmark topology available from the American Society of Civil Engineers (ASCE). This topology was used for the Battle of Water Calibration Networks (BWCN) [9]. The BWCN network has a single reservoir, 7 storage tanks, 388 nodes, 429 links, 11

pumps, 3 pressure reducing valves and one throttle control valve. The network is divided into 5 District Metering Areas (DMAs). For the sake of brevity, we report results of our analysis for one of the DMAs, namely, DMA-5, whose topology is given in Figure 6.

BUILD-SIGNATURE-TABLE(G, Hsrc)
    ▷ v is a burst location
    for v ∈ V
        do burstSignature[v] ← COMPUTE-EXCESS-FLOW(G, v, Hsrc)

COMPUTE-EXCESS-FLOW(G, v, Hsrc)
    excessFlow ← 0
    normalDemand ← G[v].demand
    oldDemand ← normalDemand
    done ← FALSE
    while not done
        do headV ← COMPUTE-HEAD(G, v)
           if (headV ≈ h0)
               then excessFlow ← G[v].demand − normalDemand
                    G[v].demand ← normalDemand
                    done ← TRUE
           elseif (headV > h0)
               then oldDemand ← G[v].demand
                    G[v].demand ← 2 × oldDemand
           elseif (headV < h0)
               then avg ← (oldDemand + G[v].demand)/2
                    G[v].demand ← avg
    return excessFlow

COMPUTE-HEAD(G, v)
    ▷ Previous iteration flows
    FP ← {φ}
    ▷ Current iteration flows
    FC ← {φ}
    iteration ← 0
    G′ ← copy(G)
    error ← ∞
    while iteration < τI and error > τC
        do H ← COMPUTE-HEAD-USING-CENTRALITY(G′)
           error ← 0
           correction ← {φ}
           for (i,j) in G′.edges
               do FC[i,j] ← (H[i] − H[j]) × G′[i,j].admittance
                  correction[i,j] ← α(FC[i,j] − FP[i,j])
                  error ← error + |correction[i,j]| / FP[i,j]
           error ← error / |G′.edges|
           iteration ← iteration + 1
           for (i,j) in G′.edges
               do G′[i,j].admittance ← G[i,j].unitAdmittance / (FP[i,j] + correction[i,j])
           FP ← FC
    Compute H[v] after linear scaling
    return H[v]

COMPUTE-HEAD-USING-CENTRALITY(G)
    A ← weightedLaplacianMatrix(G, weight=G.admittance)
    for t ∈ G.sinks
        do B[t] ← −G[t].demand
    B.append(Σ_{t ∈ G.sinks} G[t].demand)
    H ← A⁻¹.B
    return H

Fig. 5. Procedural view of extracting the burst signature of node v

Fig. 6. Structure of DMA5 from the benchmark topology. This DMA has 43 nodes. [Network diagram with numbered nodes and the reservoir inlet marked.]

Evaluation set-up. The BWCN network and the associated data do not have any information about burst events. Therefore, we simulated bursts using the hydraulic model and generated the flow meter data at the inlet of DMA-5 required for our analysis. To make the deterministic demand patterns of the BWCN more realistic (as observed in our real-world DMA data), we added Gaussian noise of mean 0.3 units and standard deviation 0.1 to all base nodal demands. To simulate bursts at a node at a specific time, we added an "emitter" to that node with an appropriate emitter coefficient. An emitter is a water demand point where the water escapes into the outside atmospheric pressure and is the standard model of choice for simulating a leak or a burst. The emitter coefficients were randomly chosen in the range of 2 through 28 to simulate small to large bursts. The higher the emitter coefficient, the larger the size of the burst. For each combination of node id and time-of-leak, we ran 20 experiments.

A. Leak detection - Benchmark Topology

Figure 7 shows the average percentage detection. Specifically, the X-axis shows the id of a node. For a given value of the emitter coefficient (used to simulate the burst), the Y-axis shows the average percentage detection, which is averaged over all bursts that occur at a node over all times-of-day. The following observations can be made from the graphs. Above a certain emitter coefficient value (i.e., leaks above a certain size), the leak can be detected almost certainly for most nodes. There are some nodes (e.g., nodes 1 and 10) for which the detection likelihood is low for moderate emitter values and is high only for very high emitter coefficient values. This is because these nodes are at the periphery of the DMA (when seen from the inlet), at low pressure, and so are likely to cause a significant excess flow only if the burst size at these nodes is very high.

Figure 8 shows the average time to detect a leak from the time the leak starts in the network at a node. Consider a given node id N, a time-of-day t and leak being detected at time t + τ. The X-axis shows the id of nodes (N). The Y-axis shows the average time to detect the leak (τ), averaged across all times-of-day (t) when the leak can happen at the node.
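As a small illustration of how the two quantities plotted in Figures 7 and 8 could be aggregated from simulated trials, the sketch below groups trial outcomes by node and emitter coefficient. The record layout `(node_id, emitter_coeff, detected, hours_to_detect)` and the sample values are hypothetical; they are not taken from the experiments reported here.

```python
from collections import defaultdict
from statistics import mean

def summarize_detection(trials):
    """Map (node, emitter) -> (percentage detected, mean hours to detect or None)."""
    grouped = defaultdict(list)
    for node, emitter, detected, hours in trials:
        grouped[(node, emitter)].append((detected, hours))
    summary = {}
    for key, outcomes in grouped.items():
        # Only detected bursts contribute a time-to-detect value.
        delays = [hours for detected, hours in outcomes if detected]
        pct = 100.0 * len(delays) / len(outcomes)
        summary[key] = (pct, mean(delays) if delays else None)
    return summary

# Made-up trial outcomes for two (node, emitter coefficient) combinations.
trials = [
    (1, 10, True, 0.5),
    (1, 10, False, None),
    (1, 10, True, 1.5),
    (7, 2, False, None),
]
summary = summarize_detection(trials)
```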

Fig. 7. Averaged (across all times-of-day) detection likelihood for each node. [X-axis: Node id; Y-axis: % Detection; one series per emitter coefficient: 2, 6, 10, 14, 18, 22, 26.]

Fig. 8. Average (across all times-of-day) detection time for each node. [X-axis: Node id; Y-axis: Avg. Time to detect (hr); one series per emitter coefficient: 2, 6, 10, 14, 18, 22, 26.]

The graphs follow a similar trend as the detection likelihood. Small bursts take a significantly longer time-to-detect. This is because the extra flow due to the leak is low and can be picked up only during lean periods of the regular flow pattern. Larger-sized leaks on the other hand are detected fairly quickly except when they occur at a few nodes which again are at the periphery of the network.

Fig. 9. γ and γ′ for different DMAs in the benchmark topology. Results show that the proposed localization approach can potentially match a well calibrated hydraulic model. [X-axis: DMA1 through DMA5; series: γ - hydraulic model, γ′ - centrality.]

B. Leak localization - benchmark topology

Before we study the efficacy of the proposed approach in localizing leaks, we quantify the bounds of localization for this DMA as discussed in Section IV. Figure 9 shows the actual number of distinguishable burst flows, γ, across various DMAs of the benchmark topology. The figure also shows γ′, the number of distinguishable burst flows in the DMA as per the proposed centrality based approach. We find that γ′ is quite close to γ for all the DMAs; the maximum difference between the two is 1. This shows that our approach does as well as a fully calibrated hydraulic model in terms of extracting the number of unique burst signatures, i.e., makes efficient use of the static information about the network.

We evaluate our localization method on all leak events that resulted in a leak detection. Recall that the leak detection method detects a leak and estimates the excess leak flow. The leak localization method maps this excess leak flow and the time-of-day to likely candidate sets where the leak has occurred. Since we use the candidate set which has the highest likelihood for further analysis, we consider two metrics to evaluate our approach: (i) Percentage of time the actual leak location is included in the candidate set; and (ii) The size of the candidate set as a percentage of the DMA size.

Fig. 10. Percentage of time the burst node is within the high probability set. [X-axis: Node id; Y-axis: % time node is in highest prob set.]

Fig. 11. The size of the high probability set as a percentage of the DMA size. [X-axis: Node id; Y-axis: Size of candidate locations in %; series: Accurate Hydraulic Model, Centrality.]

Figure 10 shows the percentage of times a leak location is included in the candidate set with the highest likelihood. Specifically, the X-axis shows the node id where the burst occurs. The Y-axis shows the average probability that the node is in the candidate set with the highest likelihood; the averaging is over all times-of-day. As can be seen, for bursts occurring at most nodes, the node is included in the candidate set. Nodes 2, 13, and 14 are consistently not included in the candidate set. This is because their burst flows are under-estimated by our centrality-based model when compared with their actual burst flows. So, the inverse mapping of the actual burst flows at these nodes points to a different set of nodes.

We also studied the relationship between a node's distance from the inlet flow meter with the degree of localization

possible for a leak at that node. The candidate set size showed a negative correlation of 0.40 with the nodal distance. Since this is a weak correlation, we conclude that a node's distance from the meter may not have much influence on the localization.

Figure 11 shows the size of the candidate set as a percentage of the DMA size. The lower the percentage, the better the localization is. We compare our localization with the best possible localization with one meter using a fully calibrated hydraulic model with perfect information. As can be seen, our method's localization is close to the hydraulic model in many cases.
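To make the localization rule of Section V-D and the two metrics above concrete, the following sketch builds a per-instant candidate set (Equation 10), fuses several of them (Equation 11), and reports the candidate set size as a percentage of the DMA. It is an illustration under stated assumptions rather than the authors' code: the function names and sample numbers are hypothetical, and the quantifier over $H_s$ is exposed as a parameter because the sketch defaults to keeping a node when any assumed inlet head matches; passing `all` gives the stricter reading.

```python
from collections import Counter

def candidate_set(signatures, delta, rel_tol=0.05, match=any):
    """Equation 10: nodes whose burst signature lies within lambda of delta.

    signatures maps node -> list of signatures, one per assumed inlet head H_s.
    lambda is rel_tol * delta (5% of delta in the studies reported above).
    """
    lam = rel_tol * delta
    return {v for v, sigs in signatures.items()
            if match(abs(s - delta) <= lam for s in sigs)}

def fuse(candidate_sets):
    """Equation 11: nodes that appear in the largest number of candidate sets."""
    counts = Counter(v for cs in candidate_sets for v in cs)
    if not counts:
        return set()
    best = max(counts.values())
    return {v for v, c in counts.items() if c == best}

def size_percentage(candidates, n_nodes):
    """Metric (ii): candidate set size as a percentage of the DMA size."""
    return 100.0 * len(candidates) / n_nodes

# Made-up signatures for three nodes under two assumed inlet heads.
signatures = {1: [10.0, 12.0], 2: [20.0, 21.0], 3: [10.3, 30.0]}
at_t1 = candidate_set(signatures, delta=10.2)
fused = fuse([at_t1, {1}, {1, 2}])
```

Metric (i) then reduces to checking whether the true burst node is a member of `fused` for each detected event and averaging that indicator over events.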
C. Comparison with approximate hydraulic models

We compare the performance of our centrality based localization method with that of approximate hydraulic models. By an approximate hydraulic model, we refer to a model in which the roughness coefficients $\mu_e$ of the network pipes are not accurately estimated. In our experiments, we assign the same $\mu_e$ value to all the pipes in the network and observe the localization set for various leak locations. We repeat this experiment for different $\mu_e$ values and obtain the average localization performance.

We vary $\mu_e$ from 90 units (corresponding to a new pipe) to 65 (corresponding to an aged pipe), in decreasing steps of 5. The best and average case localization performance of the approximate model across these $\mu_e$ values is shown in Figure 12 using the cumulative distribution function (cdf). The cdf $F(x)$ shows the fraction of leak locations whose candidate set's cardinality is less than or equal to a given size $x$. The performance of the centrality method is also shown for reference. The average sizes of the candidate set of the approximate method in the best and average cases are 29% and 58% of the DMA size respectively. The average size of the candidate set in the centrality method is 46% of the DMA size. We also observed that the localization performance of the approximate model was sensitive to the $\mu_e$ values. The size of the candidate set (averaged across all leak locations) varied significantly from 29% to 83% of the DMA size as the $\mu_e$ values were varied.

Fig. 12. The best case performance of approximate models can be better than that of the centrality approach. However, their average performance is poorer. [Empirical CDF. X-axis: Search Size (as a fraction of network size); Y-axis: F(x); series: Centrality, Approx. Model - Avg, Approx. Model - Best.]

While one can attempt to use the approximate model that gives the smallest candidate set, picking that useful approximate model(s) for leak localization among the various options cannot happen until after a long sequence of burst events are manually localized. Further, an approximate model that performs satisfactorily for one DMA need not do so for other DMAs. The proposed centrality based approach, on the other hand, can be used readily for any DMA and gives a better average performance than the approximate hydraulic models.

VII. EVALUATION ON REAL-WORLD DATA

We now present the results of our analysis on a real-world DMA of a water utility in Europe. Specifically, we obtained flow meter data for a DMA. This DMA had one metered inlet and 56 nodes. Data was available as the flow-rate over 15 minute intervals for a period of 21 months. To validate our detection and localization, we also obtained logs from the WMIS (Workforce Management Information System) that records any reported leaks and repairs.

A. Leak detection in real-world DMA

Over the 21 month period, this DMA witnessed 6 water leak events. Out of the 6 leaks reported in the WMIS, our model is able to identify 3. For these 3 leaks, the leak duration and excess flow are shown in Figure 13. Analysis of the flow values for the 3 undetected leaks showed that the flows during the reported leaks were within the statistical bounds of normal operation; hence our leak detection method could not detect these leaks. Further, fact-checking with field personnel confirmed that these events involved minor links which did not have significant flows to begin with. These were the leaks that happened in smaller pipes that were almost at the farthest periphery of the DMA. Consequently, these leaks had little impact on the flow meter and hence could not be detected.

Fig. 13. Leak durations and flows in DMA-2 as measured from one metered inlet. [X-axis: Burst ID; left Y-axis: Leak flow (m3/hr); right Y-axis: Duration (hrs); series: Flow, Durn.]

Our leak detection method also detected some events which were not logged in the WMIS. For these events, the inlet flow meter showed reduced values, which were hence flagged as abnormal by our leak detection module. Our discussions with field personnel revealed that the reduced values were due to maintenance activity done at an upstream DMA; hence these activities were not logged in the DMA's WMIS. As our leak detection method is strictly based on statistical modeling of the local flow samples, it can give rise to false positives due to such exogenous factors.

B. Leak localization in real-world DMA

Of the 3 detected leaks in DMA-2, we were able to localize 2. The two events were localized to be within 22.5% and 28.6% of the DMA's size (as opposed to only knowing that the leak happened somewhere within 100% of the DMA's size).

Fig. 14. The candidate set (in dots) identified by our approach and the actual burst location overlaid on top of a real world DMA's map. [Panels: (a) Burst event 1; (b) Burst event 2.]

The average pipeline distance between the nearest neighbor in the candidate set and the actual burst location is 148.45m for event-1 and 126.7m for event-2. The candidate sets and the actual burst location for these two events are shown in Figure 14.

The third event (with burst ID 4) that our method missed happened in a minor communication pipe which is pressure reduced (by a valve). Consequently it has too small a burst flow magnitude to be precisely localized.

VIII. RELATED WORK

Outlier detection: There is an appreciable body of literature on outlier detection in wireless sensor networks [10], [11], [12], [13], [14]. The objective of these works is to detect global outliers based on the data collected by all the sensors in a wireless sensor network. These works propose different in-network processing techniques based on local information to conserve network bandwidth and energy in the sensor nodes. Considerable work has also been done in the context of centralized and distributed anomaly detection; see [15], [16], [17] and the references therein. These are more suited for off-line data mining. In this paper, we are interested in on-line detection of outliers in a time series emanating from a single sensor node. Therefore, very little in-network distributed processing and inter-node communication are required. For simplicity, in this work, we use a standard ARIMA model based deviation approach for detecting outliers. We note that other outlier detection techniques for time series [7] can also be used.

Leak localization in water utility networks: Leak localization in water networks has been studied in various works with each assuming the availability of certain data. Reference [2] proposes a wavelet based analysis to detect and localize bursts using data from pressure sensors. This method assumes a dense deployment of sensors that sample the pressure signal at a high rate. Localization is achieved based on the arrival time of the pressure transient resulting from a pipe burst event. However, not all real world water utility networks are instrumented to the extent required by this approach, which limits its applicability.

There are also other approaches [18], [19], [20] that localize leaks based on the pressure data sampled at a high rate. These have either been validated on single pipes [19], [20] or assume that the network demand does not vary much with time [18]. Clearly, these approaches are not applicable in the context of a real life water distribution network.

Reference [3] proposes a leak localization methodology based on Bayesian system identification. This approach takes into account the errors involved in measurement and hydraulic modeling. It also associates an uncertainty with the estimated leak location and magnitude. However, this approach assumes that the a priori error distribution in the hydraulic model parameters is known; in other words, it assumes that the hydraulic model is calibrated and the calibration errors are known. As we have seen before, not all utilities can afford to obtain this information.

Researchers have proposed approaches to automatically track and deploy sensor nodes in pipeline networks [21], [22]. These can be leveraged to deploy sensors in water distribution networks to increase the leak localization resolution. In this paper, we take cognizance of the fact that real world water utilities are budget constrained and hence may not have resources to deploy additional sensors; the only sensors that are available are those present at the DMA inlets.

Centrality analysis of networks: Centrality measures have been widely used to analyze social networks. They are often used to model information or commodity flow [23], find influential actors [24], and to study the spread of epidemics [25]. More recently, centrality measures have also been used to study the uncertainty in social networks among others [26].

Centrality measures have also been used to study infrastructure networks. Reference [27] analyzes the topological structures of power grids and shows that grid topologies are different from those of random graphs and small world networks. It also proposed an electrical centrality measure based on the impedance matrix and used this centrality measure to explain why the failure of a small number of highly connected buses can cause cascading effects in power grids; this measure was refined in [28]. Reference [29] uses centrality analysis along with other tools to characterize the impact of hurricanes on the reliability of electrical grids.

In the context of water networks, reference [30] analyzes the vulnerability of a water network to get disconnected based on the connectivity information. Centrality metrics have also been used to identify strategic sensor placement locations for detecting contamination in a water network [31].

To the best of our knowledge, the issue of localizing major leaks in water distribution networks using only one flow sensor in the absence of any hydraulic model has not been addressed before. Our work complements the existing body of knowledge

under both of the above mentioned areas by showing that through appropriate customization, centrality analysis can help localize leaks in water distribution networks.

IX. CONCLUSIONS

We considered the problem of leak localization using data from a single flow meter placed at the inlet of a DMA in water utility networks. We used time-series based modeling to detect if a current meter reading is a leak or not, and if so, to estimate the excess flow. Conventional leak localization approaches use an expensive hydraulic model to map the excess flow back to a candidate leak location. However, obtaining an accurate hydraulic model is expensive and hence, beyond the reach of many water utilities. We presented an alternate approach that exploits the network structure and static properties in a novel way. We extended the use of centrality metrics to infrastructure domains and used them to map the candidate leak location(s) from the excess leak flow. We evaluated our approach on benchmark water utility network topologies as well as on real data obtained from a European water utility. On benchmark topologies, the performance of our method was found to be comparable to solutions obtained from expensive hydraulic models. On a real world network, we were able to localize two out of the three leaks whose data we had access to. In these two cases, we find that the actual leak location was in the candidate set identified by our approach; further, our approach pruned as much as 78% of the DMA locations, indicating a high degree of localization. Presently, we are working to extend this approach to cases where (a) the flow information is supplemented by pressure data, and (b) additional meters are sparsely deployed inside the DMA.

ACKNOWLEDGMENT

We sincerely thank the anonymous reviewers and our shepherd Dr. Niki Trigoni. Their valuable suggestions have significantly improved the paper.

REFERENCES

[1] World Bank, “The Challenge of Reducing Non-Revenue Water in Developing Countries,” 2006.
[2] S. Srirangarajan, M. Allen, A. Preis, M. Iqbal, H. B. Lim, and A. Whittle, “Wavelet-based burst event detection and localization in water distribution systems,” Signal Processing Systems, vol. 72, no. 1, pp. 1–16, 2013.
[3] Z. Poulakis, D. Valougeorgis, and C. Papadimitriou, “Leakage detection in water pipe networks using a bayesian probabilistic framework,” Probabilistic Engineering Mechanics, vol. 18, pp. 315–327, 2003.
[4] Prabhata Swamee and Ashok Sharma, Design of Water Supply Pipe Networks. Wiley-Interscience, 2008.
[5] C. M. Bros and P. Kalungi, “Calibration and sensitivity analysis of the c-town pipe network model,” ASCE Water Distribution Systems Analysis 2010, pp. 1507–1523.
[6] Authors’ discussions with water utility companies, 2012.
[7] V. Barnett and T. Lewis, Outliers in Statistical Data. Wiley Series in Probability & Statistics, 1994.
[8] U. Brandes and D. Fleischer, “Centrality measures based on current flow,” in 22nd conference on Theoretical Aspects of Computer Science, ser. STACS, 2005.
[9] A. Ostfeld et al., “Battle of the water calibration networks,” Journal of Water Resources Planning and Management, vol. 138, no. 5, 2012.
[10] B. Sheng, Q. Li, W. Mao, and W. Jin, “Outlier detection in sensor networks,” in Proceedings of the 8th ACM International Symposium on Mobile and Ad Hoc Networking and Computing, ser. MobiHoc. New York, NY, USA: ACM, 2007, pp. 219–228.
[11] J. W. Branch, C. Giannella, B. Szymanski, R. Wolff, and H. Kargupta, “In-network outlier detection in wireless sensor networks,” Knowledge and Information Systems, vol. 34, no. 1, pp. 23–54, Jan. 2013.
[12] Y. Zhang, N. Meratnia, and P. Havinga, “Outlier detection techniques for wireless sensor networks: A survey,” IEEE Communications Surveys & Tutorials, vol. 12, no. 2, pp. 159–170, May 2010.
[13] S. Subramaniam, T. Palpanas, D. Papadopoulos, V. Kalogeraki, and D. Gunopulos, “Online outlier detection in sensor data using non-parametric models,” in Proceedings of ACM Conference on Very Large Databases, ser. VLDB. New York, NY, USA: ACM, 2006, pp. 187–198.
[14] Y. Zhuang, L. Chen, X. Wang, and J. Lian, “A weighted average-based approach for cleaning sensor data,” in Proceedings of the 27th International Conference on Distributed Computing Systems, ser. ICDCS, 2007.
[15] V. Chandola, A. Banerjee, and V. Kumar, “Anomaly detection: A survey,” ACM Comput. Surv., vol. 41, no. 3, pp. 15:1–15:58, Jul. 2009.
[16] W. R., B. K., and H. Kargupta, “A generic local algorithm for mining data streams in large distributed systems,” IEEE Transactions on Knowledge and Data Engineering, vol. 21, no. 4, 2009.
[17] M. Otey, A. Ghoting, and P. S., “Fast distributed outlier detection in mixed attribute data sets,” Data Mining and Knowledge Discovery, vol. 12, pp. 203–228, 2006.
[18] D. Misiunas, “Failure monitoring and asset condition assessment in water supply systems,” Ph.D. dissertation, Lund University, 2005.
[19] D. Misiunas, J. Vitkovsky, G. Olsson, A. Simpson, and M. Lambert, “Pipeline burst detection and location using a continuous monitoring technique,” in Advances in Water Supply Management: International Conference on Computing and Control for the Water Industry (CCWI), 2003, pp. 89–96.
[20] R. Silva, C. Buiatti, S. Cruz, and J. Pereira, “Pressure wave behavior and leak detection in pipelines,” Computers and Chemical Engineering, vol. 20, no. S1, pp. S491–S496, 1996.
[21] T.-t. T. Lai, Y.-h. T. Chen, P. Huang, and H.-h. Chu, “Pipeprobe: A mobile sensor droplet for mapping hidden pipeline,” in Proceedings of the 8th ACM Conference on Embedded Networked Sensor Systems, ser. SenSys ’10. New York, NY, USA: ACM, 2010, pp. 113–126.
[22] T.-T. Lai, W.-J. Chen, K.-H. Li, P. Huang, and H.-H. Chu, “Triopusnet: Automating wireless sensor network deployment and replacement in pipeline monitoring,” in Proceedings of the 11th International Conference on Information Processing in Sensor Networks, ser. IPSN ’12. New York, NY, USA: ACM, 2012, pp. 61–72.
[23] S. P. Borgatti, “Centrality and network flow,” Social Networks, vol. 27, no. 1, pp. 55–71, 2005.
[24] E. Costenbader, “The stability of centrality measures when networks are sampled,” Social Networks, vol. 25, no. 4, Oct. 2003.
[25] H. Habiba, Y. Yu, T. Y. Berger-Wolf, and J. Saia, “Finding spread blockers in dynamic networks,” in 2nd Intl. conf. on Advances in social network mining and analysis, ser. SNAKDD’08, 2010.
[26] C. Correa, T. Crnovrsanin, K.-L. Ma, and K. Keeton, “The derivatives of centrality and their applications in visualizing social networks,” IEEE Tran. on Visualization and Computer Graphics, 2011.
[27] P. Hines and S. Blumsack, “A centrality measure for electrical networks,” in Proceedings of Hawaii International Conference on System Sciences, 2008.
[28] Z. Wang, A. Scaglione, and R. Thomas, “Electrical centrality measures for electric power grid vulnerability analysis,” in 49th IEEE Conference on Decision and Control (CDC), 2010.
[29] J. Winkler, L. Dueñas-Osorio, R. Stein, and D. Subramanian, “Performance assessment of topologically diverse power systems subjected to hurricane events,” Reliability Engineering and System Safety, vol. 95, no. 4, pp. 323–336, 2010.
[30] A. Yazdani and P. Jeffrey, “A complex network approach to robustness and vulnerability of spatially organized water distribution networks,” CoRR, vol. abs/1008.1770, 2010.
[31] J. Xu, P. Fischbeck, M. Small, J. VanBriesen, and E. Casman, “Identifying sets of key nodes for placing sensors in dynamic water distribution networks,” Journal of Water Resources Planning and Management, vol. 134, no. 4, pp. 378–385, 2008.
