Sie sind auf Seite 1von 15

1

Information-Theoretic Privacy for Smart Metering


Systems with a Rechargeable Battery
Simon Li, Ashish Khisti, and Aditya Mahajan

AbstractSmart-metering systems report electricity usage of a the users demandthe rest being supplied by the battery; or
user to the utility provider on almost real-time basis. This could may be more than the users demandthe excess being stored
leak private information about the user to the utility provider. in the battery. A rechargeable battery provides privacy because
In this work we investigate the use of a rechargeable battery in
order to provide privacy to the user. the power consumed from the grid (rather than the users
We assume that the user load sequence is a first-order Markov demand) gets reported to the electricity utility (and potentially
process, the battery satisfies ideal charge conservation, and observed by an eavesdropper). In this paper, we focus on
that privacy is measured using normalized mutual information the mutual information between the users demand and con-
(leakage rate) between the user load and the battery output. sumption (i.e., the information leakage) as the privacy metric.
We consider battery charging policies in this setup that satisfy
the feasibility constraints. We propose a series reductions on the Mutual Information is a widely used metric in the literature
original problem and ultimately recast it as a Markov Decision on information theoretic security, as it is often analytically
Process (MDP) that can be solved using a dynamic program. tractable and provides a fundamental bound on the probability
In the special case of i.i.d. demand, we explicitly characterize of detecting the true load sequence from the observation [7].
the optimal policy and show that the associated leakage rate can Our objective is to identify a battery management policy
be expressed as a single-letter mutual information expression. In
this case we show that the optimal charging policy admits an (which determine how much energy to store or discharge from
intuitive interpretation of preserving a certain invariance prop- the battery) to minimize the information leakage rate.
erty of the state. Interestingly an alternative proof of optimality We briefly review the relevant literature. The use of a
can be provided that does not rely on the MDP approach, but is rechargeable battery for providing user privacy has been
based on purely information theoretic reductions. studied in several recent works, e.g., [6], [8][11]. Most of the
existing literature has focused on evaluating the information
I. I NTRODUCTION leakage rate of specific battery management policies. These
include the best-effort policy [6], which tries to maintain a
Smart meters are a critical part of modern power distribution
constant consumption level, whenever possible; and battery
systems because they provide fine-grained power consumption
conditioned stochastic charging policies [8], in which the
measurements to utility providers. These fine-grained measure-
conditional distribution on the current consumption depends
ments improve the efficiency of the power grid by enabling
only on the current battery state (or on the current battery state
services such as time-of-use pricing and demand response [1].
and the current demand). In [6], the information leakage rate
However, this promise of improved efficiency is accompanied
was estimated using Monte-Carlo simulations; in [8], it was
by a risk of privacy loss. It is possible for the utility provider
calculated using the BCJR algorithm [12]. The methodology
or an eavesdropperto infer private information including
of [8] was extended by [9] to include models with energy
load taxonomy from the fine-grained measurements provided
harvesting and allowing for a certain amount of energy waste.
by smart meters [2][4]. Such private information could be
Bounds on the performance of the best-effort policy and hide-
exploited by third parties for the purpose of targeted adver-
and-store policy for models with energy harvesting and infinite
tisement or surveillance. Traditional techniques in which an
battery capacity were obtained in [10]. The performance of
intermediary anonymizes the data [5] are also prone privacy
the best effort algorithm for an individual privacy metric was
loss to an eavesdropper. One possible solution is to partially
considered in [11]. None of these papers address the question
obscure the load profile by using a rechargeable battery [6]. As
of choosing the optimal battery management policy.
the cost of rechargeable batteries decreases (for example, due
Rate-distortion type approaches have also been used to study
to proliferation of electric vehicles), using them for improving
privacy-utility trade-off [13][15]. These models allow the user
privacy is becoming economically viable.
to report a distorted version of the load to the utility provider,
In a smart metering system with a rechargeable battery, the
subject to a certain average distortion constraint. Our setup
energy consumed from the power grid may either be less than
differs from these works as we impose a constraint on the
Simon Li was with the University of Toronto when this work was done. instantaneous energy stored in the battery due to its limited
Ashish Khisti (akhisti@ece.utoronto.ca) is with the University of Toronto, capacity. Both our techniques and the qualitative nature of the
Toronto, ON, Canada. Aditya Mahajan (aditya.mahajan@mcgill.ca) is with
McGill University, Montreal, QC, Canada. results are different from these papers.
Preliminary version of this work was presented at the Signal Processing for Our contributions are two-fold. First, when the demand is
Advanced Wireless Communications (SPAWC), Jul. 2015, Stockholm, Swe- Markov, we show that the minimum information leakage rate
den, International Zurich Seminar (IZS), Mar. 2-4, 2016, Zurich, Switzerland
and the American Control Conference (ACC), July 6-8, 2016, Boston, MA, and optimal battery management policies can be obtained by
USA. solving an appropriate dynamic program. These results are
2

similar in spirit to the dynamic programs obtained to compute


Home Demand: Smart Meter Consumption: Power
capacity of channels with memory [16][18]; however, the
Applicances Controller Grid
specific details are different due to the constraint on the battery
state. Second, when the demand is i.i.d., we obtain a single

letter characterization of the minimum information leakage
rate; this expression also gives the optimal battery management
Battery Eavesdropper/
policy. We prove the single letter expression in two steps. On (State ) Adversory
the achievability side we propose a class of policies with a
specific structure that enables a considerable simplification of
the leakage-rate expression. We find a policy that minimizes Fig. 1: A smart metering system.
the leakage-rate within this restricted class. On the converse
side, we obtain lower bounds on the minimal leakage rate and
show that these lower bound match the performance of the best The demand {Xt }t1 is a first-order time-homogeneous
structured policy. We provide two proofs. One is based on the Markov chain1 with transition probability Q. We assume
dynamic program and the other is based purely on information that Q is irreducible and aperiodic. The initial state X1 is
theoretic arguments. distributed according to probability mass function PX1 . The
After the present work was completed, we became aware initial charge S1 of the battery is independent of {Xt }t1 and
of [21], where a similar dynamic programming framework is distributed according to probability mass function PS1 .
presented for the infinite horizon case. However, no explicit The battery is assumed to be ideal and has no conversion
solutions of the dynamic program are derived in [21]. To the losses or other inefficiencies. Therefore, the following conser-
best of our knowledge, the present paper is the first work that vation equation must be satisfied at all times:
provides an explicit characterization of the optimal leakage St+1 = St + Yt Xt . (1)
rate and the associated policy for i.i.d. demand.
Given the history of demand, battery charge, and consump-
tion, a randomized battery charging policy q = (q1 , q2 , . . . )
A. Notation determines the energy consumed from the grid. In particular,
Random variables are denoted by uppercase letters (X, Y , given the histories (xt , st , y t1 ) of demand, battery charge,
etc.), their realization by corresponding lowercase letters (x, and consumption at time t, the probability that current con-
y, etc.), and their state space by corresponding script letters sumption Yt equals y is qt (y | xt , st , y t1 ). For a randomized
(X , Y, etc.). PX denotes the space of probability distributions charging policy to be feasible, it must satisfy the conservation
on X ; PX|Y denotes the space of stochastic kernels from Y to equation (1). So, given the current power demand and battery
X . Xab is a short hand for (Xa , Xa+1 , . . . , Xb ) and X b = X1b . charge (xt , st ), the feasible values of grid consumption are
For a set A, 1A (x) denotes the indicator function of the set defined by
that equals 1 if x A and zero otherwise. If A is a singleton
set {a}, we use 1a (x) instead of 1{a} (x). Y (st xt ) = {y Y : st xt + y S}. (2)
Given random variables (X, Y ) with joint distribution Thus, we require that
PX,Y (x, y) = PX (x)q(y|x), H(X) and H(PX ) denote the X
entropy of X, H(Y |X) and H(q|PX ) denote conditional qt (Y (st xt ) | xt , st , y t1 ) := qt (y | xt , st , y t1 ) = 1.
entropy of Y given X and I(X; Y ) and I(q; PX ) denote the yY (st xt )
mutual information between X and Y .
The set of all such feasible policies is denoted by QA 2 . Note
that while the charging policy qt () can be a function of the
II. P ROBLEM F ORMULATION AND M AIN R ESULTS entire history, the support of qt () only depends on the present
A. Model and problem formulation value of xt and st through the difference st xt . This is
emphasized in the definition in (2).
Consider a smart metering system as shown in Fig. 1. At The quality of a charging policy depends on the amount
each time, the energy consumed from the power grid must of information leaked under that policy. There are different
equal the users demand plus the additional energy that is notions of privacy; in this paper, we use mutual information
either stored in or drawn from the battery. Let {Xt }t1 , as a measure of privacy. Intuitively speaking, given random
Xt X , denote the users demand; {Yt }t1 , Yt Y, denote variables (Y, Z), the mutual information I(Y ; Z) measures
the energy drawn from the grid; and {St }t1 , St S, the decrease in the uncertainty about Y given by Z (or vice-
denote the energy stored in the battery. All alphabets are versa). Therefore, given a policy q, the information about
finite. For convenience, we assume X := {0, 1, . . . , mx }, (X T , S1 ) leaked to the utility provider or eavesdropper is
Y := {0, 1, . . . , my }, and S = {0, 1, . . . , ms }. We note that
such a restriction is for simplicity of presentation; the results 1 In practice, the energy demand is periodic rather than time homogeneous.

generalize even when X and Y are not necessarily contiguous We are assuming that the total consumption may be split into a periodic
intervals or integer valued. To guarantee that users demand is predictable component and a time-homogeneous stochastic component. In this
paper, we ignore the predictable component because it does not affect privacy.
always satisfied, we assume mx my or that X Y holds 2 With a slight abuse of notation, we use Q to denote the battery policy
A
more generally. for both the infinite and finite-horizon problems
3

captured by I q (X T , S1 ; Y T ), where the mutual information = 0/ = 0 = 0/ = 0


is evaluated according to the joint probability distribution on = 0/ = 1
(X T , S T , Y T ) induced by the distribution q as follows:

Pq (S T = sT , X T = xT , Y T = y T ) =0 =1
T 
Y
= PS1 (s1 )PX1 (x1 )q1 (y1 | x1 , s1 ) 1st {st1 xt1 +yt1 } = 1/ = 0
t=2
 = 1/ = 1 = 1/ = 1
t t t1
Q(xt |xt1 )qt (yt | x , s , y ) . Fig. 2: Binary System model. The battery can be either in
s = 0 or s = 1. The set of feasible transitions from each state
We use information leakage rate as a measure of the are shown in the figure.
quality of a charging policy. For a finite planning horizon, the
information leakage rate of a policy q = (q1 , . . . , qT ) QA
is given by assume that the demand (input) sequence is sampled i.i.d. from
an equiprobable distribution.
1 q T
LT (q) := I (X , S1 ; Y T ), (3) Consider a simple policy that sets yt = xt and ignores the
T battery state. It is clear that such a policy will lead to maximum
while for an infinite horizon, the worst-case information leak- leakage LT = 1. Another feasible policy is to set yt = st .
age rate of a policy q = (q1 , q2 , . . . ) QA is given by Thus whenever st = 0, we will set yt = 1 regardless of the
value of xt , and likewise st = 1 will result in yt = 0. It turns
L (q) := lim sup LT (q). (4)
T out that the leakage rate for this policy also approaches 1. To
see this note that the eavesdropper having access to y T also
We are interested in the following optimization problems:
in turn knows sT . Using the battery update equation (1) the
Problem A. Given the alphabet X of the demand, the initial sequence xT1 1 is thus revealed to the eavesdropper, resulting
distribution PX1 and the transistion matrix Q of the demand in a leakage rate of at least 1 1/T .
process, the alphabet S of the battery, the initial distribution In reference [8] a probabilistic battery charging policy is
PS1 of the battery state, and the alphabet Y of the consump- introduced that only depends on the current state and input i.e.,

tion: qt (yt |xt , st ) = qt (yt |xt , st ). Furthermore the policy makes
1) For a finite planning horizon T , find a battery charging equiprobable decisions between the feasible transitions i.e.,
policy q = (q1 , . . . , qT ) QA that minimizes the leakage qt (yt = 0|xt , st ) = q(yt = 1|xt , st ) = 1/2, xt = st (5)
rate LT (q) given by (3).
2) For an infinite planning horizon, find a battery charging and qt (yt |xt , st ) = 1xt (yt ) otherwise. The leakage rage for
policy q = (q1 , q2 , . . . ) QA that minimizes the leakage this policy was numerically evaluated in [8] using the BCJR
rate L (q) given by (4). algorithm and it was shown numerically that L = 0.5. Such
numerical techniques seem necessary in general even for the
The above optimization problem is difficult because we class of memoryless policies and i.i.d. inputs, as the presence
have to optimize a multi-letter mutual information expression of the battery adds memory into the system.
over the class of history dependent probability distributions. As a consequence of our main result it follows that the
In the spirit of results for feedback capacity of channels above policy admits a single-letter expression for the leakage
with memory [16][18], we show that the above optimization rate3 L = I(S X; X), thus circumventing the need for
problem can be reformulated as a Markov decision process numerical techniques. Furthermore it also follows that this
where the state and action spaces are conditional probability leakage rate is indeed the minimum possible one among the
distributions. Thus, the optimal policy and the optimal leakage class of all feasible policies. Thus it is not necessary for the
rate can be computed by solving an appropriate dynamic battery system to use more complex policies that take into
program. We then provide an explicit solution of the dynamic account the entire history. We note that a similar result was
program for the case of i.i.d. demand. shown in [30] for the case of finite horizon policies. However
the proof in [30] is specific to the binary model. In the present
B. Example: Binary Model paper we provide a complete single-letter solution to the case
of general i.i.d. demand, and a dynamic programming method
We illustrate the special case when X = Y = S = {0, 1} for the case of first-order Markovian demands, as discussed
in Fig. 2. The input, output, as well as the state, are all next.
binary valued. When the battery is in state st = 0, there
are three possible transitions. If the input xt = 1 then we
C. Main results for Markovian demand
must select yt = 1 and the state changes to st+1 = 0. If
instead xt = 0, then there are two possibilities. We can select We identify two structional simplifications for the battery
yt = 0 and have st+1 = 0 or we can select yt = 1 and charging policies. First, we show (see Proposition 1 in Sec-
have st+1 = 1. In a similar fashion there are three possible 3 The random variable S is an equiprobable binary valued random variable,
transitions from the state st = 1 as shown in Fig. 2. We will independent of X.
4

tion III-A) that there is no loss of optimality in restricting Moreover, the optimal (finite horizon) leakage rate is
attention to charging strategies of the form given by V1 (1 )/T , where 1 (x, s) = PX1 (x)PS1 (s).
2) For the infinite horizon, the optimal charging policy of
qt (yt |xt , st , y t1 ). (6) the form (8) is time-homogeneous and is given by the
The intuition is that under such a policy, observing y t gives following fixed point equation
partial information only about (xt , st ) rather than about the J + v() = min[Ba v](), PX,S , (12)
whole history (xt , st ). aA

Next, we identify a sufficient statistic for y t1 is the where J R is a constant and v : PX,S R. Let f ()
charging strategies of the form (6). For that matter, given a denote the arg min of the right hand side of (12). Then,
policy q and any realization y t1 of Y t1 , define the belief the time-homogenous optimal policy q = (q , q , . . . )
state t PX,S as follows: given by
t (x, s) = Pq (Xt = x, St = s|Y t1 = y t1 ) (7) q (yt | xt , st , t ) = at (yt | xt , st ), where at = f (t )
Then, we show (see Theorem 1 below) that there is no loss is optimal. Moreover, the optimal (infinite horizon) leak-
of optimality in restricting attention to charging strategies of age rate is given by J.
the form
See Section III for proof.
qt (yt |xt , st , t ). (8) The dynamic program above resembles the dynamic pro-
Such a charging policy is Markovian in the belief state t gram for partially observable Markov decision processes
and the optimal policies of such form can be searched using (POMDP) with hidden state (Xt , St ), observation Yt , and
a dynamic program. action At . However, in contrast to POMDPs, the expected
To describe such a dynamic program, we assume that there per-step cost I(a; ) is not linear in . Nonetheless, one could
is a decision maker that observes y t1 (or equivalently t ) and use computational techniques from POMDPs to approximately
chooses actions at = qt (|, , t ) using some decision rule solve the dynamic programs of Theorem 1. See Section III for
at = ft (t ). We then identify a dynamic program to choose a brief discussion.
the optimal decision rules.
Note that the actions at take vales in a subset A of PY |X,S D. Main result for i.i.d. demand
given by Assume the following:
 (A) The demand {Xt }t1 is i.i.d. with probability distribu-
A = a PY |X,S : a(Y (s x) | x, s) = 1,
tion PX .
(x, s) X S . (9)
We provide an explicit characterization of optimal policy and
To succiently write the dynamic program, for any a A, we optimal leakage rate for this case.
define the Bellman operator Ba : [PX,S R] [PX,S R] Define an auxiliary state variable Wt = St Xt that takes
as follows: for all PX,S , values in W = {s x : s S, x X }. For any w W,
define:
[Ba V ]() = I(a; )
X D(w) = {(x, s) X S : s x = w}. (13)
+ (x, s)a(y | x, s)V ((, y, a)) (10)
xX ,sS,
Then, we have the following.
yY
Theorem 2. Define
where the function is a non-linear filtering equation defined
in Sec. III-C. J = min I(S X; X) (14)
PS
Our main result is the following:
where X and S are independent with X PX and S

Theorem 1. In Problem A there is no loss of optimality to .
P Let denote the arg min in (14). Define (w) =
restrict attention to charging policies of the form (8).
P
(x,s)D(w) X (x) (s). Then, under (A)

1) For the finite horizon T , we can identify the optimal 1) J is the optimal (infinite horizon) leakage rate.
policy q = (q1 , . . . , qT ) of the form (8) by iteratively 2) Define b PY |W as follows:
defining value functions Vt : PX,S R as follows. For (

any PX,S , VT +1 () = 0, and for t {T 1, . . . , 1}, PX (y) (y+w)


(w) if y X Y (w)
b (y|w) = (15)
0 otherwise .
Vt () = min[Ba Vt+1 ](), PX,S . (11)
aA Then, the memoryless charging policy q = (q1 , q2 , . . . )
given by
Let ft () denote the arg min of the right hand side
of (11). Then, optimal policy q = (q1 , . . . , qT ) is given qt (y | xt , st , t ) = b (y | st xt ) (16)
by
is optimal and achieves the optimal (infinite horizon)
qt (yt | xt , st , t ) = at (yt | xt , st ), where at = ft (t ). leakage rate.
5

Note that the optimal charging policy is memoryless, i.e., where the last equation uses (17).
the distribution on Yt does not depend on t (and, therefore Thus, if 6= , we can strictly decrease the mutual
on y t1 ). information by using . Hence, the optimal distribution must
The proof, which is presented in Section IV, is based have the property that (s) = (ms s).
on the standard arguments of showing achievability and a
Given a distribution on some alphabet M, we say that the
converse. On the achievability side we show that the policy
distribution is almost symmetric and unimodal around m
in (15) belongs to a class of policies that satisfies a certain
M if
invariance property. Using this property the multi-letter mutual
information expression can be reduced into a single-letter m m +1 m 1 m +2 m 2 . . .
expression. For the converse we provide two proofs. The
first is based on a simplification of the dynamic program of where we use the interpretation that for m 6 M, m = 0.
Theorem 1. The second is based on purely probabilistic and Similarly, we say that the distribution is symmetric and uni-
information theoretic arguments. modal around m M if
m m +1 = m 1 m +2 = m 2 . . .
E. Salient features of the result for i.i.d. demand
Theorem 2 shows that even if consumption could take Note that a distribution can be symmetric and unimodal only
larger values than the demand, i.e., Y X , under policy Yt if its support is odd.
takes values only in X . This agrees with the intuition that a Property 5. If the power demand is symmetric and unimodal
consumption larger that mx reveals that the battery has low around bmx /2c, then the optimal in Theorem 2 is almost
charge and that the power demand is high. In extreme cases, symmetric and unimodal with around bms /2c. In particular,
a large consumption may completely reveal the battery and if ms is even, then
power usage thereby increasing the information leakage.

We now show some other properties of the optimal policy. m m +1 = m 1 m +2 = m 2 . . .

Property 1. The mutual information I(S X; X) is equal to and if ms is odd then


H(S X) H(S).
m = m +1 m 1 = m +2 m 2 = . . .

Proof. This follows from the following simplifications:


where m = bms /2c.
I(S X; X) = H(S X) H(S X|X)
Proof. Let X = X. Then, I(S X; S) = H(S X)
= H(S X) H(S|X)
H(S) = H(S + X) H(S). Note that PX is also symmetric
= H(S X) H(S) and unimodal around bmx /2c.
Property 2. The mutual information I(S X; X) is strictly Let S and X denote the random variables S bms /2c
convex in the distribution and, therefore, int(PS ). and X bmx /2c. Then X is also symmetric and unimodal
around origin and
See Appendix A for proof.
As a consequence, the optimal in (14) may be obtained I(S X; X) = H(S + X ) H(S ).
using the Blahut-Arimoto algorithm [19], [20].
Now given any distribution of S , let + be a permu-
Property 3. Under the battery charging policy specified in tation of that is almost symmetric and unimodal with a
Theorem 2, the power consumption {Yt }t1 is i.i.d. with positive bias around origin. Then by [22, Corollary III.2],
marginal distribution PX . Thus, {Yt }t1 is statistically indis- H(PX ) H(PX + ). Thus, the optimal distribution
tinguishable from {Xt }t1 . must have the property that = + or, equivalently, is
almost unimodal and symmetric around bms /2c.
See Remarks 1 and 3 in Section IV-B for proof.
Combining this with the result of property 4 gives the char-
Property 4. If the power demand has a symmetric PMF, i.e., acterization of the distribution when ms is even or odd.
for any x X , PX (x) = PX (mx x), then the optimal
in Theorem 2 is also symmetric, i.e., for any s S, (s) = F. Numerical Example: i.i.d. demand
(ms s).
Suppose there are n identical devices in the house and each
Proof. For PS , define (s) = (ms s). Let X PX , is on with probability p. Thus, X Binomial(n, p). We derive
S and S . Then, by symmetry the optimal policy and optimal leakage rate for this scenario
I(S X; X) = I(S X; X). (17) under the assumption that Y = X . We consider two specific
examples, where we numerically solve (14).
For any (0, 1), let (s) = (s) + (1 )(s) denote the Suppose n = 6 and p = 0.5.
convex combination of and . Let S . By Property 2,
1) Consider S = [0:5]. Then, by numerically solving (14),
if 6= , then
we get that the optimal leakage rate J is is 0.4616 and
I(S X; X) < I(S X; X) + (1 )I(S X; X) the optimal battery charge distribution is
= I(S X; X), {0.1032, 0.1747, 0.2221, 0.2221, 0.1747, 0.1032}.
6

10
1
Leakage Rates for Binomial Distributed Power Demand
with probability qt (y | xt , st , y t1 ). Then the joint distribution
Jeq for mx = 5 on (X T , S T , Y T ) induced by q QB is given by
J* for mx = 5
Jeq for mx = 10 Pq (S T = sT , X T = xT , Y T = y T )
*
J for mx = 10 T 
Y
0
10 Jeq for mx = 20 = PS1 (s1 )PX1 (x1 )q1 (y1 | x1 , s1 ) 1st {st1 xt1 +yt1 }
Leakage Rate

J* for mx = 20 t=2

t1
Q(xt |xt1 )qt (yt | xt , st , y ) .
1
10 Proposition 1. In Problem A, there is no loss of optimality in
restricting attention to charging policies in QB . Moreover, for
any q QB , the objective function takes an additive form:
T
1X q
10
2
LT (q) = I (Xt , St ; Yt | Y t1 )
0 5 10 15 20 25 30 35 40 45 50 T t=1
Battery Size

Fig. 3: A comparison of the performance of qeq QB as defined where


in (18) with the optimal leakage rate for i.i.d. Binomial distributed
demand Binomial(mx , 0.5) for mx = {5, 10, 20}. I q (Xt , St ; Yt | Y t1 )
X
= Pq (Xt = xt , St = st , Y t = y t )
xt X ,st S qt (yt | xt , st , y t1 )
2) Consider S = [0:6]. Then, by numerically solving (14), y t Y t log q .
P (Yt = yt | Y t1 = y t1 )
we get that the optimal leakage rate J is is 0.3774 and
the optimal battery charge distribution is See Appendix B for proof. The intuition behind why poli-
cies in QB are better than those in QA is as follows. For
{0.0773, 0.1364, 0.1847, 0.2031, 0.1847, 0.1364, 0.0773}. a policy QA , observing the realization y t of Y t gives partial
information about the history (xt , st ) while for a policy QB , yt
Note that both results are consistent with Properties 4 and 5. gives partial information only about the current state (xt , st ).
We next compare the performance with the following time- The dependence on (xt , st ) cannot be removed because of the
homogeneous benchmark policy qeq QB : for all y Y, w conservation constraint (1).
W, Proposition 1 shows that the total cost may be written in an
1Y (wt ) {yt } additive form. Next we use an approach inspired by [16][18]
qt (yt |wt ) = . (18)
|Y (wt )| and formulate an equivalent sequential optimization problem.
This benchmark policy chooses all feasible values of Yt with
equal probability. For that reason we call it equi-probably B. An equivalent sequential optimization problem
policy and denote its performance by Jeq . Consider a system with state process {Xt , St }t1 where
In Fig. 3, we compare the performance of q and qeq as a {Xt }t1 is an exogenous Markov process as before and
function of battery sizes for different demand alphabets. {St }t1 is a controlled Markov process as specified below.
Under qeq , the MDP converges to a belief state that is At time t, a decision maker observes Y t1 and chooses a
approximately uniform. Hence, in the low battery size regime, distribution valued action At A, where A is given by (9),
Jeq is close to optimal but its performance gradually worsens as follows:
with increasing battery size. At = ft (Y t1 ) (19)
where f = (f1 , f2 , . . . ) is called the decision policy.
III. P ROOF OF T HEOREM 1 Based on this action, an auxiliary variable Yt Y is chosen
One of the difficulties in obtaining a dynamic programming according to the conditional probability at ( | xt , st ) and the
decomposition for Problem A is that the objective function state St+1 evolves according to (1).
is not of the form
PT
cost(statet , actiont ). We show that
At each stage, the system incurs a per-step cost given by
t=1
there is no loss of optimality to restrict attention to a class of at (yt | xt , st )
policies QB and for any policy in QB , the mutual information ct (xt , st , at , y t ; f ) := log . (20)
Pf (Yt = yt | Y t1 = y t1 )
may be written in an additive form.
The objective is to choose a policy f = (f1 , . . . , fT ) to
minimize the total finite horizon cost given by
A. Simplification of optimal charging policies " T #
1 f X t
Let QB QA denote the set of charging policies that LT (f ) := E ct (xt , st , at , y ; f ) (21)
T t=1
choose consumption based only on the consumption history,
current demand, and battery state. Thus, for q QB , at any where the expectation is evaluated with respect to the proba-
time t, given history (xt , st , y t1 ), the consumption Yt is y bility distribution Pf induced by the decision policy f .
7

Proposition 2. The sequential decision problem described For that matter, for any realization y t1 of past observations
above is equivalent to Problem A. In particular, and any choice at1 of past actions, define the belief state
1) Given q = (q1 , . . . , qT ) QB , let f = (f1 , . . . , fT ) be t PX,S as follows: For s S and x X ,

ft (y t1 ) = qt ( | , , y t1 ). t (x, s) = Pf (Xt = x, St = s|Y t1 = y t1 , At1 = at1 ).


If Y t1 and At1 are random variables, then the belief state
Then LT (f ) = LT (q).
is a PX,S -valued random variable.
2) Given f = (f1 , . . . , fT ), let q = (q1 , . . . , qT ) QB be
The belief state evolves in a state-like manner as follows.
qt (yt | xt , st , y t1 ) = at (yt | xt , st ), where at = ft (y t1 ). Lemma 1. For any realization y of Y and a of A ,
t t t t t+1 is

Then LT (q) = LT (f ). given by


t+1 = (t , yt , at ) (24)
Proof. For any history (xt , st , y t1 ), at A, and st+1 S,
where is given by
P(St+1 = st+1 | X t = xt , S t = st , Y t = y t , At = at )
X (, y, a)(x0 , s0 )
= 1st+1 {st + yt xt } at (yt | xt , st )
Q(x0 |x)a(y|x, s0 x + y)(x, s0 x + y)
P
yt Y
= xX P .
= P(St+1 = st+1 | Xt = xt , St = st , At = at ). (22) (x,s)X S a(y|x, s)(x, s)

Thus, the probability distribution on (X T , S T , Y T ) induced Proof. For ease of notation, we use P(xt , st |y t1 , at1 ) to
by a decision policy f = (f1 , . . . , fT ) is given by denote P(Xt = xt , St = st |Y t1 = y t1 , At1 = at1 ).
Similar interpretations hold for other expressions as well.
Pf (S T = sT , X T = xT , Y T = y T ) Consider
T 
Y t+1 (xt+1 , yt+1 ) = P(xt+1 , st+1 |y t , at )
= PS1 (s1 )PX1 (x1 )q1 (y1 | x1 , s1 ) 1st {st1 xt1 +yt1 }
P(xt+1 , st+1 , yt , at |y t1 , at1 )
t=2 = (25)
 P(yt , at |y t1 , at1 )
Q(xt |xt1 )at (yt |xt , st ) .
Now, consider the numerator of the right hand side.
where at = ft (y t1 ). Under the transformations described P(xt+1 , st+1 , yt , at |y t1 , at1 )
in the Proposition, Pf and Pq are identical probabil- = P(xt+1 , st+1 , yt , at |y t1 , at1 , t )
ity distributions. Consequently, Ef [ct (Xt , St , At , Y t ; f )] = X
I q (St , Xt ; Yt | Y t1 ). Hence, LT (q) and LT (f ) are equiv- = P(xt+1 |xt )1st+1 (st + xt yt )
(xt ,st )X S
alent.
at (yt |xt , st )1at (ft (t ))t (xt , st ) (26)
Eq. (22) implies that {Xt , St }t1 is a controlled Markov
process with control action {At }t1 . In the next section, Substituting (26) in (25) (and observing that the denomi-
we obtain a dynamic programming decomposition for this nator of the right hand side of (25) is the marginal of the
problem. For the purpose of writing the dynamic program, numerator over (xt+1 , st+1 )), we get that t+1 can be written
it is more convenient to write the policy (19) as in terms of t , yt and at . Note that if the term 1at (ft (t ))
is 1, it cancels from both the numerator and the denominator;
At = ft (Y t1 , At1 ). (23) if it is 0, we are conditioning on a null event in (25), so we can
assign any valid distribution to the conditional probability.
Note that these two representations are equivalent. Any policy
of the form (19) is also a policy of the form (23) (that simply Note that an immediate implication of the above result is
ignores At1 ); any policy of the form (23) can be written that t depends only on (y t1 , at1 ) and not on the policy f .
as a policy of the form (19) by recursively substituting At in This is the main reason that we are working with a policy of
terms of Y t1 . Since the two forms are equivalent, in the next the form (23) rather than (19).
section we assume that the policy is of the form (23). Lemma 2. The cost LT (f ) in (21) can be written as
X T 
C. A dynamic programming decomposition 1
LT (f ) = E I(at ; t )
T t=1
The model described in Section III-B above is similar to a
POMDP (partially observable Markov decision process): the where I(at ; t ) does not depend on the policy f and is
system state (Xt , St ) is partially observed by a decision maker computed according to the standard formula
who chooses action At . However, in contrast to the standard X
cost model used in POMDPs, the per-step cost depends on I(at ; t ) = t (x, s)at (y | x, s)
the observation history and past policy. Nonetheless, if we xX ,sS, at (y|x, s)
yY log X .
consider the belief state as the information state, the problem t (x, s)at (y | x, s)
can be formulated as a standard MDP. (x,s)X S
8

Proof. By the law of iterated expectations, we have Recall that Wt = St Xt which takes values in W =
 T  {s x : s S, x X }. For any realization (y t1 , at1 ) of
1 X past observations and actions, define t PW as follows: for
LT (f ) = E[ct (Xt , St , At , Y t ; f )|Y t1 , At1 ]
T t=1 any w W,
Now, from (20), each summand may be written as
t (w) = Pf (Wt = w | Y t1 = y t1 , At1 = at1 ).
f t t1 t1 t t
E [ct (Xt , St , At , Y ; f ) | Y = y ,A = a ]
X If Y t1 and At1 are random variables, then t is a PW -
= t (x, s)at (y | x, s)
valued random variable. As was the case for t , it can be
xX ,sS, at (y|x, s)
yY log X shown that t does not depend on the choice of the policy f .
t (x, s)at (y | x, s)
Lemma 3. Under (A), t and t are related as follows:
(x,s)X S P
= I(at ; t ). 1) t (w) = (x,s)D(w) PX (x)t (s).
2) t = PX t .
Proof. For part 1):
Proof of Theorem 1. Lemma 1 implies that {t }t1 is a con-
trolled Markov process with control action at . In addition, t (w) = Pf (Wt = w | Y t1 = y t1 , At1 = at1 )
Lemma 2 implies that the objective function can be expressed
= Pf (St Xt = w | Y t1 = y t1 , At1 = at1 )
in terms of the state t and the action At . Consequently, in X
the equivalent optimization problem described in Section III-B, = PX (x)t (s).
there is no loss of optimality to restrict attention to Markovian (x,s)D(w)
policies of the form at = ft (t ); an optimal policy of this
For part 2):
form is given by the dynamic programs of Theorem 1 [24].
Proposition 2 implies that this dynamic program also solves t (s) = Pf (St = st |Y t1 = y t1 , At1 = at1 )
Problem A.
Note that the Markov chain {Xt }t1 is irreducible and = P (Wt + Xt = st |y t1 , at1 )
aperiodic and the per-step cost I(at ; t ) is bounded. Therefore, = (PX t )(st ).
the existence of the solution of the infinite horizon dynamic
program can be shown by following the proof argument Since t (x, s) = PX (x)t (s), Lemma 3 suggests that one
of [26]. could simplify the dynamic program of Theorem 1 by using t
as the information state instead of t . For such a simplification
D. Remarks about numerical solution to work, we would have to use charging policies of the form
qt (yt |wt , y t1 ). We establish that restricting attention to such
The dynamic program of Theorem 1, both state and action
policies is without loss of optimality. For that matter, define
spaces are distribution valued (and, therefore, subsets of Eu-
B as follows:
clidean space). Although, an exact solution of the dynamic
program is not possible, there are two approaches to obtain 
B = b PY |W : b(Y (w) | w) = 1, w W .

(27)
an approximate solution. The first is to treat it as a dynamic
program of an MDP with continuous state and action spaces Lemma 4. Given a A and PX,S , define the following:
and use approximate dynamic programming [23], [24]. The P
PW as (w) = (x,s)D(w) (x, s)
second is to treat it as a dynamic program for a PODMP and
use point-based methods [25]. The point-based methods rely b B as follows: for all y Y, w W

on concavity of the value function, which we establish below. P


(x,s)D(w) a(y | x, s)(x, s)
Proposition 3. The value functions {Vt }Tt=1 defined in Theo- b(y | w) = ;
(w)
rem 1 are concave.
a A as follows: for all y Y, x X , s S
See Appendix C for proof.
a(y|x, s) = b(y|s x).
IV. P ROOF OF T HEOREM 2
A. Simplification of the dynamic program Then under (A), we have
Under (A), the belief state t can be decomposed into 1) Invariant Transitions: for any y Y, (, y, a) =
product form t (x, s) = PX (x)t (s), where (, y, a).
t (s) = Pf (St = s | Y t1 = y t1 , At1 = at1 ). 2) Lower Cost: I(a; ) I(a; ) = I(b; ).
Therefore, in the sequential problem of Sec. III-B, there is no
Thus, in principle, we can simplify the dynamic program of
loss of optimality in restricting attention to actions b B.
Theorem 1 by using t as an information state. However, for
reasons that will become apparent, we provide an alternative Proof. 1) Suppose (X, S) and W = S X, S+ =
simplification that uses an information state t PW . W + Y , X+ PX . We will compare P(S+ |Y ) when
9

Y a(|X, S) with when Y a(|X, S). Given w W Let ft () denote the arg min of the right hand side
and y Y, of (30). Then, the optimal policy q = (q1 , . . . , qT ) is
X given by
Pa (W = w, Y = y) = a(y|x, s)(x, s)
(x,s)D(w) qt (yt |wt , t ) = bt (yt |wt ), where bt = ft (t ).
X
= b(y|w)(x, s)
Moreover, the optimal (finite horizon) leakage
(x,s)D(w)
rate is given by V1 (1 )/T , where 1 (w) =
(a) X P
= a(y|x, s)(x, s) (x,s)D(w) P X (x)P S1
(s).
(x,s)D(w) 2) For the infinite horizon, the optimal charging policy is
a time-homogeneous and is given by the solution of the
= P (W = w, Y = y) (28)
following fixed point equation:
where (a) uses that for all (x, s) D(w), s x = w.
Marginalizing (28) over W , we get that Pa (Y = y) = J + v() = min[Bb v](), PS . (31)
bB
Pa (Y = y). Since S+ = W + Y , Eq. (28) also implies
Pa (S+ = s, Y = y) = Pa (S+ = s, Y = y). Therefore, where J R is a constant and v : PS R. Let f ()
Pa (S+ = s|Y = y) = Pa (S+ = s|Y = y). denote the arg min of the right hand side of (31). Then,
2) Let (X, S) and W = S X. Then W . the time-homogeneous policy q = (q , q , . . . ) given by
Therefore, we have
q (yt |wt , t ) = bt (yt |wt ), where bt = f (t )
I(a; ) = I a (X, S; Y ) I a (W ; Y ).
where the last inequality is the data-processing inequality. is optimal. Moreover, the optimal (infinite horizon) leak-

age rate is given by J.
Under a, (X, S) W Y , therefore,
I(a; ) = I a (X, S; Y ) = I a (W ; Y ). Proof. Lemma 5 implies that {t }t1 is a controlled Markov
process with control action bt . Lemma 4, part 2), implies that
Now, by construction, the joint distribution of (W, Y ) is the per-step cost can be written as
the same under a, a, and b. Thus,
XT 
1
I a (W ; Y ) = I a (W ; Y ) = I b (W ; Y ). E I(b; ) .
T t=1
Note that I b (W ; Y ) can also be written as I(b; ). The
result follows by combining all the above relations. Thus, by standard results in Markov decision theorem [24], the
optimal solution is given by the dynamic program described
Once attention is restricted to actions b B, the update of above.
t may be expressed in terms of b B as follows:
Lemma 5. For any realization yt of Yt and bt of Bt , t+1 is
B. Weak achievability
given by
t+1 = (t , yt , at ) (29) To simplify the analysis, we assume that we are free to
choose the initial distribution of the state of the battery, which
where is given by
could be done by, for example, initially charging the battery
(, y, b)(w+ ) to a random value according to that distribution. In principle,
P such an assumption could lead to a lower achievable leakage
xX ,wW PX (x)1w+ {y + w x}b(y | w)(w)
= P . rate. For this reason, we call it weak achievability. In the next
wW b(y | w)(w) section, we will show achievability starting from an arbitrary
Proof. The proof is similar to the proof of Lemma 5. initial distribution, which we call strong achievability.
For any b B and PW , let us define the Bellman Definition 1. Any PS and PW are said to equivalent
operator Bb : [PW R] [PW R] as follows: to each other if they satisfy the transformation in Lemma 3.
X Definition 2 (Constant-distribution policy). A time-
[Bb V ]() = I(b; ) +

(w)b(y | w)V (, y, b) .
homogeneous policy f = (f , f , . . . ) is a called a
yY,wW
constant-distribution policy if for all PW , f () is a
Theorem 3. Under assumption (A), there is no loss of opti- constant. If f () = b , then with a slight abuse of notation,
mality in restricting attention to optimal policies of the form we refer to b = (b , b , . . . ) as a constant-distribution
qt (yt |wt , t ) in Problem A. policy.
1) For the finite horizon case, we can identify the optimal
policy q = (q1 , . . . , qT ) by iteratively defining value Recall that under a constant-distribution policy b B, for
functions Vt : PW R. For any PW , VT +1 () = 0, any realization y t of Y t , t and t are given as follows:
and for t = T, T 1, . . . , 1,
t (s) = P(St = s | Y t = y t , B t1 = bt1 )
Vt () = min[Bb Vt+1 ](). (30) t (w) = P(Wt = w | Y t = y t , B t1 = bt1 ).
bB
10

1) Invariant Policies: We next impose an invariance prop- Proof. Consider the following sequence of simplifications:
erty on the class of policies. Under this restriction the leakage
rate expression will simplify substantially. Subsequently we I b (W1 ; Y1 ) = H b (W1 ) H b (W1 |Y1 )
will show that the optimal policy belong to this restricted class. = H b (W1 ) H b (W1 + Y1 |Y1 )
(a)
Definition 3 (Invariance Property). For a given distribution 1 = H b (W1 ) H b (S2 |Y1 )
of the initial battery state, a constant-distribution policy b B (b)
= H b (W1 ) H b (S1 )
is called a invariant policy if for all t, t = 1 and t = 1 , (c)
where 1 is equivalent to 1 . = H b (W1 ) H b (S1 |X1 )
(d)
Remark 1. An immediate implication of the above defini- = H b (W1 ) H b (W1 |X1 )
tion is that under any invariant policy b, the conditional = I b (W1 ; X1 ).
distribution Pb (Xt , St , Yt |Y t1 ) is the same as the joint
distribution Pb (X1 , S1 , Y1 ). Marginalizing over (X, S) we get where (a) is due to the battery update equation (1); (b) is
that {Yt }t1 is an i.i.d. sequence. because b is an invariant ; (c) is because S1 and X1 are
independent; and (d) is because S1 = W1 + X1 .
Lemma 6. If the system starts with an initial distribution of
the battery state, and is equivalent to , then an invariant 2) Structured Policy: We now introduce a class of policies
policy b = (b, b, . . . ) corresponding to (, ) achieves a that satisfy the invariance property in Def. 3. This will be then
leakage rate used in the proof of Theorem 2.
Definition 4 (Structured Policy). Given PS and PW ,
LT (b) = I(W1 ; Y1 ) = I(b; ) a constant-distribution policy b = (b, b, . . . ) is called a
structured policy with respect to (, ) if:
for any horizon T .
(
PX (y) (y+w)
(w) , y X Y (w)
b(y|w) =
Proof. The proof is a simple corollary of the invariance prop- 0, otherwise.
erty in Definition 3. Recall from the dynamic program of The-
Note that it is easy to verify that the distribution b defined
orem 3, that the performance of any policy b = (b1 , b2 , . . . )
above is a valid conditional probability distribution.
such that bt B, is given by
Lemma 8. For any PS and PW , the structured policy
1
T
X  b = (b, b, . . . ) given in Def. 4 is an invariant policy.
LT (b) = E I(bt ; t ) .
T t=1
Proof. Since (t , t ) are related according to Lemma 3, in
order to check whether a policy is invariant it is sufficient
Now, we start with an initial distribution 1 = and follow to check that either t = 1 for all t or t = 1 for all t.
the constant-distribution structured policy b = (b, b, . . . ). Furthermore, to check if a time-homogeneous policy is an
Therefore, by Lemma 6, t = for all t. Hence, the leakage invariant policy, it is sufficient to check that either 2 = 1 or
rate under policy b is 2 = 1 . We will prove that 2 = 1 .
Let the initial distributions (1 , 1 ) = (, ) and the system
variables be defined as usual. Now consider a realization s2 of
LT (b) = I(b; ).
S2 and y1 of Y1 . This means that W1 = s2 y1 . Since Y1 is
chosen according to (|w1 ), it must be that y1 X Y (w1 ).
Remark 2. Note that Lemma 6 can be easily derived inde- Therefore,
pendently of Theorem 3. From Proposition 1, we have that:
Pb (S2 = s2 , Y1 = y1 ) = Pb (S2 = s2 , Y1 = y1 , W1 = s2 y1 )
T = 1 (s2 y1 )b(y1 |s2 y1 )
1X b
LT (b) = I (St , Xt ; Yt |Y t1 ). (32) = PX (y1 )1 (s2 ), (33)
T t=1
where in the last equality we use the fact that y1 X
b
From Remark 1, we have that I (St , Xt ; Yt |Y t1
) = Y (s2 y1 ). Note that if y1 6 X Y (s2 y1 ), then Pb (S2 =
I b (S1 , X1 ; Y1 ) = I b (W1 ; Y1 ), which immediately results in s2 , Y1 = y1 ) = 0. Marginalizing over s2 , we get Pb (Y1 =
Lemma 6. y1 ) = PX (y1 ).
Consequently, 2 (s2 ) = Pb (S2 = s2 |Y1 = y1 ) = 1 (s2 ).
For invariant policies we can further express the leakage Hence, b is invariant as required.
rate in the following fashion, which is useful in the proof of
optimality. Remark 3. As argued in Remark 1, under any invariant policy,
{Yt }t1 is an i.i.d. sequence. As argued in the proof of
Lemma 7. For any invariant policy b, Lemma 8, for a structured policy the marginal distribution of
Yt is PX . Thus, an eavesdropper cannot statistically distinguish
I b (W1 ; Y1 ) = I b (W1 ; X1 ). between {Xt }t1 and {Yt }t1 .
11

Proposition 4. Let , , and b be as defined in Theorem 2. E. Dynamic programming converse


Then, We provide two converses. One is based on the dynamic
1) ( , ) is an equivalent pair; program of Theorem 3, which is presented in this section;
2) b is a structured policy with respect to ( , ). the other is based purely on information theoretic arguments,
3) If the system starts in the initial battery state and fol- which is presented in the next section.
lows the constant-distribution policy b = (b , b , . . . ), In the dynamic programming converse, we show that for J
the leakage rate is given by J . given in Theorem 2, v () = H(), and any b B,
Thus, the performance J is achievable. J + v () [Bb v ](), PW , (36)
Proof. The proofs of parts 1) and 2) follows from the defini-
Thus, J is a lower bound of the optimal leakage rate (see [27],
tions. The proof of part 3) follows from Lemmas 6 and 7. [28]).
To prove (36), pick any PW and b B. Suppose W1
This completes the proof of the achievability of Theorem 2. , Y1 b(|W1 ), S2 = Y1 + W1 , X2 is independent of W1
and X2 PX and W2 = S2 X2 . Then,
X
C. Binary Model (Revisited) [Bb v ]() = I(b; ) + (w1 )b(y1 |w1 )v ((, y1 , b))
(w1 ,y1 )WY
We revisit the binary model in Section II-B. Recall that the
policy suggested in [8] is as follows: = I(W1 ; Y1 ) + H(W2 |Y1 ) (37)
where the second equality is due to the definition of condi-
qt (yt = 0|xt , st ) = q(yt = 1|xt , st ) = 1/2, xt = st (34)
tional entropy. Consequently,
and qt (yt |xt , st ) = 1xt (yt ) otherwise. It can be easily verified [Bb v ]() v () = H(W2 |Y1 ) H(W1 |Y1 )
that this policy is a structured policy in Def. 4, where the initial
= H(W2 |Y1 ) H(W1 + Y1 |Y1 )
state is taken as (s1 = 0) = (s1 = 1) = 1/2. Thus using
(a)
Lemma 6 and Lemma 7 it follows that the leakage rate equals = H(S2 X2 |Y1 ) H(S2 |Y1 )
I(W1 ; X1 ) = 0.5. This yields an analytical proof of the result (b)  
in [8]. min H(S2 X2 ) H(S2 ) , S2 2
2 PS

=J (38)
D. Strong achievability where (a) uses S2 = Y1 + W1 and W2 = S2 X2; (b) uses
Lemma 9. Assume that for any x X , PX (x) > 0. Let  1 |B) H(A1 A2 |B) minPA1 H(A1 )
the fact that H(A
( , ) an equivalent pair and b = (b , b , . . . ) be the H(A1 A2 ) for any joint distribution on (A1 , A2 , B).
corresponding structured policy. The equality in (38) occurs when b is an invariant policy
Assume that int(PS ) or equivalently, for any w W and 2 is same as defined in Theorem 2. For that are not
and y X Y (w), b (y|w) > 0. Suppose the system starts equivalent to , the inequality in (38) is strict.
in the initial state (1 , 1 ) and follows policy b . Then: We have shown that Eq. (36) is true. Consequently, J is a
lower bound on the optimal leakage rate J.
1) the process {t }1 converges weakly to ;
2) the process {t }1 converges weakly to ;
F. Information theoretic converse
3) for any continuous function c : PW R,
Consider the following inequalities: for any admissible
T policy q QB , we have
1X
lim E[c(t )] = c( ). (35) T
T T
t=1
X
T T
I(S1 , X ; Y ) = I(St , Xt ; Yt |Y t1 )
4) Consequently, the infinite horizon leakage rate under t=1
b is (a) XT

L (b ) = I(b , ). I(Wt ; Yt |Y t1 ) (39)


t=1

Proof. The proof of parts 1) and 2) is presented in Ap- where (a) follows from the data processing inequality.
pendix D. From 2), limt E[c(t )] = c( ), which im- Now consider
plies (35). Part 4) follows from part 3) by setting c(t ) = I(Wt ; Yt |Y t1 ) = H(Wt |Y t1 ) H(Wt |Y t )
I(b , t ).
= H(Wt |Y t1 ) H(Wt + Yt |Y t )

Lemma 2 implies that defined in Theorem 2 lies in (b)
= H(Wt |Y t1 ) H(St+1 |Y t )
int(PS ). Then, by Lemma 9, the constant-distribution policy (c)
b = (b , b , . . . ) (where b is given by Theorem 2), achieves = H(Wt |Y t1 ) H(St+1 |Y t , Xt+1 )
the leakage rate I(b , ). By Lemma 7, I(b , ) is same as = H(Wt |Y t1 ) H(St+1 Xt+1 |Y t , Xt+1 )
J defined in Theorem 2. Thus, J is achievable starting from (d)
any initial state (1 , 1 ). = H(Wt |Y t1 ) H(Wt+1 |Y t , Xt+1 ) (40)
12

where (b) follows from (1); (c) follows because of assump- and both converses extend to continuous alphabets under mild
tion (A); and (d) also follows from (1). technical conditions; see [31] for details. It is only the strong
Substituting (40) in (39) (but expanding the last term as achievability result that relies on the finiteness of the alphabets.
H(WT |Y T 1 ) H(WT |Y T )), we get Extending of our MDP formulation to incorporate an
T
additive cost, such as the price of consumption, is rather
I(S1 , X T ; Y T )
X 
H(Wt |Y t1 H(Wt+1 |Y t , Xt+1 )
 immediate. However, the approach presented in this work
t=1
for explicitly characterizing the optimal leakage rate in the
T 1 i.i.d. case may not immediately extend to such general cost
X
H(Wt |Y t1 , Xt ) + H(Wt |Y t1 ) functions. The study of such problems remains an interesting
 
= H(W1 ) +
t=1 further direction.
H(WT |Y T )
A PPENDIX A
T
X
t1 T P ROOF OF P ROPERTY 2
= H(W1 ) + I(Wt ; Xt |Y ) H(WT |Y ). (41)
t=2 PFor any int(PS ) and (s) : S R such that
sS (s) = 0. Let (s) := (s) + (s). Then for small
Now, we take the limit T to obtain a lower bound to enough , PS . Given such a , let PW,X (w, x) =
the leakage rate: PW |X (w|x)PX (x) = (w + x)PX (x). Then to show that
2

L (q) = lim sup


1
I(S1 , X T ; Y T ) I(W ; X) is strictly convex on PS we require d I(W
d2
;X)
> 0.
T T From Property 1, I(W ; X) = H(W ) H(S). Therefore,
 T 
1 X
t1 T dI(W ; X) d [H(S) + H(W )]
lim sup H(W1 ) + I(Wt ; Xt |Y ) H(WT |Y ) =
T T d dX
t=2 X
" T # = (s) ln (s) PX (s w)(s) ln PW (w)
(a) 1 X t1
= lim I(Wt ; Xt |Y ) s wW,sS
T T P 2
t=2 d I(W ; X) X (s)2
2 X
sS PX (s w)(s)
(b) = .
min I(S X; X) = J d2 s
(s) PW (w)
PS PS wW
q
Let aw (s) = (s) PX(sw)
p
where (a) is because the entropy of any discrete random (s) and bw (s) = (s) PX (s w).
variable is bounded and (b) follows from the observation Using the Cauchy-Schwarz inequality, we can show that
that every term in the summation is only a function of the P 2
posterior P (St |Y t1 ). Therefore, minimizing each term over d2 I(W ; X) X (s)2 X
sS aw (s)bw (s)
=
a PS PS results in a lower bound. This shows that J is a d2 s
(s) PW (w)
wW
lower bound to the minimum (infinite horizon) leakage rate. 2
P 2
 P 2

sS aw (s) sS bw (s)
X (s) X
>
s
(s) PW (w)
wW
V. C ONCLUSIONS AND D ISCUSSION !
X (s)2 X X
In this paper, we study a smart metering system that uses 2
= aw (s) = 0.
a rechargeable battery to partially obscure the users power s
(s)
wW sS
demand. Through a series of reductions, we show that the The strict inequality is because a and b cannot be linearly
problem of finding the best battery charging strategy can be dependent. To see this, observe that a(s) (s)
b(s) = (s)+(s) cannot
recast as a Markov decision process. Consequently, the optimal be equal to a constant for all s S since must contain
charging strategies and the minimum information leakage rate negative as well as positive elements.
are given by the solution of an appropriate dynamic program.
For the case of i.i.d. demand, we provide an explicit A PPENDIX B
characterization of the optimal battery policy and the leakage P ROOF OF P ROPOSITION 1
rate. In this special case it suffices to choose a memoryless The proof of Proposition 1 relies on the following interme-
strategy where the distribution of Yt depends only on Wt . Our diate results (which are proved later):
achievability results rely on restricting attention to a class of
invariant policies. Under an invariant policy, the consumption Lemma B.1. For any q QA ,
{Yt }t1 is i.i.d. and the leakage rate is characterized by a T
X
single-letter mutual information expression. We then further I q (S1 , X T ; Y T ) I q (Xt , St ; Yt |Y t1 )
restrict attention to what we call structured policies under t=1
which the marginal distribution of {Yt }t1 is PX . Thus, under with equality if and only if q QB .
the structured policies, an eavesdropper cannot statistically
Lemma B.2. For any qa QA , there exists a qb QB , such
distinguish between {Xt }t1 and {Yt }t1 . We provide two
that
converses; one is based on the dynamic programming argu- T T
ment while the other is based on a purely information theoretic X X
I qa (Xt , St ; Yt |Y t1 ) = I qb (Xt , St ; Yt |Y t1 ).
argument. It is worth highlighting that the weak achievability t=1 t=1
13

Combining Lemmas B.1 and B.2, we get that for any $q_a \in \mathcal{Q}_A$, there exists a $q_b \in \mathcal{Q}_B$ such that
$$I^{q_a}(S_1, X^T; Y^T) \ge I^{q_b}(S_1, X^T; Y^T).$$
Therefore there is no loss of optimality in restricting attention to charging policies in $\mathcal{Q}_B$. Furthermore, Lemma B.1 shows that for any $q \in \mathcal{Q}_B$, $L_T(q)$ takes the additive form given in the statement of the proposition.

Proof of Lemma B.1. For any $q \in \mathcal{Q}_A$, we have
$$I^q(S_1, X^T; Y^T) \overset{(a)}{=} \sum_{t=1}^{T} I^q(S_1, X^T; Y_t \mid Y^{t-1}) \overset{(b)}{=} \sum_{t=1}^{T} I^q(X^t, S^t; Y_t \mid Y^{t-1}) \overset{(c)}{\ge} \sum_{t=1}^{T} I^q(X_t, S_t; Y_t \mid Y^{t-1}),$$
where $(a)$ uses the chain rule of mutual information and the fact that $(Z^{t-2}, Y^{t-1}) - X_{t-1} - X_t$;^4 $(b)$ uses the fact that the battery process $S^t$ is a deterministic function of $S_1$, $X^t$, and $Y^t$ given by (1); and $(c)$ uses the fact that removing terms lowers the mutual information.

^4 The notation $A - B - C$ is used to indicate that $A$ is conditionally independent of $C$ given $B$.

Proof of Lemma B.2. For any $q_a = (q_1^a, q_2^a, \ldots, q_T^a) \in \mathcal{Q}_A$, construct a $q_b = (q_1^b, q_2^b, \ldots, q_T^b) \in \mathcal{Q}_B$ as follows: for any $t$ and realization $(x^t, s^t, y^t)$ of $(X^t, S^t, Y^t)$, let
$$q_t^b(y_t \mid x_t, s_t, y^{t-1}) = P^{q_a}_{Y_t \mid X_t, S_t, Y^{t-1}}(y_t \mid x_t, s_t, y^{t-1}). \tag{42}$$
To prove the Lemma, we show that for any $t$,
$$P^{q_a}_{X_t, S_t, Y^t} = P^{q_b}_{X_t, S_t, Y^t}. \tag{43}$$
By definition of $q_b$ given by (42), to prove (43), it is sufficient to show that
$$P^{q_a}_{X_t, S_t, Y^{t-1}} = P^{q_b}_{X_t, S_t, Y^{t-1}}. \tag{44}$$
We do so using induction. For $t = 1$, $P^{q_a}_{X_1, S_1}(x, s) = P_{X_1}(x) P_{S_1}(s) = P^{q_b}_{X_1, S_1}(x, s)$. This forms the basis of induction. Now assume that (44) holds for $t$. In the rest of the proof, for ease of notation, we denote $P^{q_a}_{X_{t+1}, S_{t+1}, Y^t}(x_{t+1}, s_{t+1}, y^t)$ simply by $P^{q_a}(x_{t+1}, s_{t+1}, y^t)$. For $t+1$, we have
$$\begin{aligned}
P^{q_a}(x_{t+1}, s_{t+1}, y^t)
&= \sum_{(x_t, s_t) \in \mathcal{X} \times \mathcal{S}} P^{q_a}(x_{t+1}, x_t, s_{t+1}, s_t, y^t) \\
&= \sum_{(x_t, s_t) \in \mathcal{X} \times \mathcal{S}} Q(x_{t+1} \mid x_t)\, \mathbb{1}_{s_{t+1}}\{s_t - x_t + y_t\}\, P^{q_a}(y_t \mid x_t, s_t, y^{t-1})\, P^{q_a}(x_t, s_t, y^{t-1}) \\
&\overset{(a)}{=} \sum_{(x_t, s_t) \in \mathcal{X} \times \mathcal{S}} Q(x_{t+1} \mid x_t)\, \mathbb{1}_{s_{t+1}}\{s_t - x_t + y_t\}\, q_t^b(y_t \mid x_t, s_t, y^{t-1})\, P^{q_b}(x_t, s_t, y^{t-1}) \\
&= P^{q_b}(x_{t+1}, s_{t+1}, y^t),
\end{aligned}$$
where $(a)$ uses (42) and the induction hypothesis. Thus, (44) holds for $t+1$ and, by the principle of induction, holds for all $t$. Hence (43) holds and, therefore, $I^{q_a}(X_t, S_t; Y_t \mid Y^{t-1}) = I^{q_b}(X_t, S_t; Y_t \mid Y^{t-1})$. The statement in the Lemma follows by adding over $t$.
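The construction (42) lends itself to a direct numerical sanity check. The sketch below is illustrative rather than part of the formal development: the alphabets, demand kernel Q, initial distributions, and the particular history-dependent policy qa are arbitrary toy choices of ours, not values from the paper. It builds the trajectory distribution under $q_a$, forms $q_b$ by conditioning $Y_t$ on $(X_t, S_t, Y^{t-1})$ as in (42), and checks that both policies induce the same marginals $P(x_t, s_t, y^{t-1})$, as claimed in (44).

```python
from collections import defaultdict

# Toy check of Lemma B.2 (alphabets, demand law, and the policy qa below
# are illustrative assumptions).
X_VALS, Y_VALS, S_VALS = (0, 1), (0, 1), (0, 1, 2)
T = 3
Q = {0: {0: 0.7, 1: 0.3}, 1: {0: 0.4, 1: 0.6}}   # Markov demand Q(x'|x)
P_X1 = {0: 0.5, 1: 0.5}
P_S1 = {0: 0.2, 1: 0.5, 2: 0.3}

def qa(t, xs, ss, ys):
    """A policy in Q_A: biases Y_t towards the previous demand X_{t-1},
    so it genuinely depends on more than (X_t, S_t, Y^{t-1})."""
    x, s = xs[-1], ss[-1]
    feas = [y for y in Y_VALS if s - x + y in S_VALS]
    w = {y: (2.0 if t >= 2 and y == xs[-2] else 1.0) for y in feas}
    z = sum(w.values())
    return {y: w[y] / z for y in feas}

def rollout(policy):
    """Joint distribution over full trajectories (x^T, s^T, y^T)."""
    traj = {((x,), (s,), ()): P_X1[x] * P_S1[s]
            for x in X_VALS for s in S_VALS}
    for t in range(1, T + 1):
        nxt = defaultdict(float)
        for (xs, ss, ys), pr in traj.items():
            for y, py in policy(t, xs, ss, ys).items():
                s2 = ss[-1] - xs[-1] + y          # battery update (1)
                if t == T:
                    nxt[(xs, ss, ys + (y,))] += pr * py
                else:
                    for x2 in X_VALS:
                        nxt[(xs + (x2,), ss + (s2,), ys + (y,))] \
                            += pr * py * Q[xs[-1]][x2]
        traj = dict(nxt)
    return traj

# Aggregate P^{qa}(x_t, s_t, y^t) and P^{qa}(x_t, s_t, y^{t-1}) by
# marginalising the full trajectory law.
traj_a = rollout(qa)
num, den = defaultdict(float), defaultdict(float)
for (xs, ss, ys), pr in traj_a.items():
    for t in range(1, T + 1):
        num[(t, xs[t - 1], ss[t - 1], ys[:t])] += pr
        den[(t, xs[t - 1], ss[t - 1], ys[:t - 1])] += pr

def qb(t, xs, ss, ys):
    """The policy of Eq. (42): Y_t ~ P^{qa}( . | X_t, S_t, Y^{t-1})."""
    x, s = xs[-1], ss[-1]
    d = den.get((t, x, s, ys), 0.0)
    return {y: num[(t, x, s, ys + (y,))] / d for y in Y_VALS
            if d > 0 and num.get((t, x, s, ys + (y,)), 0.0) > 0}

traj_b = rollout(qb)

def marginal(traj, t):                            # P(x_t, s_t, y^{t-1})
    m = defaultdict(float)
    for (xs, ss, ys), pr in traj.items():
        m[(xs[t - 1], ss[t - 1], ys[:t - 1])] += pr
    return m

for t in range(1, T + 1):
    ma, mb = marginal(traj_a, t), marginal(traj_b, t)
    assert all(abs(ma[k] - mb[k]) < 1e-9 for k in set(ma) | set(mb))
print("Eq. (44) verified: q_a and q_b induce the same P(x_t, s_t, y^{t-1})")
```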
APPENDIX C
PROOF OF PROPOSITION 3

To prove the result, we show the following:

Lemma C.1. For any action $a \in \mathcal{A}$, if $V : \mathcal{P}_{\mathcal{X} \times \mathcal{S}} \to \mathbb{R}$ is concave, then $B_a V$ is concave.

The proof of Proposition 3 follows from backward induction. $V_{T+1}$ is a constant and, therefore, also concave. Lemma C.1 implies that $V_T, V_{T-1}, \ldots, V_1$ are concave.

Proof of Lemma C.1. The first term $I(a; \pi)$ of $[B_a V](\pi)$ is a concave function of $\pi$. We show the same for the second term. Note that if a function $V$ is concave, then its perspective $g(u, t) := t V(u/t)$ is concave on the domain $\{(u, t) : u/t \in \operatorname{Dom}(V),\ t > 0\}$. The second term in the definition of the Bellman operator (10),
$$\sum_{y \in \mathcal{Y}} \Big( \sum_{(x,s) \in \mathcal{X} \times \mathcal{S}} a(y \mid x, s)\, \pi(x, s) \Big) V\big(\varphi(\pi, y, a)\big),$$
has this form because the numerator of $\varphi(\pi, y, a)$ is linear in $\pi$ and the denominator is $\sum_{x,s} a(y \mid x, s)\, \pi(x, s)$ (and corresponds to $t$ in the definition of the perspective). Thus, for each $y$, the summand is concave in $\pi$, and the sum of concave functions is concave. Hence, the second term of the Bellman operator is concave in $\pi$. Thus we conclude that concavity is preserved under $B_a$.
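To make the perspective argument concrete, the substitution can be written out explicitly. The display below is a sketch: the specific form of the numerator of $\varphi$ is inferred from the battery dynamics used in Appendix B, and is meant only to illustrate how the terms line up with $g(u, t) = t V(u/t)$.
$$\begin{aligned}
t_y(\pi) &:= \sum_{(x,s)} a(y \mid x, s)\, \pi(x, s), \\
u_y(\pi)(x', s') &:= \sum_{(x,s)} Q(x' \mid x)\, \mathbb{1}_{s'}\{s - x + y\}\, a(y \mid x, s)\, \pi(x, s), \\
\varphi(\pi, y, a) &= \frac{u_y(\pi)}{t_y(\pi)}, \qquad
t_y(\pi)\, V\big(\varphi(\pi, y, a)\big) = g\big(u_y(\pi),\, t_y(\pi)\big).
\end{aligned}$$
Since $(u_y(\pi), t_y(\pi))$ depends linearly on $\pi$, and the perspective of a concave function is jointly concave and remains concave under composition with a linear map, each summand is a concave function of $\pi$.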
APPENDIX D
PROOF OF LEMMA 9

The proof of the convergence of $\{\pi_t\}_{t \ge 1}$ relies on a result on the convergence of partially observed Markov chains due to Kaijser [29], which we restate below.

Definition 5. A square matrix $D$ is called subrectangular if for every pair of indices $(i_1, j_1)$ and $(i_2, j_2)$ such that $D_{i_1, j_1} \neq 0$ and $D_{i_2, j_2} \neq 0$, we have that $D_{i_2, j_1} \neq 0$ and $D_{i_1, j_2} \neq 0$.

Theorem 4 (Kaijser [29]). Let $\{U_t\}_{t \ge 1}$, $U_t \in \mathcal{U}$, be a finite state Markov chain with transition matrix $P^u$. The initial state $U_1$ is distributed according to the probability mass function $P_{U_1}$. Given a finite set $\mathcal{Z}$ and an observation function $g : \mathcal{U} \to \mathcal{Z}$, define the following:
- the process $\{Z_t\}_{t \ge 1}$, $Z_t \in \mathcal{Z}$, given by $Z_t = g(U_t)$;
- the process $\{\rho_t\}_{t \ge 1}$, $\rho_t \in \mathcal{P}_{\mathcal{U}}$, given by $\rho_t(u) = P(U_t = u \mid Z^t)$;
- a square matrix $M(z)$, $z \in \mathcal{Z}$, given by
$$[M(z)]_{i,j} = \begin{cases} P^u_{ij} & \text{if } g(j) = z, \\ 0 & \text{otherwise,} \end{cases} \qquad i, j \in \mathcal{U}.$$
If there exists a finite sequence $z_1^m$ such that $\prod_{t=1}^{m} M(z_t)$ is subrectangular, then $\{\rho_t\}_{t \ge 1}$ converges in distribution to a limit that is independent of the initial distribution $P_{U_1}$.

We will use the above theorem to prove that under policy $b$, $\{\pi_t\}_{t \ge 1}$ converges to a limit. For that matter, let $\mathcal{U} = \mathcal{S} \times \mathcal{Y}$, $\mathcal{Z} = \mathcal{Y}$, $U_t = (S_t, Y_{t-1})$, and $g(S_t, Y_{t-1}) = Y_{t-1}$.

First, we show that $\{U_t\}_{t \ge 1}$ is a Markov chain. In particular, for any realization $(s^{t+1}, y^t)$ of $(S^{t+1}, Y^t)$, we have
$$\begin{aligned}
P^b(U_{t+1} = (s_{t+1}, y_t) \mid U^t = (s^t, y^{t-1}))
&= \sum_{x_t \in \mathcal{X}} P^b(U_{t+1} = (s_{t+1}, y_t), X_t = x_t \mid U^t = (s^t, y^{t-1})) \\
&= \sum_{x_t \in \mathcal{X}} \mathbb{1}_{s_{t+1}}\{y_t + s_t - x_t\}\, b(y_t \mid s_t - x_t)\, P_X(x_t) \\
&= P^b(U_{t+1} = (s_{t+1}, y_t) \mid U_t = (s_t, y_{t-1})).
\end{aligned}$$
Next, let $m = 2 m_s$ and consider
$$z^m = \underbrace{11 \cdots 1}_{m_s \text{ times}}\ \underbrace{00 \cdots 0}_{m_s \text{ times}}.$$
We will show that this $z^m$ satisfies the subrectangularity condition of Theorem 4. The basic idea is the following. Consider any initial state $u_1 = (s, y)$ and any final state $u_m = (s', 0)$. We will show that
$$P(S_{m_s} = m_s \mid U_1 = (s, y), Z^{m_s} = (11 \cdots 1)) > 0, \tag{45}$$
and
$$P(S_{2 m_s} = s' \mid U_{m_s} = (m_s, 1), Z_{m_s + 1}^{2 m_s} = (00 \cdots 0)) > 0. \tag{46}$$
Eqs. (45) and (46) show that given the observation sequence $z^m$, for any initial state $(s, y)$ there is a positive probability of observing any final state $(s', 0)$.^5 Hence, the matrix $\prod_{t=1}^{m} M(z_t)$ is subrectangular. Consequently, by Theorem 4, the process $\{\rho_t\}_{t \ge 1}$ converges in distribution to a limit that is independent of the initial distribution $P_{U_1}$.

^5 Note that given the observation sequence $z^m$, the final state must be of the form $(s', 0)$.

Now observe that $\xi_t(s) = \sum_{y \in \mathcal{Y}} \rho_t(s, y)$ and that $(\xi_t, \pi_t)$ are related according to Lemma 3. Since $\{\rho_t\}_{t \ge 1}$ converges weakly to a limit that does not depend on the initial condition, so do $\{\xi_t\}_{t \ge 1}$ and $\{\pi_t\}_{t \ge 1}$.

Let $\bar{\xi}$ and $\bar{\pi}$ denote the limits of $\{\xi_t\}_{t \ge 1}$ and $\{\pi_t\}_{t \ge 1}$. Suppose the initial condition is $(\xi^*, \pi^*)$. Since $b$ is an invariant policy, $(\xi_t, \pi_t) = (\xi^*, \pi^*)$ for all $t$. Therefore, the limits satisfy $(\bar{\xi}, \bar{\pi}) = (\xi^*, \pi^*)$.
Proof of Eq. (45). Given the initial state $(s, y)$, define $\bar{s} = m_s - s$, and consider the sequence
$$x^{m_s} = \underbrace{00 \cdots 0}_{\bar{s} \text{ times}}\ \underbrace{11 \cdots 1}_{s \text{ times}}.$$
Under this sequence of demands, consider the sequence of consumption $y^{m_s - 1} = (11 \cdots 1)$, which is feasible because the state of the battery increases by 1 for the first $\bar{s}$ steps (at which time it reaches $m_s$) and then remains constant for the remaining $s$ steps. Therefore,
$$P(S_{m_s} = m_s \mid U_1 = (s, y), Y^{m_s - 1} = (11 \cdots 1), X^{m_s} = x^{m_s}) > 0.$$
Since the sequence of demands $x^{m_s}$ has a positive probability,
$$P(S_{m_s} = m_s, X^{m_s} = x^{m_s} \mid U_1 = (s, y), Y^{m_s - 1} = (11 \cdots 1)) > 0.$$
Therefore,
$$P(S_{m_s} = m_s \mid U_1 = (s, y), Y^{m_s - 1} = (11 \cdots 1)) > 0,$$
which completes the proof.

Proof of Eq. (46). The proof is similar to the proof of (45). Given the final state $(s', 0)$, define $\bar{s}' = m_s - s'$ and consider the sequence
$$x_{m_s + 1}^{2 m_s} = \underbrace{11 \cdots 1}_{\bar{s}' \text{ times}}\ \underbrace{00 \cdots 0}_{s' \text{ times}}.$$
Under this sequence of demands and the sequence of consumption given by $y_{m_s}^{2 m_s - 1} = (00 \cdots 0)$, the state of the battery decreases by 1 for the first $\bar{s}'$ steps (at which time it reaches $s'$) and then remains constant for the remaining $s'$ steps. Since $x_{m_s + 1}^{2 m_s}$ has positive probability, we can complete the proof by following an argument similar to that in the proof of (45).
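The subrectangularity argument above is straightforward to check numerically. The sketch below is ours and uses placeholder choices (battery capacity $m_s$, a uniform demand law $P_X$, and a uniform-over-feasible invariant policy $b$); it builds the transition matrix of $\{U_t\}$ from the display above, forms the matrices $M(z)$, and tests the product along $z^m = 1^{m_s} 0^{m_s}$.

```python
import numpy as np

# Sketch of the subrectangularity check in Theorem 4, instantiated for
# U_t = (S_t, Y_{t-1}) with g(s, y) = y. The capacity, demand law, and
# policy b are illustrative placeholders.
ms = 2                                   # battery capacity
S_VALS = list(range(ms + 1))             # battery levels 0..ms
Y_VALS = X_VALS = [0, 1]
P_X = {0: 0.5, 1: 0.5}
U = [(s, y) for s in S_VALS for y in Y_VALS]    # state space of U_t

def b(y, w):
    """Invariant policy b(y | w), w = s - x: uniform over feasible y,
    where feasibility means the next battery level w + y stays in S."""
    feas = [yy for yy in Y_VALS if w + yy in S_VALS]
    return 1.0 / len(feas) if y in feas else 0.0

# Transition matrix P^u of {U_t} (see the Markov-chain display above):
# P(U' = (s', y') | U = (s, y)) = sum_x 1{s' = y' + s - x} b(y'|s-x) P_X(x)
P = np.zeros((len(U), len(U)))
for i, (s, y) in enumerate(U):
    for j, (s2, y2) in enumerate(U):
        P[i, j] = sum(P_X[x] * b(y2, s - x)
                      for x in X_VALS if s2 == y2 + s - x)

def M(z):
    """[M(z)]_{i,j} = P^u_{ij} if g(U_j) = z else 0, with g(s, y) = y."""
    out = P.copy()
    for j, (_, yj) in enumerate(U):
        if yj != z:
            out[:, j] = 0.0
    return out

def subrectangular(D, tol=1e-12):
    # D is subrectangular iff its nonzero pattern is a full block
    # (rows with a nonzero entry) x (columns with a nonzero entry).
    rows = np.any(D > tol, axis=1)
    cols = np.any(D > tol, axis=0)
    return bool(np.all((D > tol) == np.outer(rows, cols)))

zm = [1] * ms + [0] * ms                 # the sequence z^m from the proof
prod = np.linalg.multi_dot([M(z) for z in zm])
print("product along z^m is subrectangular:", subrectangular(prod))
```

For this toy instance the script should report a subrectangular product, matching the conclusion drawn from Eqs. (45) and (46).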
REFERENCES

[1] A. Ipakchi and F. Albuyeh, "Grid of the future," IEEE Power Energy Mag., vol. 7, no. 2, pp. 52-62, Mar.-Apr. 2009.
[2] A. Prudenzi, "A neuron nets based procedure for identifying domestic appliances pattern-of-use from energy recordings at meter panel," in Proc. IEEE Power Eng. Society Winter Meeting, New York, Jan. 2002.
[3] G. W. Hart, "Nonintrusive appliance load monitoring," Proc. IEEE, vol. 80, no. 12, pp. 1870-1891, Dec. 1992.
[4] H. Y. Lam, G. S. K. Fung, and W. K. Lee, "A novel method to construct taxonomy of electrical appliances based on load signatures," IEEE Trans. Consumer Electronics, vol. 53, no. 2, pp. 653-660, May 2007.
[5] C. Efthymiou and G. Kalogridis, "Smart grid privacy via anonymization of smart metering data," in Proc. IEEE Smart Grid Commun. Conf., Gaithersburg, MD, 2010.
[6] G. Kalogridis, C. Efthymiou, S. Z. Denic, T. A. Lewis, and R. Cepeda, "Privacy for smart meters: Towards undetectable appliance load signatures," in Proc. IEEE Smart Grid Commun. Conf., Gaithersburg, MD, 2010.
[7] I. Csiszár and J. Körner, Information Theory: Coding Theorems for Discrete Memoryless Channels. New York: Academic, 1981.
[8] D. Varodayan and A. Khisti, "Smart meter privacy using a rechargeable battery: Minimizing the rate of information leakage," in Proc. IEEE Int. Conf. Acoust. Speech Sig. Proc., Prague, Czech Republic, May 2011.
[9] O. Tan, D. Gündüz, and H. V. Poor, "Increasing smart meter privacy through energy harvesting and storage devices," IEEE J. Sel. Areas Commun., vol. 31, pp. 1331-1341, 2013.
[10] G. Giaconi, D. Gündüz, and H. V. Poor, "Smart meter privacy with an energy harvesting device and instantaneous power constraints," in Proc. IEEE Int. Conf. Commun. (ICC), 2015.
[11] S. Han, U. Topcu, and G. J. Pappas, "Event-based information-theoretic privacy: A case study of smart meters," in Proc. American Control Conference (ACC), 2016.
[12] L. Bahl, J. Cocke, F. Jelinek, and J. Raviv, "Optimal decoding of linear codes for minimizing symbol error rate," IEEE Trans. Inform. Theory, vol. IT-20, no. 2, pp. 284-287, Mar. 1974.
[13] L. Sankar, S. R. Rajagopalan, and H. V. Poor, "An information-theoretic approach to privacy," in Proc. 48th Annual Allerton Conf. on Commun., Control, and Computing, Monticello, IL, Sep. 2010, pp. 1220-1227.
[14] D. Gündüz and J. Gómez-Vilardebó, "Smart meter privacy in the presence of an alternative energy source," in Proc. IEEE Int. Conf. Commun. (ICC), June 2013.
[15] S. Rajagopalan, L. Sankar, S. Mohajer, and H. V. Poor, "Smart meter privacy: A utility-privacy framework," in Proc. 2nd IEEE Int. Conf. on Smart Grid Communications, 2011.
[16] S. Tatikonda and S. Mitter, "The capacity of channels with feedback," IEEE Trans. Inform. Theory, vol. 55, no. 1, pp. 323-349, Jan. 2009.
[17] A. Goldsmith and P. Varaiya, "Capacity, mutual information, and coding for finite state Markov channels," IEEE Trans. Inform. Theory, vol. 42, pp. 868-886, May 1996.
[18] H. Permuter, T. Weissman, and B. Van Roy, "Capacity and zero-error capacity of the chemical channel with feedback," in Proc. IEEE Int. Symp. on Inform. Theory (ISIT), Nice, France, 2007.
[19] R. Blahut, "Computation of channel capacities and rate-distortion functions," IEEE Trans. Inform. Theory, vol. 18, no. 4, pp. 460-473, 1972.
[20] S. Arimoto, "An algorithm for computing the capacity of arbitrary discrete memoryless channels," IEEE Trans. Inform. Theory, vol. 18, no. 1, pp. 14-20, 1972.
[21] J. Yao and P. Venkitasubramaniam, "On the privacy of an in-home storage mechanism," in Proc. Allerton Conf. on Commun., Control, and Computing, Monticello, IL, Oct. 2013.
[22] L. Wang, J. O. Woo, and M. Madiman, "A lower bound on the Rényi entropy of convolutions in the integers," in Proc. IEEE Int. Symp. on Inform. Theory (ISIT), Honolulu, HI, July 2014.
[23] W. B. Powell, Approximate Dynamic Programming: Solving the Curses of Dimensionality. Wiley, 2011.
[24] D. P. Bertsekas, Dynamic Programming and Optimal Control. Athena Scientific, 2012.
[25] G. Shani, J. Pineau, and R. Kaplow, "A survey of point-based POMDP solvers," Autonomous Agents and Multi-Agent Systems, vol. 27, no. 1, pp. 1-51, Jul. 2013.
[26] R. G. Wood, T. Linder, and S. Yüksel, "Optimal zero delay coding of Markov sources: Stationary and finite memory codes," arXiv:1606.09135, June 2016.
[27] A. Arapostathis, V. Borkar, E. Fernández-Gaucherand, M. K. Ghosh, and S. I. Marcus, "Discrete-time controlled Markov processes with average cost criterion: A survey," SIAM J. Contr. Optimiz., vol. 31, no. 2, pp. 282-344, Mar. 1993.
[28] M. Schäl, "Average optimality in dynamic programming with general state space," Mathematics of Operations Research, vol. 18, no. 1, pp. 163-172, Feb. 1993.
[29] T. Kaijser, "A limit theorem for partially observed Markov chains," Ann. Probab., vol. 3, no. 4, pp. 667-696, 1975.
[30] S. Li, A. Khisti, and A. Mahajan, "Structure of optimal privacy-preserving policies in smart-metered systems with a rechargeable battery," in Proc. IEEE SPAWC, Stockholm, Sweden, June 28-July 1, 2015, pp. 1-5.
[31] S. Li, "Information theoretic privacy in smart metering systems using rechargeable batteries," MASc thesis, University of Toronto, Jan. 2016.