
Appendix (Paper #3961)

A Proofs

A.1 Proof of Theorem 2

Proof. Given $\mathbf{x}$, we first show that conditions (3a)-(3c) are satisfied. By Eq.(4), we have:
\[
\sum_{l \in A_{s_0}} f_{s_0,l} = \sum_{l \in A_{s_0}} P^{\mathbf{x}}(s_0)\, x_{s_0,l} = \sum_{l \in A_{s_0}} x_{s_0,l} = 1,
\]
which is Eq.(3a). By Eq.(2c), $P^{\mathbf{x}}(s) = \sum_{l \in A_s} x_{s,l} P^{\mathbf{x}}(s)$ $(\forall s \in S \setminus S^T)$. Moreover, by Eq.(1), $\forall s \in S \setminus (\{s_0\} \cup S^T)$,
\[
\sum_{s' \in S \setminus S^T :\, s \in S_{s',l_s}} P^{\mathbf{x}}(s')\, x_{s',l_s} = \sum_{l \in A_s} P^{\mathbf{x}}(s)\, x_{s,l}.
\]
Further, by Eq.(4), $\forall s \in S \setminus (\{s_0\} \cup S^T)$,
\[
\sum_{s' \in S \setminus S^T :\, s \in S_{s',l_s}} f_{s',l_s} = \sum_{l \in A_s} f_{s,l},
\]
which is Eq.(3b). Obviously, Eq.(3c) is obtained from Eqs.(1), (2d) and (4). For each $o \in O$,
\[
\begin{aligned}
U_d(\mathbf{x}, o) &= \sum_{s \in S^c :\, h_s \subseteq o} P^{\mathbf{x}}(s) \\
&= \sum_{s \in S^c :\, h_s \subseteq o} \; \sum_{s' \in S \setminus S^T :\, s \in S_{s',l_s}} P^{\mathbf{x}}(s')\, x_{s',l_s} \\
&= \sum_{s \in S^c :\, h_s \subseteq o} \; \sum_{s' \in S \setminus S^T :\, s \in S_{s',l_s}} f_{s',l_s} \\
&= U_d(\mathbf{f}, o).
\end{aligned}
\]
Therefore, $\forall \mathbf{x}$, $\exists \mathbf{f}$ defined by Eq.(4) such that $U_d(\mathbf{x}, o) = U_d(\mathbf{f}, o)$ $\forall o \in O$.
Given $\mathbf{f}$, we define $\mathbf{x}$ in Eq.(5). Obviously, $P^{\mathbf{x}}(s)\, x_{s,l} = f_{s,l}$, and then $U_d(\mathbf{f}, o) = U_d(\mathbf{x}, o)$. Therefore, $\forall \mathbf{f}$, $\exists \mathbf{x}$ defined by Eq.(5) such that $U_d(\mathbf{f}, o) = U_d(\mathbf{x}, o)$ $\forall o \in O$.

A.2 Proof of Lemma 1

Proof. Let MDP* denote the MDP that includes all states and actions in $G(S, A)$, and let $\pi^\star$ be its optimal policy. We prove this lemma by proving $V^{\pi}(s_0) = V^{\pi^\star}(s_0)$. Note that $V(s)$ represents the probability of catching the adversary starting from $s$. Therefore, we prove that for each state $s$ with $V^{\pi}(s) > 0$, $V^{\pi}(s) = V^{\pi^\star}(s)$.
The MDP of IGRS has two kinds of states that differ from the states in MDP*. 1) For the temporary escape state $s \notin S^e$, $V^{\pi}(s) = V^{\pi^\star}(s) = 0$, i.e., the probability of catching the adversary starting from state $s$ is 0 because no resource at state $s$ has a chance to interdict the support paths of $\mathbf{y}$ generating $h_s$. 2) Some states include only a part of their actions in the MDP and ignore the rest. However, each ignored action has an equivalent action included in the MDP. For example, if at state $s$ resource 1 has no action with a chance to interdict the support paths of $\mathbf{y}$, but resource 2 has an action $v$ with a chance to interdict one of them, then $l_1 = (v_s^1, v)$ and $l_2 = (v_2, v)$ ($v_2$ is a neighbor of $v_s^1$) give the defender the same utility, because the action of resource 1 does not influence the probability of catching the adversary after state $s$. Consequently, for any such state $s$ with $V(s) > 0$, $V^{\pi}(s) = V^{\pi^\star}(s)$. In addition, obviously, if state $s$ has the same action set in the MDP and MDP* with $V^{\pi}(s) > 0$, then $V^{\pi}(s) = V^{\pi^\star}(s)$.

A.3 Proof of Theorem 3

Proof. By Lemma 1, $V(s_0)$ is the defender utility of the best response against $\mathbf{y}$ in $G(S, A)$. Then $V(s_0) \ge U_d(\mathbf{x}, \mathbf{y}) \ge 0$. If $V(s_0) > U_d(\mathbf{x}, \mathbf{y})$, then $\pi$, i.e., the output at Line 4, must contain some states $s$ with $V(s) > 0$ or their actions that are not in $G(S', A')$. Consequently, new states and actions will be added to $G(S', A')$, and then $G(S', A')$ is expanded. In the worst case, $G(S', A') = G(S, A)$, where IGRS will stop and $V(s_0) = U_d(\mathbf{x}, \mathbf{y})$. Therefore, IGRS converges with $V(s_0) = U_d(\mathbf{x}, \mathbf{y})$ in a finite number of steps, because the number of states and actions in $G(S, A)$ is finite. Therefore, $U_d(\mathbf{x}, \mathbf{y}) \ge U_d(\mathbf{x}', \mathbf{y})$ $(\forall \mathbf{x}')$. Note that $U_a(\mathbf{x}, \mathbf{y}) \ge U_a(\mathbf{x}, \mathbf{y}')$ $(\forall \mathbf{y}')$. Then $(\mathbf{x}, \mathbf{y})$ is an NE.

A.4 Proof of Theorem 4

Proof. We give the proof under the assumption that the defender has a single resource $r$; the proof can be easily extended to the case with multiple resources. Note that, if there initially exists an exit node $v_e$ such that $\mathrm{dist}(v_0^a, v_e) < \mathrm{dist}(v_0^r, v_e)$, the adversary can escape to the external world by following the shortest path to $v_e$, and the defender cannot capture the adversary regardless of the policy adopted. Thus, to verify the correctness of Theorem 4, we only consider the game starting from a "safe" initial state $s_0$. That is, we call a state $s$ a safe state if $\mathrm{dist}(v_s^a, v_e) \ge \mathrm{dist}(v_s^r, v_e)$ holds for any exit node $v_e$, where $v_s^a$ and $v_s^r$ are the locations of the adversary and the defender at state $s$, respectively. For a state that is not safe, the adversary will escape for sure in the worst case.
Now we present our key statement to prove Theorem 4 as follows. W.l.o.g., let all the exit nodes be at the same horizontal border, such as $\{v_e, v_f, v_g\}$ in the figure. For any safe state $s$ where $v_s^r$ is at the same border as all exit nodes, there exists a deterministic policy $\pi$ onwards such that all the states reached by following $\pi$ are safe states, until a capture state is finally reached. Note that we assume the initial state $s_0$ to be safe to avoid trivial results. That is, there always exists a deterministic policy ensuring the capture of the adversary, which, obviously, is optimal. Theorem 4 is finally concluded by showing that $\pi$ is a feasible solution in the restricted NEST generated by HGRS.

[Figure: exit nodes $v_e$, $v_f$, $v_g$ on one border, with the adversary location $v_s^a$ and the defender location $v_s^r$.]

Several observations can be easily verified. We say node $v = (i, j)$ is above $v' = (i', j')$ if $i < i'$, and under $v'$ if $i > i'$. $v$ and $v'$ are said to be at the same side of $v''$ when they are both above $v''$ or both under $v''$.
Observation 1: There exists no pair of exit nodes $v_e$ and $v_g$ such that, for any location $v_s^a$ not on the border, $\mathrm{dist}(v_s^a, v_e) = \mathrm{dist}(v_s^r, v_e)$ and meanwhile $\mathrm{dist}(v_s^a, v_g) = \mathrm{dist}(v_s^r, v_g)$.
Observation 2: The exit node $v_e$ and location $v_s^a$ (not on the border) must be at the same side of $v_s^r$ when $\mathrm{dist}(v_s^a, v_e) = \mathrm{dist}(v_s^r, v_e)$.
Observation 3: If exit node $v_e$ and location $v_s^a$ (not on the border) are at different sides of $v_s^r$ with $\mathrm{dist}(v_s^a, v_e) \ge \mathrm{dist}(v_s^r, v_e)$, then $\mathrm{dist}(v_s^a, v_e) - \mathrm{dist}(v_s^r, v_e) \ge 2$.
Based on these observations, $\pi$ is defined as follows. At state $s$, if there exists no exit node $v_e$ such that $\mathrm{dist}(v_s^a, v_e) = \mathrm{dist}(v_s^r, v_e)$, the resource $r$ stays where it is; otherwise, according to Observation 1, there is a unique $v_e$ satisfying that equality, and the defender moves towards $v_e$. As one can easily verify, $\pi$ only takes actions in the restricted set $A'_s$ generated in HGRS, and thus is a feasible solution. We then show the optimality of $\pi$ in the sense that if $s$ is safe, the next state $s'$ reached by following $\pi$ is also safe. This trivially holds if there exists no exit node $v_e$ such that $\mathrm{dist}(v_s^a, v_e) = \mathrm{dist}(v_s^r, v_e)$. When such an exit node exists, it is enough to prove that $\mathrm{dist}(v_{s'}^a, v) \ge \mathrm{dist}(v_{s'}^r, v)$ holds for any exit node $v$ and any possible location $v_{s'}^a$. We use the figure accompanying this proof as an illustration. Suppose $v_e$ is above $v_s^r$; according to Observation 2, $v_s^a$ is also above $v_s^r$. Observation 3 secures the exit nodes (e.g., $v_g$ in the figure) under $v_s^r$, since both players can only move one step further. For any exit node (e.g., $v_f$ in the figure) above $v_s^r$, we have $\mathrm{dist}(v_{s'}^r, v_f) = \mathrm{dist}(v_s^r, v_f) - 1$, and then $\mathrm{dist}(v_{s'}^a, v_f) \ge \mathrm{dist}(v_{s'}^r, v_f)$ still holds. Thus, the safety of the state is ensured by following $\pi$, and the optimality of HGRS is concluded.
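The safe-state test and the one-step decision rule of $\pi$ can be illustrated with a short Python sketch. It rests on assumptions that are not part of the proof: the network is given as an adjacency dictionary, dist is taken to be the hop distance computed by breadth-first search, and the names `grid_graph`, `hop_distances`, `is_safe` and `policy_step` are hypothetical. It covers only the defender's choice at a single state, not the full game or HGRS itself.

```python
from collections import deque


def grid_graph(rows, cols):
    """Hypothetical helper: 4-connected grid with nodes labelled (i, j)."""
    adj = {}
    for i in range(rows):
        for j in range(cols):
            nbrs = []
            for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ni, nj = i + di, j + dj
                if 0 <= ni < rows and 0 <= nj < cols:
                    nbrs.append((ni, nj))
            adj[(i, j)] = nbrs
    return adj


def hop_distances(adj, source):
    """Breadth-first search: hop distance from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def is_safe(adj, exits, v_a, v_r):
    """Safe state: no exit is strictly closer to the adversary than to the defender."""
    d_a = hop_distances(adj, v_a)
    d_r = hop_distances(adj, v_r)
    return all(d_a[e] >= d_r[e] for e in exits)


def policy_step(adj, exits, v_a, v_r):
    """Defender's next node: stay unless some exit is tied, then step towards it."""
    d_a = hop_distances(adj, v_a)
    d_r = hop_distances(adj, v_r)
    tied = [e for e in exits if d_a[e] == d_r[e]]
    if not tied:
        return v_r  # no tied exit: the resource stays where it is
    target = tied[0]  # Observation 1 states that the tied exit is unique
    if v_r == target:
        return v_r
    d_t = hop_distances(adj, target)
    # Pick the neighbour of the defender that is closest to the tied exit.
    return min(adj[v_r], key=lambda w: d_t[w])


# Toy 3x3 grid (an assumption for illustration): exits on the j = 0 border.
adj = grid_graph(3, 3)
exits = [(0, 0), (2, 0)]
print(policy_step(adj, exits, (1, 2), (1, 0)))  # no exit is tied
print(policy_step(adj, exits, (0, 1), (1, 0)))  # exit (0, 0) is tied
```

In the toy grid the defender stays at `(1, 0)` in the first call and moves to `(0, 0)` in the second, which mirrors the two cases in the definition of $\pi$.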

B An Example About the Action Generation of HGRS

Consider the example in the figure. The set of exit nodes is $V^e = \{1, 2, 4, 5\}$. The interdiction nodes set is $V^{int} = \{1, \ldots, 8, 10, 11, 12\}$, and then the min-node-cut nodes set is $V^m = \{10, 12\}$. Therefore, the key nodes set is $V^c = V^e \cup V^m = \{1, 2, 4, 5, 10, 12\}$. Consider the state $s$ shown in the figure, where the defender is at node 3 and the adversary is spotted at node 16, trying to reach any exit node through any possible path. Since the adversary needs more time steps than the defender to reach any key node in $V^c$, we only consider the action of staying in place, i.e., $A'_s = \{3\}$. Suppose that we now observe the adversary at node 7. To secure exits 1 and 5, which are both two steps away from the defender and from the adversary, two more actions need to be considered, i.e., moving to 2 and moving to 4. Thus, $A'_s = \{2, 3, 4\}$.

[Figure: example graph with nodes drawn in rows 14-18, 9-13, 6-8 and 1-5; the adversary (a) is at node 16 and the defender (d) is at node 3.]
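The second case of this example can be reproduced with a small Python sketch. Everything beyond the quoted node numbers is an assumption: the rule used here (always keep the stay action, and add the first step towards every key node that the adversary can reach in no more steps than the defender) is inferred from the example rather than taken from the definition of HGRS, the edge list is hypothetical and chosen only so that the distances quoted in the text hold, and the names `hop_distances` and `restricted_actions` are illustrative.

```python
from collections import deque


def hop_distances(adj, source):
    """Breadth-first search: hop distance from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def restricted_actions(adj, key_nodes, defender, adversary):
    """Assumed rule: stay, plus first steps towards key nodes the adversary can contest."""
    d_def = hop_distances(adj, defender)
    d_adv = hop_distances(adj, adversary)
    actions = {defender}  # staying in place is always kept
    for k in key_nodes:
        if d_adv[k] > d_def[k]:
            continue  # the defender reaches k strictly first; no move needed
        d_key = hop_distances(adj, k)
        for w in adj[defender]:
            if d_key[w] == d_def[k] - 1:  # w lies on a shortest path to k
                actions.add(w)
    return actions


# Hypothetical edges over nodes 1-8; the real figure's edges are not recoverable.
EDGES = [(1, 2), (2, 3), (3, 4), (4, 5), (1, 6), (6, 7), (7, 8), (8, 5)]
adj = {}
for u, w in EDGES:
    adj.setdefault(u, []).append(w)
    adj.setdefault(w, []).append(u)

# Only the exit nodes are used as key nodes, since 10 and 12 are not in this toy graph.
print(sorted(restricted_actions(adj, {1, 2, 4, 5}, defender=3, adversary=7)))
```

With these assumed edges, exits 1 and 5 are two steps from both players, so the sketch returns the actions 2, 3 and 4, matching $A'_s = \{2, 3, 4\}$ in the text.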
