$\frac{1}{T} E_u\left[\sum_{t=0}^{T-1} c(x_t) \,\middle|\, x_0 = x\right]$, as $T$ goes to infinity, is minimized simultaneously for all states, and the aim is to identify that policy.
Here, the Markov process is assumed to be irreducible: for each pair of states $(x, y)$ and each policy $u$, there is a $t$ such that $P_u^t(x, y) > 0$. In other words, it is possible to get to any state from any state. This implies that, for each policy $u$, the limit
$$\lim_{T \to \infty} \frac{1}{T} E_u\left[\sum_{t=0}^{T-1} c(x_t) \,\middle|\, x_0 = x\right]$$
exists and the average cost is independent of the initial state of the system.
Denote the optimal average cost by $g^* = \min_u g_u$. Bellman's equation for the average-cost problem is
$$g + V(x) = \min_{a \in A_x} \Big\{ c(x,a) + \sum_{y} p(x,a,y)\, V(y) \Big\}, \tag{2-1}$$
where $V(\cdot)$ denotes the value function. The interpretation of $V(x)$ is the difference in accrued cost when starting the process in state $x$ relative to a reference state. Bellman's equation can be formulated in terms of matrices as follows:
$$g e + V = c_{u^*} + P_{u^*} V, \tag{2-2}$$
where $e$ is a vector with all entries equal to 1, and $c_{u^*}$ and $P_{u^*}$ are the cost vector and the transition probability matrix under the optimal policy, respectively.
Denote the solution of Bellman's equation by the pair $(g^*, V^*)$. An alternative method for deriving this solution is so-called policy iteration. This algorithm starts by considering a policy $\pi$. The corresponding $(g, V)$ can be obtained by solving Bellman's equation for this fixed policy:
$$g + V(x) = c(x, \pi(x)) + \sum_{y} p(x, \pi(x), y)\, V(y). \tag{2-3}$$
To improve the policy $\pi$, take
$$\pi'(x) = \arg\min_{a \in A_x} \Big\{ c(x,a) + \sum_{y} p(x,a,y)\, V(y) \Big\} \tag{2-4}$$
in each state. The corresponding $(g', V')$ can be obtained by again solving Bellman's equation. A further improvement is obtained by solving (2-4) based on $V'$. This iteration is continued until the minimum is attained for each state. Note that the value function for every state has to be stored in memory in every step. Therefore, the applicability of dynamic programming is severely limited. The domain of the optimal value function is the state space of the system to be controlled. This means that the number of variables (value function entries) to be stored and computed is equal to the size of the state space. When the state space is large, this dynamic programming method becomes computationally intractable. Especially when dealing with a multi-dimensional state space, its size grows exponentially in the number of state variables. This problem is called the curse of dimensionality.
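The evaluation step (2-3) and the improvement step (2-4) can be sketched in a few lines of Python. This is a minimal illustration only, not the implementation used in this paper: the two-state, two-action MDP (`COST`, `PROB`), the helper names, and the choice of state 0 as reference state with V(0) = 0 are assumptions made for the example.

```python
# Toy MDP, assumed for illustration: COST[x][a] and PROB[x][a][y].
COST = [[1.0, 3.0], [4.0, 2.0]]
PROB = [[[0.5, 0.5], [0.9, 0.1]],
        [[0.5, 0.5], [0.2, 0.8]]]


def solve_linear(A, b):
    """Solve A z = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [mr - factor * mc for mr, mc in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def evaluate(policy, cost, prob):
    """Solve (2-3) for (g, V) under a fixed policy, with V(0) = 0."""
    n = len(policy)
    A, b = [], []
    for x in range(n):
        a = policy[x]
        # Unknowns: column 0 is g, column y >= 1 is V(y).
        row = [1.0] + [(1.0 if y == x else 0.0) - prob[x][a][y]
                       for y in range(1, n)]
        A.append(row)
        b.append(cost[x][a])
    z = solve_linear(A, b)
    return z[0], [0.0] + z[1:]


def improve(policy, V, cost, prob):
    """Improvement step (2-4); keeps the current action on ties."""
    new_policy = []
    for x in range(len(V)):
        q = [cost[x][a] + sum(p * v for p, v in zip(prob[x][a], V))
             for a in range(len(cost[x]))]
        best = min(range(len(q)), key=q.__getitem__)
        new_policy.append(policy[x] if q[policy[x]] <= q[best] + 1e-12 else best)
    return new_policy


def policy_iteration(cost, prob, policy):
    """Alternate evaluation and improvement until the policy is stable."""
    while True:
        g, V = evaluate(policy, cost, prob)
        new_policy = improve(policy, V, cost, prob)
        if new_policy == policy:
            return g, V, policy
        policy = new_policy


g_opt, V_opt, pi_opt = policy_iteration(COST, PROB, [1, 0])
```

For this toy instance the iteration stops after one improvement step at the policy `[0, 1]` with average cost 12/7. The list `V_opt` holds one entry per state, which is exactly the storage requirement that becomes prohibitive for large state spaces.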
2.2 ADP with a linear approximation architecture
To alleviate the curse of dimensionality, the problem is solved by finding an approximation to the value function, $\tilde V : S \times \mathbb{R}^K \to \mathbb{R}$, called the scoring function. The underlying assumption is that the value function has some structure such that a reasonable approximation exists.
By using the linear approximation architecture, the scoring function is generated within a parameterized class of functions. It maps the system state space to the set of real numbers. Given a set of basis functions $\phi_i : S \to \mathbb{R}$, $i = 1, \dots, K$, the scoring functions are represented as linear combinations of the basis functions:
$$\tilde V(x, r) = \sum_{i=1}^{K} r_i\, \phi_i(x). \tag{2-5}$$
Imagine that the pre-selected basis functions are stored as columns of a matrix $\Phi \in \mathbb{R}^{|S| \times K}$, so that each row corresponds to the basis functions evaluated at a different state $x$:
$$\Phi = \begin{bmatrix} | & & | \\ \phi_1 & \cdots & \phi_K \\ | & & | \end{bmatrix}. \tag{2-6}$$
Now the optimization problem is formulated and analyzed as an optimization problem for computing the weights $r \in \mathbb{R}^K$. Hence, it suffices to store the weights assigned to each of the basis functions in the linear combination instead of storing the value function for each state in the system. The number of variables to be stored (one per basis function) is far smaller than for the value function itself, which needs one value per state in the system.
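The storage argument can be made concrete with a small Python sketch of (2-5) and (2-6). The scalar state space of 1000 states, the three basis functions (1, x, x^2), and the weight values are hypothetical choices for illustration only.

```python
def basis_values(x):
    """Hypothetical basis functions phi_1, phi_2, phi_3 for a scalar state x."""
    return [1.0, float(x), float(x * x)]


def scoring_function(x, r):
    """Evaluate (2-5): the weighted sum of the basis functions at state x."""
    return sum(r_i * phi_i for r_i, phi_i in zip(r, basis_values(x)))


states = range(1000)            # assumed state space with |S| = 1000
r = [0.5, 2.0, 0.1]             # K = 3 weights, the only numbers to be stored

# Matrix (2-6): one row per state, one column per basis function.
Phi = [basis_values(x) for x in states]

# The full approximate value function can be regenerated on demand from r.
V_tilde = [sum(r_i * p for r_i, p in zip(r, row)) for row in Phi]
```

Here only the 3 entries of `r` need to be kept, while `V_tilde` with its 1000 entries can be recomputed whenever a state value is needed.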
3 Approximate Linear Programming for average costs
A successful use of approximate dynamic programming depends on a good choice of the basis functions and a good choice of the weights assigned to each of the basis functions in the linear combination. A study of the optimal selection of basis functions is outside the scope of this paper. Therefore, we assume that the set of basis functions is pre-specified, and the focus is on finding an appropriate parameter vector $r \in \mathbb{R}^K$, given a pre-selected set of basis functions.
It is known that the dynamic programming problem can be recast as a linear programming problem. However, this exact linear programming approach also suffers from the curse of dimensionality: it has as many variables as the number of states in the system and at least the same number of constraints. Combining the exact linear programming approach with the linear approximation architecture leads to the approximate linear programming (ALP) algorithm. Compared to the exact linear program, which stores the optimal value function for each state in the system, the ALP has a much smaller number of variables, since it has as many variables as the number of basis functions. In the next sections, the two-phase ALP approach for average costs is described. The first phase of the average-cost ALP prioritizes approximation of the optimal average cost, but does not necessarily give a good approximation to the value function. The second phase explicitly approximates the value function.
3.1 First phase of the average-cost ALP
Recall Bellman's equation (see Equation (2-1)). It can be solved by the average-cost Exact Linear Program (ELP):
$$\max_{g, V}\; g \quad \text{s.t.} \quad g + V(x) \le \min_{a \in A_x} \Big\{ c(x,a) + \sum_{y} p(x,a,y)\, V(y) \Big\}, \quad \forall x. \tag{3-7}$$
The problem is thus translated into a maximization of the average cost subject to inequalities of the form "≤", which correspond to upper bounds. Note that the constraints are non-linear, since each constraint involves a minimization over the possible actions. However, each constraint can be decomposed into $|A_x|$ linear constraints. Therefore, problem (3-7) can be written as the linear program (3-8):
$$\max_{g, V}\; g \quad \text{s.t.} \quad g + V(x) \le c(x,a) + \sum_{y} p(x,a,y)\, V(y), \quad \forall x,\ \forall a \in A_x. \tag{3-8}$$
This results in a total of $|S| \cdot |A_x| + 1$ constraints, which is unmanageable if the state space is large. The combination of exact linear programming and the linear approximation architecture leads to the first-phase ALP (3-9):
$$\max_{g, r}\; g \quad \text{s.t.} \quad g + (\Phi r)(x) \le c(x,a) + \sum_{y} p(x,a,y)\, (\Phi r)(y), \quad \forall x,\ \forall a \in A_x. \tag{3-9}$$
Denote the solution of the first-phase ALP by $(g_1, r_1)$. Note that the maximization problem in (3-9) is equivalent to minimizing $|g^* - g_1|$. Since the first-phase ALP corresponds to the exact LP (3-7) with the extra constraint $V = \Phi r$, its feasible region is restricted, and $g_1 \le g^*$ for every feasible $g_1$. This implies that the first-phase ALP can be seen as an algorithm to approximate the optimal average cost. Compared with the ELP, which stores the optimal value function for each state in the system, the ALP has a much smaller number of variables, since it has as many variables as the number of basis functions plus one. However, the ALP still has as many constraints as the number of state-action pairs.
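The structure of the first-phase ALP (3-9) can be illustrated on a toy problem. For a fixed weight vector $r$, the constraints only bound $g$ from above, so the best feasible $g$ is the minimum of the right-hand sides minus $(\Phi r)(x)$. The sketch below uses this observation with a coarse grid search over $r$ as a stand-in for an LP solver. The two-state MDP, the single basis function phi(x) = x, and the grid are assumptions made for the example; for this MDP the optimal average cost is 12/7.

```python
# Toy MDP, assumed for illustration: COST[x][a] and PROB[x][a][y].
COST = [[1.0, 3.0], [4.0, 2.0]]
PROB = [[[0.5, 0.5], [0.9, 0.1]],
        [[0.5, 0.5], [0.2, 0.8]]]
G_STAR = 12.0 / 7.0   # optimal average cost of this toy MDP


def phi_r(r):
    """Scoring function with one assumed basis function phi(x) = x."""
    return [0.0, r]


def max_feasible_g(r):
    """Largest g that satisfies every constraint of (3-9) for a fixed r."""
    V = phi_r(r)
    return min(
        COST[x][a] + sum(PROB[x][a][y] * V[y] for y in range(2)) - V[x]
        for x in range(2)
        for a in range(2)
    )


# Coarse grid search over r in [0, 3]; a real implementation would use an LP solver.
grid = [j / 100.0 for j in range(0, 301)]
g1, r1 = max((max_feasible_g(r), r) for r in grid)
```

Every feasible pair produced this way has `g1` at most `G_STAR`, in line with the lower-bound property stated above. The number of constraints checked inside `max_feasible_g` still equals the number of state-action pairs.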
3.2 Second phase of the average-cost ALP
It turns out, from an example given in [1], that even though the first-phase ALP produces a good approximation to the optimal average cost, it can produce arbitrarily bad policies. The main problem is that the first-phase ALP prioritizes approximating the optimal average cost, but it does not necessarily yield a good approximation to the optimal value function. Hence, a two-phase average-cost ALP is proposed in which the first phase is simply the first-phase ALP introduced in Section 3.1. In the first phase, the approximation to the optimal average cost is generated, while in the second phase the focus is on the approximation to the optimal value function.
The second phase of the ALP is formulated as follows:
$$\max_{r}\; c^T \Phi r \quad \text{s.t.} \quad g_2 + (\Phi r)(x) \le c(x,a) + \sum_{y} p(x,a,y)\, (\Phi r)(y), \quad \forall x,\ \forall a \in A_x. \tag{3-10}$$
The parameters that have to be pre-specified are the state relevance weights $c > 0$ and $g_2$. Denote the optimal solution of the second phase of the average-cost ALP by $r_2$. In De Farias and Van Roy [1], a lemma and some theorems are presented that explain how the state relevance weights $c$ and the estimate $g_2$ of the optimal average cost in the second phase of the ALP can be used to control the quality of the approximation to the optimal value function.
In the following theorem, the second-phase ALP is interpreted as the minimization of a certain weighted norm of the approximation error, with weights equal to the state relevance weights.
Theorem 1 (De Farias and Van Roy [1]):
Let $r_2$ be the optimal solution to the two-phase ALP. It minimizes $\|V_{g_2} - \Phi r\|_{1,c}$ over the feasible region of the two-phase ALP.
Proof: The norm $\|\cdot\|_{1,c}$ is defined by $\|V\|_{1,c} = \sum_{x \in S} c(x)\, |V(x)|$. Maximizing $c^T \Phi r$ is equivalent to minimizing $c^T (V_{g_2} - \Phi r)$. It is well known that for all $V$,
V e g c P I
) ( ) (
2
1
* *
, we have $V \le V_{g_2}$. Hence, any $r$ that is a feasible solution to the two-phase ALP satisfies $\Phi r \le V_{g_2}$. It follows that
$$\|V_{g_2} - \Phi r\|_{1,c} = \sum_{x \in S} c(x)\, |V_{g_2}(x) - (\Phi r)(x)| = c^T V_{g_2} - c^T \Phi r,$$
and maximizing $c^T \Phi r$ is therefore equivalent to minimizing $\|V_{g_2} - \Phi r\|_{1,c}$.
Hence, for any fixed choice of $g_2$ that satisfies $g_2 \le g^*$, the following bound holds:
$$\|V^* - \Phi r_2\|_{1,c} \le \|V_{g_2} - \Phi r_2\|_{1,c} + (g^* - g_2)\, c^T (I - P_{u^*})^{-1} e. \tag{3-11}$$
The two-phase ALP minimizes the upper bound on the norm $\|V^* - \Phi r_2\|_{1,c}$ of the error in the approximated value function. The state relevance weights $c$ determine how errors over different regions of the state space are weighted when approximating the optimal value function, and can be used to specify the trade-off in the quality of the approximation across different states. Therefore, to generate a better approximation in a region of the state space, one can assign relatively larger weights to that region. To give some guidance on how to choose appropriate state relevance weights $c$, performance bounds are provided in the next section.
3.3 State Relevance Weights
A bound on the performance of greedy policies associated with approximate value functions was presented in De Farias and Van Roy [1]; it provides some guidance on choosing appropriate state relevance weights. The bound is described in Theorem 2.
Theorem 2 (De Farias and Van Roy [1]):
For all $V$, let
V
g
and
V
, 1
* * * *
* *
*
. ) ( .
) . (
) ( ) (
* *
+ +
+
+ +
The performance bound described in Theorem 2 suggests a way of selecting state relevance weights. One approach is to select the state relevance weights corresponding to the stationary state distribution associated with the greedy policy. This seems logical: the aim is to have a good approximation to the value function, so the states that are visited more often need to be approximated better. One difficulty with obtaining the stationary state distribution is that one would need to know the optimal policy beforehand, while finding the optimal policy is exactly the problem to be solved. This suggests an iterative scheme that uses, in each iteration, the weights corresponding to the stationary state distribution associated with the policy generated by the previous iteration.
4 Traffic light control
The aim of this paper is to solve the traffic light control problem with the ALP approach described in Chapter 3. In this chapter, the basic notation and the modelling assumptions are introduced. Furthermore, the problem is formulated as a Markov Decision Problem.
4.1 Basic notation and the modelling assumptions
Consider a simple intersection of two two-way streets, F4C2, which is illustrated in Figure 4.1. The flows are numbered clockwise. Cars that arrive at one of the lanes either go straight across the intersection or make a left turn. The set of 4 flows is partitioned into 2 disjoint subsets, $C_1$ and $C_2$. A subset of flows is called a combination. Two compatible flows can safely cross the intersection simultaneously; otherwise they are called antagonistic. The combinations are fixed, and they are chosen such that the intersection is conflict-free. Flows 1 and 3 constitute $C_1$, and flows 2 and 4 constitute $C_2$. Flows in the same combination always have the same light indication at the same time. When one combination has a green or yellow indication, the other combination has a red indication.
Figure 4.1: Two two-way streets intersection, F4C2, which serves 4 flows in 2 symmetric combinations.
For the sake of simplicity, the problem is formulated in discrete time. Time is divided into slots. This time unit (slot) is taken to be the time a car needs to cross the intersection when the light is green or yellow. Haijema and Van der Wal [4] assumed this time unit to be two seconds.
To avoid interference between antagonistic streams in consecutive slots when switching from a green indication for one combination to a green indication for a different combination, a switching time is necessary. The switching time is fixed and takes 3 slots: 2 slots of yellow and 1 slot in which all flows have a red indication. As an example, a cycle of light indications for the flows is shown in Figure 4.2. Flows 1 and 3 get a green indication during 3 slots and flows 2 and 4 get a green indication during 1 slot. Each of the two switches takes 3 slots. Hence, the duration of the cycle is 3 + 3 + 1 + 3 = 10 slots.
Figure 4.2: Example of lights indication diagram.
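The cycle arithmetic can be checked with a short Python sketch that lists the indication of the combination holding right of way in every slot of one fixed cycle. The function name and the tuple encoding are hypothetical; the slot counts (2 yellow slots and 1 all-red slot per switch) are those given in the text.

```python
def fixed_cycle(green_c1, green_c2, yellow_slots=2, all_red_slots=1):
    """Return one (combination, indication) pair per slot of a fixed cycle."""
    cycle = []
    for combination, green_slots in (("C1", green_c1), ("C2", green_c2)):
        cycle += [(combination, "green")] * green_slots
        cycle += [(combination, "yellow")] * yellow_slots
        # During the all-red slot every flow is red, not just this combination.
        cycle += [(combination, "red")] * all_red_slots
    return cycle


# Example from the text: 3 green slots for C1 and 1 green slot for C2.
example = fixed_cycle(3, 1)
cycle_length = len(example)
```

With 3 green slots for C1 and 1 green slot for C2, `cycle_length` is 10, matching the example cycle described above.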
The arrivals in different flows and in different slots are independent. It is reasonable to assume that the number of car arrivals in one slot is either 0 or 1 per flow. We denote the arrival rate in one slot by $q_f$ for flow $f$.
In each flow that has right of way (a green or a yellow indication), exactly one car can pass the stopping line in one slot. A car that arrives at an empty queue that has right of way passes the stopping line without delay.
The state of the process is observed in each slot. The decision epochs are as follows. New arrivals take place at the beginning of the slot, which is after the observation of the state of the process. Departures take place at the end of the slot, prior to the observation of the new state. Hence, when a flow has right of way and a car arrives in a certain slot, the state of the flow remains the same for the next slot.
4.2 Markov decision problem formulation
As stated above, the problem is considered as a discrete-time stochastic control problem. The Markov decision problem formulation consists of the specification of the state space, the decision space (in each state there are several actions from which the decision must be chosen), the transition probabilities, and the cost function.
4.2.1 States
The state of the system is represented by two vectors: one represents the state of the traffic and the other represents the state of the lights. The state of the traffic is fully described by a vector $k = (k_1, k_2, k_3, k_4)$, with $k_f$ the number of cars in flow $f$ present at the beginning of a slot. Further, the state of the lights is described by a vector $x = (l, i)$, with $l \in \{1, 2\}$ the combination which is having a green ($i = 0$), a first yellow ($i = 1$), a second yellow ($i = 2$), or a red ($i = 3$) light. The state of the lights is fully described by $x$, because when one combination of flows has right of way (a green or a yellow indication), the other flows all have a red indication. Hence, the states are denoted by the vector $(k, x) = ((k_1, k_2, k_3, k_4), (l, i))$, in total a 6-dimensional vector.
4.2.2 Decisions
In each state there are several actions from which the decision must be chosen. The decisions depend on the state of the traffic lights but also on the lengths of the queues. The possible decisions in the various situations are as follows.
- If all lights are red, the possible decisions are to keep all lights red or to give a green indication to one of the combinations.
- If the lights are green for one combination, there are two possible decisions: keep the lights as they are or change from green to first yellow.
- At the end of a first yellow slot there is only one decision: continue to the second yellow slot.
- After the second yellow, the only decision is to change to red for all flows.
Hence the decision space, denoted by $A(k, (l, i))$, is
$$A(k, (l, i)) = \begin{cases} \{(l, 0),\ (l, 1)\} & \text{if } i = 0, \\ \{(l, i+1)\} & \text{if } i = 1, 2, \\ \{(l, 3),\ (l', 0)\} & \text{if } i = 3, \end{cases} \tag{4-12}$$
with $l'$ the next non-empty combination. Decisions are taken at the beginning of a slot and executed instantaneously. Thus, if a combination has right of way, cars of that combination can leave in the very same slot.
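The decision space (4-12) translates directly into a small lookup function. The sketch below is illustrative only: the function name is hypothetical, and as a simplifying assumption $l'$ is always taken to be the other combination, without checking whether its queues are non-empty.

```python
def decision_space(light):
    """Return the admissible next light states for light = (l, i), as in (4-12)."""
    l, i = light
    other = 2 if l == 1 else 1      # assumption: l' is simply the other combination
    if i == 0:                      # green: stay green or start the first yellow
        return [(l, 0), (l, 1)]
    if i in (1, 2):                 # first or second yellow: forced continuation
        return [(l, i + 1)]
    if i == 3:                      # all red: stay red or give green to l'
        return [(l, 3), (other, 0)]
    raise ValueError("light phase i must be 0, 1, 2 or 3")


# Number of state-action pairs per traffic state k, summed over the 8 light states.
actions_per_traffic_state = sum(
    len(decision_space((l, i))) for l in (1, 2) for i in range(4)
)
```

Summed over the 8 light states, each traffic state $k$ contributes 12 state-action pairs, and therefore 12 constraints to the linear programs of Chapter 3.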
4.2.3 Transition probabilities
Given a state $(k, x)$, the chosen action $a$ implies an instantaneous change of the lights from state $x$ into state $a$, because the chosen action is part of the state. Hence, the transition probability from state $(k, x)$ to state $(k', x')$ is 0 unless $x' = a$. The transition probabilities, denoted by $p(k, x; k', a)$, are best described by considering each flow separately. Let $p_f(k_f, a, k_f')$ denote the transition probability for the number of cars in flow $f$ when action $a$ is taken. Since the flows are independent, the transition probabilities are simply the product of the transition probabilities for each flow:
$$p(k, x; k', a) = \prod_{f=1}^{4} p_f(k_f, a, k_f'). \tag{4-13}$$
If by action $a$ flow $f$ has right of way during the coming slot, the transition probabilities per flow are given by
$$p_f(k_f, a, k_f - 1) = 1 - q_f; \quad p_f(k_f, a, k_f) = q_f, \quad k_f > 0; \qquad p_f(0, a, 0) = 1. \tag{4-14}$$
And if action $a$ implies red for flow $f$, then
$$p_f(k_f, a, k_f) = 1 - q_f; \quad p_f(k_f, a, k_f + 1) = q_f, \quad k_f \ge 0. \tag{4-15}$$
The transition probabilities for the number of cars in flow $f$ are illustrated in Figure 4.3 and Figure 4.4.
Figure 4.3: Illustration of the transition probability for the number of cars in flow f when flow f has right of way.
Figure 4.4: Illustration of the transition probability for the number of cars in flow f when the state of light of flow f is red.
4.2.4 Costs
The aim is to minimize the overall average waiting time per car. By Little's Law, this corresponds to minimizing the average number of cars waiting in the queues. Therefore, a linear cost function is considered in which one unit of cost is incurred for every car present at the beginning of a slot. Hence, the cost function, denoted by $c(k)$, is given by
$$c(k) = \sum_{f=1}^{4} k_f. \tag{4-16}$$
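The link between the average cost and the waiting time can be written out as a short Python sketch. By Little's Law, the mean waiting time in slots equals the average number of cars present divided by the total arrival rate per slot. The function name is hypothetical, the slot length of two seconds is the value assumed in Section 4.1, and the average of 2.0 cars is an arbitrary illustrative input, not a result from this paper.

```python
def mean_waiting_time_seconds(avg_cars, arrival_rates, slot_seconds=2.0):
    """Little's Law: W = L / lambda, converted from slots to seconds."""
    total_rate = sum(arrival_rates)        # expected arrivals per slot, all flows
    if total_rate <= 0:
        raise ValueError("total arrival rate must be positive")
    return avg_cars / total_rate * slot_seconds


# Hypothetical input: on average 2.0 cars present, arrival rate 0.2 on each flow.
waiting_time = mean_waiting_time_seconds(2.0, [0.2, 0.2, 0.2, 0.2])
```

With these inputs the mean waiting time is 2.0 / 0.8 = 2.5 slots, that is 5 seconds. Because the total arrival rate is fixed, minimizing the average cost (4-16) and minimizing the mean waiting time are equivalent.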
4.2.5 Countable state spaces
To easily compute the performance measures (and the optimal policy), the state space has to be countable. The state space can be reduced to a finite one by limiting the number of cars that can be present in each flow. The maximum state $N$ in each flow becomes one of the parameters of the system. Thus, arrivals to a queue which is full are rejected. This stochastic control problem involves a finite state space $S$ of cardinality $|S| = 2 \cdot 4 \cdot (N+1)^4$ when each queue length ranges over $0, \dots, N$, where the state of the system is a 6-dimensional vector. The transition probabilities have to be changed with respect to $q_f$ such that there are no transitions from state $N$ to $N + 1$. The transition probabilities per flow in the finite state space are defined as follows. If during the coming slot flow $f$ has right of way, the transition probability is
$$p_f(k_f, a, k_f - 1) = 1 - q_f; \quad p_f(k_f, a, k_f) = q_f, \quad 0 < k_f \le N; \qquad p_f(0, a, 0) = 1. \tag{4-17}$$
If during the coming slot flow $f$ gets a red indication, the transition probability in the finite state space is
$$p_f(k_f, a, k_f) = 1 - q_f; \quad p_f(k_f, a, k_f + 1) = q_f, \quad k_f < N; \qquad p_f(N, a, N) = 1. \tag{4-18}$$
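The per-flow probabilities (4-17) and (4-18) and the product form (4-13) can be sketched in Python as follows. The function names and the boolean `right_of_way` argument, which summarizes the effect of action $a$ on one flow, are hypothetical conveniences for the example.

```python
def flow_transition(k, k_next, q, right_of_way, N):
    """Per-flow transition probability in the finite state space."""
    if right_of_way:                       # (4-17): green or yellow for this flow
        if k == 0:
            return 1.0 if k_next == 0 else 0.0
        if k_next == k:                    # one arrival and one departure
            return q
        if k_next == k - 1:                # no arrival, one departure
            return 1.0 - q
        return 0.0
    if k == N:                             # (4-18): red, arrivals to a full queue are rejected
        return 1.0 if k_next == N else 0.0
    if k_next == k + 1:
        return q
    if k_next == k:
        return 1.0 - q
    return 0.0


def joint_transition(k, k_next, q, right_of_way, N):
    """Product form (4-13) over the four independent flows."""
    prob = 1.0
    for f in range(4):
        prob *= flow_transition(k[f], k_next[f], q[f], right_of_way[f], N)
    return prob


# Example: combination C1 (flows 1 and 3) has right of way, N = 4, q_f = 0.3.
example = joint_transition((1, 0, 2, 4), (1, 0, 1, 4),
                           (0.3, 0.3, 0.3, 0.3),
                           (True, False, True, False), 4)
```

In the example, flow 1 stays at 1 (probability 0.3), flow 2 stays at 0 under red (0.7), flow 3 drops from 2 to 1 (0.7), and flow 4 stays at the maximum state (1), giving 0.3 · 0.7 · 0.7 = 0.147.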
5 The two-phase ALP approach for F4C2
After defining the model, the ALP formulation for this problem can be determined. The control strategy used in this paper is cyclic. The implementation of the approach is done in the following steps.
1. Find the approximation of the optimal average cost based on the first phase of the two-phase ALP, denoted by $g_1$. The greedy policy $\pi_1$ associated with $g_1$ can then be obtained. The first phase of the two-phase ALP is done given the predefined basis functions.
2. Evaluate the policy $\pi_1$ by simulating the state of each flow for a number of slots, say 50,000, and determine the state of the lights based on the greedy policy $\pi_1$. Then, the average cost can be computed by taking the average of the sum of the number of cars in all flows, denoted by $g^*_1$. It is expected that $g_1 \le g^*_1$, because the first phase of the two-phase ALP does not necessarily give a good approximation to the value function, which leads to a non-optimal policy $\pi_1$.
3. Improve the policy $\pi_1$ by using the second phase of the two-phase ALP approach, given the approximation $g_1$ of the optimal average cost, the predefined basis functions, and the state relevance weights. The new policy obtained is denoted by $\pi_2$. The average cost obtained is denoted by $g^*_2$.
4. Evaluate the new policy $\pi_2$ by simulation as in step 2. It is expected that $g_1 \le g^*_2 \le g^*_1$.
The steps are further discussed in the next section. The arrival rates are varied to compare the approach with the other strategies. The results obtained are reported in Section 5.2.
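The simulation used in steps 2 and 4 can be sketched as follows. This is an illustrative stand-in, not the code behind the reported results: the function names are hypothetical, and the `exhaustive_policy` rule (stay green while the own queues are non-empty) is only a placeholder where the greedy policy derived from the ALP weights would be plugged in. The slot dynamics follow Section 4.1: the decision is executed at the start of the slot, arrivals occur at the beginning, and departures at the end.

```python
import random

COMBINATION = {1: (0, 2), 2: (1, 3)}   # 0-based flow indices of C1 and C2


def exhaustive_policy(k, light):
    """Placeholder state-based policy (an assumption, not the ALP greedy policy)."""
    l, i = light
    other = 2 if l == 1 else 1
    if i == 0:
        own_queue = sum(k[f] for f in COMBINATION[l])
        return (l, 0) if own_queue > 0 else (l, 1)
    if i in (1, 2):
        return (l, i + 1)
    return (other, 0)


def simulate(policy, q, n_slots, N, seed=0):
    """Estimate the average cost (cars present per slot) of a policy."""
    rng = random.Random(seed)
    k = [0, 0, 0, 0]
    light = (1, 0)
    total_cost = 0
    for _ in range(n_slots):
        total_cost += sum(k)                   # cost (4-16) at the start of the slot
        light = policy(tuple(k), light)        # decision is executed instantaneously
        l, i = light
        for f in range(4):
            arrival = rng.random() < q[f]
            has_right_of_way = i < 3 and f in COMBINATION[l]
            if has_right_of_way:
                if k[f] > 0 and not arrival:   # one departure, no arrival
                    k[f] -= 1
            elif arrival and k[f] < N:         # red: queue grows unless it is full
                k[f] += 1
    return total_cost / n_slots


estimate = simulate(exhaustive_policy, [0.2, 0.2, 0.2, 0.2], 5000, N=10, seed=1)
```

Fixing the seed makes runs reproducible, so two policies can be compared on the same arrival stream.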
5.1 The two-phase ALP formulation
To ensure uniqueness, state $(0, (1, 0))$ acts as a reference state, with $V(0, (1, 0)) = 0$. The first-phase ALP is given by
$$\max_{g, r}\; g \quad \text{s.t.} \quad g + (\Phi r)(k, x) \le c(k) + \sum_{k'} p(k, x; k', a)\, (\Phi r)(k', a), \quad \forall (k, x),\ \forall a \in A(k, x). \tag{5-19}$$
Denote the solution by $(g_1, r_1)$. The basis functions that are used are
$$\phi_0 = 1, \quad \phi_a = k_a, \quad \phi_{ab} = k_a k_b, \quad a, b \in \{1, 2, 3, 4\},\ a \le b, \tag{5-20}$$
where $k_b$ is the number of cars in flow $b$. The value function is approximated by a second-order polynomial that includes terms that correlate different flows with each other, with different weights for each state of the lights $(l, i)$. Hence, the value function $V(k, (l, i))$ is approximated by $(\Phi r)(k, (l, i))$, which is
$$\begin{aligned} (\Phi r)(k, (l, i)) ={}& r_0^{(l,i)} + r_1^{(l,i)} k_1 + \dots + r_4^{(l,i)} k_4 + r_{11}^{(l,i)} k_1^2 + \dots + r_{44}^{(l,i)} k_4^2 \\ &+ r_{12}^{(l,i)} k_1 k_2 + r_{13}^{(l,i)} k_1 k_3 + r_{14}^{(l,i)} k_1 k_4 + r_{23}^{(l,i)} k_2 k_3 + r_{24}^{(l,i)} k_2 k_4 + r_{34}^{(l,i)} k_3 k_4. \end{aligned} \tag{5-21}$$
The constant term $r_0^{(1,0)} = 0$, since $V(0, (1, 0)) = 0$ is chosen. There are in total $15 \cdot 8 = 120$ variables. In this case, the number of variables does not change as the number of states grows. The linear program is solved using the Premium Solver Platform for Excel, which is able to solve linear programming problems with up to 8,000 variables. While the ALP involves a manageable number of variables, the number of constraints is still as large as the number of state-action pairs. Therefore, to be able to include all the state-action constraints when solving the LP, the largest state on each flow has to be limited. The largest state on each flow is set to 4, corresponding to 7,500 constraints based on the MDP formulation. The policy $\pi_1$ associated with $g_1$ can be generated by taking
$$\pi_1(k, x) = \arg\min_{a \in A(k, (l, i))} \sum_{k'} p(k, x; k', a)\, (\Phi r_1)(k', a). \tag{5-22}$$
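The variable and constraint counts quoted above can be reproduced with a short Python sketch. The names `BASIS`, `features`, and `n_actions` are hypothetical; the enumeration follows the basis functions (5-20), the decision space (4-12), and queue lengths 0 to 4 on each flow.

```python
from itertools import combinations_with_replacement

FLOWS = (1, 2, 3, 4)
# Basis functions (5-20): constant, linear terms k_a, and products k_a * k_b, a <= b.
BASIS = [()] + [(a,) for a in FLOWS] + list(combinations_with_replacement(FLOWS, 2))
LIGHT_STATES = [(l, i) for l in (1, 2) for i in range(4)]


def features(k):
    """Values of the 15 basis functions for queue lengths k = (k1, k2, k3, k4)."""
    values = []
    for term in BASIS:
        value = 1
        for a in term:
            value *= k[a - 1]
        values.append(value)
    return values


def n_actions(i):
    """Number of admissible actions in light phase i, following (4-12)."""
    return 2 if i in (0, 3) else 1


N = 4                                              # largest state on each flow
n_variables = len(BASIS) * len(LIGHT_STATES)       # one weight vector per light state
n_traffic_states = (N + 1) ** 4                    # queue lengths 0..N on 4 flows
n_constraints = n_traffic_states * sum(n_actions(i) for _, i in LIGHT_STATES)
```

This gives 15 · 8 = 120 variables and 625 · 12 = 7,500 state-action constraints. Only the constraint count depends on $N$.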
In order to check whether the first phase of the two-phase ALP yields a good approximation of the value function, the states are simulated for 500,000 slots. The second phase of the two-phase ALP is done by solving the linear program (5-23):
$$\max_{r}\; c^T \Phi r \quad \text{s.t.} \quad g_2 + (\Phi r)(k, x) \le c(k) + \sum_{k'} p(k, x; k', a)\, (\Phi r)(k', a), \quad \forall (k, x),\ \forall a \in A(k, x). \tag{5-23}$$
An obvious choice for $g_2$ is $g_1$, the estimate of the optimal average cost obtained from the first-phase ALP. The solution to the second-phase ALP is $r_2$. It aims to control the accuracy of the approximation to the value function over different portions of the state space. As described in Theorem 1, maximizing $c^T \Phi r$ is equivalent to minimizing $c^T (V_{g_2} - \Phi r)$, which is the sum of the errors of the approximation to the value function weighted by $c$, balancing the accuracy of the approximation over different states. Based on Theorem 2, the state relevance weights can be chosen corresponding to the stationary state distribution. It may suffice to use rough guesses about the stationary state distribution in some cases. Thus, the state relevance weights $c$ are chosen in the form
( ) ( )
( )
( )
( )
( )
( )
( )
( )
( )
'
+ +
+ +
, 2 , 1 1
, 1 , 1 1
, ,
3 1 4 2
4 2 3 1
3
2
3
2
1
3
2
3
2
1
l if
l if
i l k c
k k k k
i i a
k k k k
i i a
394-
75
where
$$a = \sum_{(k, (l, i))} c(k, (l, i)). \tag{5-25}$$
To make sure that the state relevance weights sum to 1, $c(k, (l, i))$ is multiplied by $1/a$.
5.2 Results and evaluation
5.2.1 Symmetric arrival rates
In this section, the results for varying arrival rates at a symmetric F4C2 intersection are presented, in which all flows have identical arrival rates per slot. The average cost $g_1$ resulting from the first-phase ALP and the average costs $g^*_1$ and $g^*_2$ yielded by the greedy policies with respect to $\Phi r_1$ and $\Phi r_2$, respectively, are given in Table 5-1 below. The second-phase ALP is done for the state relevance weights $c$ with
i
f
f
q c
.
Table 5-5 presents the overall average waiting time in seconds for varying arrival rates at a symmetric F4C2 intersection, which means that all flows have identical arrival rates $q_f$ per slot.
Table 5-5: Overall average waiting time in seconds for the symmetric F4C2 (percentages are differences relative to the two-phase ALP).
Rule                                    q_f = 0.2       q_f = 0.3       q_f = 0.4
Two-phase ALP                           5.00            6.65            8.57
FC                                      5.37 (7%)       7.30 (10%)      8.92 (4%)
XHC                                     5.66 (13%)      7.54 (13%)      8.65 (1%)
XHC(1)                                  5.06 (1%)       6.69 (1%)       8.37 (-2%)
XHC(2)                                  5.16 (3%)       6.97 (5%)       8.99 (5%)
FC green periods in slots for C1, C2    1, 1            3, 3            8, 8
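For this simple intersection, the overall average waiting time obtained with the two-phase ALP approach is lower than that of the other strategies in all cases except XHC(1) at $q_f = 0.4$ (8.37 versus 8.57 seconds). The results obtained by the two-phase ALP approach are close to those of the anticipating exhaustive strategy XHC(1).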
5.2.2 Asymmetric arrival rates
In the case of asymmetric arrival rates, two situations are considered for F4C2. In the first situation, the arrival rates are 0.15 for flows 1 and 3, and 0.45 for flows 2 and 4. Hence, the flows within the same combination have identical arrival rates, but C2 is three times as busy as C1. In the second situation, the arrival rate for flow 1 is 0.10, and the arrival rates for the other flows are 0.30. The coefficients of the linear combination of the basis functions obtained from the two-phase ALP approach for approximating the value function are given below.
Table 5-6: The weights of the linear combination of the basis functions for the asymmetric F4C2, with q = (0.15, 0.45, 0.15, 0.45).
Coefficients   l=1                              l=2
               i=0     i=1     i=2     i=3      i=0     i=1     i=2     i=3
r_0            0.00    0.44    0.00    0.00     0.00    1.25    2.97    0.00
r_1            2.25    1.22    9.01    8.32     7.61    5.97    4.28    2.25
r_2            6.30    6.13    4.61    0.91     0.91    3.10    6.76    6.44
r_3            2.64    1.07    9.67    8.70     8.29    6.59    4.81    2.64
r_4            6.19    5.66    4.03    0.91     0.91    3.07    6.65    6.30
r_11           0.62    1.76    0.00    0.00     0.00    0.19    0.39    0.62
r_22           0.00    0.00    0.30    0.95     0.95    0.95    0.00    0.00
r_33           0.68    1.97    0.00    0.00     0.00    0.21    0.43    0.68
r_44           0.00    0.10    0.42    0.95     0.95    0.94    0.00    0.00
r_12           0.08    0.01    0.01    0.01     0.01    0.03    0.06    0.07
r_13           0.00    0.00    0.00    0.00     0.00    0.00    0.00    0.00
r_14           0.03    0.00    0.00    0.00     0.00    0.01    0.02    0.03
r_23           0.12    0.17    0.21    0.25     0.26    0.04    0.09    0.11
r_24           0.00    0.00    0.00    0.00     0.00    0.00    0.00    0.00
r_34           0.22    0.18    0.21    0.26     0.27    0.07    0.16    0.20
Table 5-7: The weights of the linear combination of the basis functions for the asymmetric F4C2, with q = (0.1, 0.3, 0.3, 0.3).
Coefficients   l=1                              l=2
               i=0     i=1     i=2     i=3      i=0     i=1     i=2     i=3
r_0            0.00    0.31    1.05    0.00     0.00    0.47    1.44    0.03
r_1            0.55    0.00    6.22    4.98     4.18    2.87    1.71    0.55
r_2            4.93    4.04    2.46    0.68     0.68    1.81    6.33    5.52
r_3            1.42    1.66    7.82    7.23     7.08    5.57    4.02    1.41
r_4            5.26    4.02    2.43    0.69     0.69    1.88    6.96    6.36
r_11           0.56    1.54    0.10    0.17     0.23    0.46    0.51    0.56
r_22           0.10    0.36    0.59    0.79     0.79    1.13    0.00    0.00
r_33           0.91    1.38    0.00    0.00     0.00    0.32    0.55    0.91
r_44           0.16    0.34    0.58    0.78     0.78    1.23    0.00    0.00
r_12           0.00    0.01    0.15    0.16     0.16    0.00    0.00    0.00
r_13           0.00    0.00    0.00    0.00     0.00    0.00    0.00    0.00
r_14           0.02    0.05    0.27    0.29     0.29    0.00    0.00    0.00
r_23           0.66    0.21    0.24    0.28     0.30    0.14    0.51    0.61
r_24           0.00    0.00    0.00    0.00     0.00    0.00    0.00    0.00
r_34           0.22    0.24    0.15    0.18     0.19    0.05    0.17    0.20
The results are given in Table 5-8. The overall mean waiting time is denoted by $E(W)$ and the average waiting time for flow $f$ is denoted by $E(W_f)$. The lengths of the green periods in the FC control strategy for C1 and C2 are 1 and 5 slots, respectively, in the first situation. In the second situation, the length of the green periods is 3 slots for both combinations.
Table 5-8: Mean waiting times in seconds for two asymmetric F4C2 cases (percentages are differences relative to the two-phase ALP).
Rule              E(W)           E(W_1)    E(W_2)    E(W_3)    E(W_4)
q = (0.15, 0.45, 0.15, 0.45)
Two-phase ALP     5.95           8.47      5.16      8.41      5.07
FC                6.32 (6%)      10.53     4.91      10.55     4.91
XHC               6.72 (13%)     9.86      5.68      9.88      5.67
XHC(1)            6.37 (7%)      8.12      5.79      8.11      5.79
XHC(2)            7.32 (23%)     7.07      7.42      7.08      7.39
q = (0.10, 0.30, 0.30, 0.30)
Two-phase ALP     6.22           4.93      5.43      8.25      5.40
FC                7.09 (14%)     5.19      7.30      7.31      7.30
XHC               6.96 (12%)     6.42      6.66      7.72      6.67
XHC(1)            6.22 (0%)      5.25      6.05      6.88      6.06
XHC(2)            6.61 (6%)      4.63      6.65      7.19      6.65
As observed, the two-phase ALP approach results in an overall mean waiting time that is lower than that of the other strategies, or equal to it in one case (XHC(1) in the second situation, both 6.22 seconds). Further, it is observed that the flows that are part of the busier combination have a lower waiting time. This suggests that a busier combination gets priority. If a busy flow and a less busy flow are grouped together in one combination, then the less busy flow takes advantage of the priority and the busier flow suffers a bit.
5.3 Reduced Linear Program
Although the dimension of the problem is reduced in the approximate version (ALP), the number of constraints is still as large as the number of state-action pairs, which can be very large. The two-phase ALP only solves part of the curse of dimensionality problem. Therefore, the largest state on each flow is limited to 4. In practice, this is certainly not the case. De Farias and Van Roy [2] suggested a constraint sampling approach for approximating solutions to optimization problems when the number of constraints is intractable, the Reduced Linear Program (RLP). The idea is to define a probability distribution $\psi$ over the set of constraints and to include only a sampled subset of the constraints when solving the problem. Two properties were proven: if a reasonable number of constraints is sampled according to this distribution, then almost all others will be satisfied, and the constraints that are not satisfied do not distort the solution too much.
The RLP for the traffic control problem is characterized as follows. The largest state on each flow is set to 10. The constraint sample size is 20,000. The subset of constraints was sampled based on the probability measure $\psi(k) = (1-p)^4 p^{|k|}$, with $p = 0.99$. In the case of small arrival rates, say 0.2, the RLP gives a good approximation to the value function, resulting in an average cost similar to the average cost yielded by the two-phase ALP. However, when the load is high, the results do not meet the expectations, because the sampled constraints do not represent "almost all" other constraints, and the constraints that are not satisfied distort the solution substantially. It has been proven that the results of the RLP rely on an idealized choice of $\psi$. The underlying thought is similar to that of choosing the state relevance weights (see Section 3.3): the constraints that represent the states that are visited more often should be included. The chosen probability measure $\psi(k)$ has an exponential form, which gives high probabilities to the states that are close to the situation in which no cars are present in any flow, i.e., the states with a small number of cars present in the flows. This probability measure is an acceptable choice of $\psi$ when the arrival rates are small. However, when the arrival rates are higher, high probability has to be assigned to the states where more cars are present in the flows. Therefore, the exponential form is not an ideal choice of the probability measure $\psi$.
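Constraint sampling according to $\psi(k) = (1-p)^4 p^{|k|}$ can be sketched in Python as below. The measure factorizes over the four flows into geometric weights $p^{k_f}$, which are truncated here at the largest state. How the light state and the action of a sampled constraint are drawn is not specified in the text; drawing them uniformly is an assumption of this sketch, as are all function names.

```python
import random


def allowed_actions(l, i):
    """Decision space (4-12), taking l' to be the other combination (assumption)."""
    other = 2 if l == 1 else 1
    if i == 0:
        return [(l, 0), (l, 1)]
    if i in (1, 2):
        return [(l, i + 1)]
    return [(l, 3), (other, 0)]


def sample_constraints(n_samples, N=10, p=0.99, seed=0):
    """Draw a set of (k, light, action) triples, one per sampled constraint."""
    rng = random.Random(seed)
    # Truncated geometric weights p**j for queue lengths j = 0..N on each flow.
    weights = [p ** j for j in range(N + 1)]
    sampled = set()
    for _ in range(n_samples):
        k = tuple(rng.choices(range(N + 1), weights=weights, k=4))
        l = rng.choice((1, 2))                  # assumption: uniform over light states
        i = rng.randrange(4)
        action = rng.choice(allowed_actions(l, i))
        sampled.add((k, (l, i), action))        # duplicates collapse in the set
    return sampled


constraints = sample_constraints(20000)
n_all_constraints = (10 + 1) ** 4 * 12          # all state-action pairs for N = 10
```

With $N = 10$ the full problem has 11^4 · 12 = 175,692 constraints, of which at most 20,000 distinct ones are kept. With $p$ as close to 1 as 0.99 the truncated weights are nearly flat over 0 to 10, so the shape of $\psi$ matters mainly through the choice of $p$.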
6 Conclusion
The two-phase ALP approach for average-cost dynamic programming is applied to traffic control at an isolated signalized intersection. As an illustration, an intersection of two two-way streets with controllable traffic lights on each corner is considered. Based on the chosen basis functions and state relevance weights, it is observed that the switching schemes obtained by the ALP approach were generally better than the other strategies when there are at most 4 cars on each flow.
In order to deal with the intractable state space, the ALP approach provides an algorithm for finding a good approximation to the optimal value function by fitting a parameterized linear function. Hence, the number of variables that have to be stored in the ALP approach is equal to the number of pre-specified basis functions, instead of one value of the optimal value function for each state in the system. However, the ALP approach still has as many constraints as the exact linear programming formulation. Therefore, the largest state on each flow is limited to 4. In practice, this is certainly not the case. De Farias and Van Roy [2] suggested a constraint sampling approach for approximating solutions to optimization problems when the number of constraints is intractable. The idea is to define a probability distribution $\psi$ over the set of constraints and to include only a sampled subset of the constraints when solving the problem.
References
[1] Farias, D.P. de, Benjamin van Roy, "Approximate Linear Programming for Average-Cost Dynamic Programming".
[2] Farias, D.P. de, Benjamin van Roy, "On Constraint Sampling in the Linear Programming Approach to Approximate Dynamic Programming". Mathematics of Operations Research, Vol. 29, No. 3, August 2004, pp. 462-478.
[3] Farias, D.P. de (June 2002), "The Linear Programming Approach to Approximate Dynamic Programming: Theory and Application".
[4] Haijema, R., Jan van der Wal (3rd November 2006), "An MDP decomposition approach for traffic control at isolated signalized intersections".
[5] Koole, G. (9th September 2005), "Lecture notes Stochastic Optimization", Lecture notes VU Amsterdam.
[6] http://en.wikipedia.org/wiki/Traffic_light#Introduction
[7] "Intersection Safety: Myth Versus Reality", Intersection Safety Brief, U.S. Department of Transportation, Federal Highway Administration.