
Approximate Linear Programming for Traffic Control at Isolated Signalized Intersections

Angelique Mak
Vrije Universiteit
Faculty of Sciences
Business Mathematics and Informatics
De Boelelaan 1081a
1081 HV Amsterdam
November 2007
Preface
The BMI paper is one of the last compulsory subjects of the master's programme Business Mathematics and Informatics. The aim of the BMI paper is to describe and analyze a problem in the field of BMI based on the existing literature. Its focus embraces aspects of economics, mathematics, and computer science.
The building blocks of this paper are two papers. The first paper [1] describes an Approximate Linear Programming approach for average cost dynamic programming. A traffic control problem formulated in the second paper [4] is chosen to illustrate this approach. This paper focuses on the linear approach to approximating the dynamic programming value function, through experiments with traffic control at isolated signalized intersections, to find out how traffic light switching schemes for this system can be determined such that the number of cars in all flows is minimized in the long term.
I would like to thank my supervisor, Sandjai Bhulai. He provided me with guidance and support, not only during my writing, but also during most of my master study BMI. His optimism has also encouraged me, and I am grateful for that.
Further, I would like to thank Paul Harkink, Chi Hung Mok, and Christel Nijman for their technical support. Without their generosity and help, finishing this thesis would have been much harder for me.
Management Summary in English
Intersections are places in the traffic network where many potential conflicts occur. Traffic is strongly affected by the traffic light control. A single intersection of two two-way streets with controllable traffic lights at each corner is considered in this paper. The main purpose of this paper is to apply the two-phase Approximate Linear Programming approach for average cost dynamic programming presented by De Farias and Van Roy [1] to find out how traffic light switching schemes for this system can be determined such that the number of cars in all flows is minimized in the long term. Cars arrive at an intersection controlled by a traffic light and form a queue. The dynamic control of the traffic lights is based on the numbers of cars waiting in the queues. The model that describes the evolution of the queue lengths used in this paper is formulated in the paper of Haijema and Van der Wal [4], and is modelled as a Markov Decision Process in discrete time. The set of all flows is partitioned into disjoint combinations of non-conflicting flows that receive green together.
In principle, problems of this type can be solved via dynamic programming. Dynamic programming refers to a collection of algorithms that can be used to compute optimal policies given a model of the environment, such as a Markov Decision Process. Dynamic programming computes the optimal value function by solving Bellman's equation. The domain of the optimal value function is the state space of the system to be controlled. This means that the number of variables (value function entries) to be stored is equal to the size of the state space. When the state space is large, this dynamic programming computation becomes intractable. This is known as the curse of dimensionality. In particular, when dealing with a multi-dimensional state space, its size grows exponentially in the number of state variables. This is also the case in traffic light control, because each queue (lane) in fact has an infinite buffer.
Approximate dynamic programming aims to alleviate the curse of dimensionality by considering an approximation to the value function, the scoring function, which can be stored and computed efficiently. One of the considerations within Approximate Dynamic Programming is choosing the approximation architecture, that is, the structure of the approximation to the value function. In this paper, the use of linear architectures is considered. A collection of functions that map the system state space to the real numbers (the basis functions) is chosen, and the scoring function is generated by finding an appropriate linear combination of these basis functions. Hence, it suffices to store the weights assigned to each of the basis functions in the linear combination instead of storing the value function for each state in the system. The number of variables to be stored (one per basis function) is far smaller than the number used by the value function, which has one value per state in the system.
A successful use of approximate dynamic programming depends on a good choice of the basis functions and a good choice of the weights assigned to each of the basis functions in the linear combination. The dynamic programming problem can be recast as a linear programming problem. However, this exact linear programming approach also suffers from the curse of dimensionality: the linear program has as many variables as the number of states in the system and at least the same number of constraints. Combining the exact linear programming approach with the linear approximation architecture leads to the Approximate Linear Programming algorithm (ALP). Compared to the exact linear program, which stores the optimal value function for each state in the system, the ALP has a much smaller number of variables, since it has as many variables as the number of basis functions. The ALP approach for average cost dynamic programming consists of two phases. The first phase prioritizes approximation of the optimal average cost, but does not necessarily give a good approximation to the value function. The second phase explicitly approximates the value function, using the so-called state relevance weights to control the quality of the approximation to the optimal value function.
Based on the pre-specified basis functions and state relevance weights, it is observed that the ALP algorithm did a good job in approximating the dynamic programming value functions. The resulting approximation determines the switching scheme for the traffic light control such that the number of cars in all flows is minimized in the long term.
Management Summary in Dutch (translated into English)
Intersections are places in the traffic network where many potential conflicts occur. Traffic is strongly influenced by the control of the traffic lights. This report considers an intersection of two two-way streets with controllable traffic lights at each corner. The main goal of this report is to find out how the switching schemes of the traffic lights for this system can be determined such that the number of cars in all flows is minimized in the long term. To this end, the approximation method proposed by De Farias and Van Roy [1], two-phase Approximate Linear Programming for average cost dynamic programming, is applied. Cars arrive at an intersection that is controlled by a traffic light and form a queue. The dynamic control of the traffic lights is based on the number of cars waiting in the queues. The model that describes the evolution of the queue lengths is based on the formulation of Haijema and Van der Wal [4], which is modelled as a Markov Decision Process in discrete time. The flows are divided into disjoint combinations of conflict-free flows that receive green together.
In principle, this problem can be solved via dynamic programming. Dynamic programming refers to a collection of algorithms that can be used to find an optimal policy, given a Markov Decision Process model. Dynamic programming computes the optimal value function by solving Bellman's equation. The domain of the optimal value function is the state space of the system. This means that the number of variables (the stored value function) is equal to the size of the state space. Dynamic programming is no longer computationally efficient when the state space of the system is large. This problem is usually referred to as the curse of dimensionality. In particular, when the state space is multi-dimensional, its size grows exponentially in the number of variables. This is also the case in the traffic light problem, because each flow has an infinite buffer.
Approximate Dynamic Programming is intended to reduce the curse of dimensionality by approximating the value function in a form that can be stored and obtained efficiently. One of the considerations within Approximate Dynamic Programming is choosing the approximation architecture, the structure of the approximating value function. In this report the linear architecture is chosen. We choose a collection of functions (basis functions) that map the state space of the system to the real numbers. The approximation of the value function (scoring function) can be obtained by finding a suitable linear combination of the basis functions. It therefore suffices to store the weight of each basis function in the linear combination instead of storing the value function for each state in the system. The number of variables is thereby far smaller than that of the value function, which has one value per state in the system.
A successful use of Approximate Dynamic Programming depends on a good choice of the basis functions and a good choice of the weight of each basis function in the linear combination. The dynamic programming problem can be rewritten as a linear programming (LP) problem. However, this linear programming approach also suffers from the curse of dimensionality: it has as many variables as the number of states in the system and at least the same number of constraints. Combining the LP with the linear approximation architecture leads to Approximate Linear Programming (ALP). The ALP has a much smaller number of variables than the LP, because it has as many variables as the number of chosen basis functions. The ALP approach for average cost dynamic programming has two phases. The first phase is aimed at approximating the optimal average cost, but it does not always give a good approximation to the value function. The second phase specifically approximates the value function.
Based on the chosen basis functions and state relevance weights, the ALP algorithm does well in approximating the value functions. This determines the switching schemes for the traffic control problem such that the number of cars in all flows is minimized in the long term.
Contents
Preface
Management Summary in English
Management Summary in Dutch
1 Introduction
2 Approximate Dynamic Programming
2.1 Markov Decision Processes
2.2 ADP with a linear approximation architecture
3 Approximate Linear Programming for average costs
3.1 First phase of the average-cost ALP
3.2 Second phase of the average-cost ALP
3.3 State Relevance Weights
4 Traffic light control
4.1 Basic notation and the modelling assumptions
4.2 Markov decision problem formulation
5 The two-phase ALP approach for F4C2
5.1 The two-phase ALP formulation
5.2 Results and evaluation
5.3 Reduced Linear Program
6 Conclusion
References
1 Introduction
As the number of road users and the need for transportation increase, cities around the world face serious road traffic congestion problems. Traffic jams have become an everyday ritual for many people around the world. Traffic jams not only cause tremendous costs due to unproductive time losses; they also increase the probability of accidents and have a negative impact on the environment (congestion wastes fuel and increases air pollution due to increased idling, acceleration, and braking) and on the quality of life (stress and frustration). This raises the question of whether traffic flows can be controlled in order to reduce traffic jams.
Intersections are places in the traffic network where many conflicts can potentially occur. These conflicts exist because an intersection is a road area where multiple traffic flows meet or cross. Reducing conflicts can be accomplished through a combination of efforts, including careful use of the road infrastructure, comprehensive traffic safety laws and regulations, sustained education of drivers, the willingness among drivers to obey the traffic safety laws, and traffic management. Traffic management around the intersection area is carried out by the traffic light control.
In most countries, three-state traffic lights are used [6]. The sequence is red, green and yellow, which means stop, go and prepare to stop, respectively. Most control strategies found in practice are cyclic: the order in which the groups of flows are served is fixed. There is a range of logical policies by which the traffic light can be controlled. The most basic policies can be classified according to the following characteristics [4]:
Fixed-time (FC) control. In this form of operation, not only is the order fixed, but the red, yellow, and green light indications are also timed at fixed intervals. Fixed cycle controllers are best suited for intersections where traffic volumes are predictable, stable, and fairly constant.
Unlike FC, intersections with traffic-responsive control consist of actuated traffic controllers and vehicle detectors placed on the lanes approaching the intersection. This form of control makes use of real-time measurements. The length of the green interval can be lengthened or shortened based on the present volume of the traffic. At a busy intersection, the green interval would be lengthened, or a flow gets a green period on request.
Under exhaustive (XH) control, the green signals are kept until all flows that have right of way are exhausted (empty). The cyclic variant of exhaustive control is abbreviated as XHC. An alternative form of exhaustive control is anticipative exhaustive control, XHC(1) and XHC(2), which anticipates departures during 1 and 2 yellow slots, respectively. In other words, the green period is kept until the number of cars in each flow of the combination that has right of way is at most one in XHC(1) and at most two in XHC(2).
Isolated control is applicable to single intersections. The signals are operated without consideration of any adjacent signals. In such a case, each intersection has the signal control that is most appropriate for that single intersection. In contrast, coordinated control considers an urban zone or even a whole network comprising many intersections.
However, the annual toll of accidents due to motor vehicle crashes has not substantially changed in more than 25 years, despite improved intersection infrastructure and more sophisticated application of traffic engineering measures. As mentioned in [7], installing signals does not always make intersections safer. Signals that operate improperly can create situations where overall intersection congestion is increased, which in turn can create aggressive driving behavior. Drivers tend to become impatient and violate red lights when the traffic light control causes longer waiting times at intersections. This subjects local residents to a greater risk of collisions, worse congestion and more air and noise pollution. Hence, appropriate traffic control decisions are required.
This paper focuses on the dynamic control of a single signalized intersection of two two-way streets with isolated control. This control problem can be formulated as a Markov Decision Process (MDP) for a stochastic dynamical system with the average cost criterion. The state of the system evolves under uncertainty and a sequence of decisions has to be made. Based on the real-time situation, i.e., the number of cars waiting in the queues, a decision has to be made as to which set of flows has right of way, and these decisions have a long-term effect. The current action determines a new configuration, which in turn determines which configurations may be reached in the future. A solution to a Markov decision problem is a policy (a mapping from states to actions) that determines state transitions so as to minimize the average cost, which here is the long-run number of cars in all flows. To derive an optimal policy, a function defined on the state space, called the value function, has to be computed and stored. For most problems of practical interest, the state space is so large that computing and storing the optimal value function requires a prohibitive amount of time and memory. This large state space makes dynamic programming computationally intractable.
Because of the computational complexity of solving the dynamic program, much effort has been put into finding alternative learning algorithms. For instance, Haijema and Van der Wal [4] presented an approach to smooth the traffic flow that starts from a (nearly) optimal fixed cycle strategy and executes one policy improvement step, which leads to a dynamic control strategy.
Another approach is Approximate Dynamic Programming (ADP), which is about finding a good approximation to the value function. This paper applies the linear programming approach to approximate dynamic programming proposed by De Farias and Van Roy [1] to solve the traffic control problem, formulated as a discrete-time MDP. This approach considers the average cost criterion and a version of the approximate linear program that generates approximations to the optimal average cost and value function.
2 Approximate Dynamic Programming
Markov decision processes (MDPs) provide a mathematical framework for modelling decision making under uncertainty. Given a model of the environment as a Markov decision process, dynamic programming can be used to compute optimal policies. This chapter describes how dynamic programming offers a solution to the problem of minimizing the average cost over an infinite horizon. Moreover, we discuss how the curse of dimensionality affects the dynamic programming algorithm. Finally, the main ideas of approximate dynamic programming are presented.
2.1 Markov Decision Processes
Dynamic programming offers a solution to problems involving sequential decision making in systems with non-linear, stochastic dynamics. Systems in this setting are described by a set of variables evolving over time: the state variables. The state variables take values in the state space of the system, which is the set of all possible states the system can be in. The main idea in dynamic programming is that an optimal decision can be derived from the score assigned to each of the states in the system: the value function. The optimal value function obtained from dynamic programming captures the advantage of being in a given state relative to being in other states.
Consider discrete-time stochastic control problems involving a finite state space $S$ of cardinality $|S| = N$. For each state $x \in S$, there is a set $A_x$ of possible actions that can be chosen. In a given state $x$, when action $a$ is taken, a cost $c(x, a)$ is incurred. The transition probabilities $p(x, a, y)$, for each state pair $(x, y)$ and action $a \in A_x$, represent the probability that, given action $a$ while being in state $x$, the next state will be $y$.
A policy is a mapping from states to actions. Under policy $u$, the system follows a Markov process with transition probabilities $P_u(x, y) = p(x, u(x), y)$. The solution to a Markov Decision Process can be expressed as a policy $u$, which gives the action to take in a given state, regardless of the prior history. Let $x_t$ denote the random variable for the state that the system is in at time $t$, and let $c_u(x_t)$ denote the corresponding cost when policy $u$ is used. Then it is well known that there exists a policy $u$ such that
$$E\left[\frac{1}{T}\sum_{t=0}^{T-1} c_u(x_t) \,\middle|\, x_0 = x\right],$$
as $T$ goes to infinity, is minimized simultaneously for all states $x$, and the aim is to identify that policy.
Here, the Markov process is assumed to be irreducible: for each pair of states $(x, y)$ and each policy $u$, there is a $t$ such that $P_u^t(x, y) > 0$. In other words, it is possible to get to any state from any state. This implies that, for each policy $u$, the limit
$$g_u = \lim_{T \to \infty} E\left[\frac{1}{T}\sum_{t=0}^{T-1} c_u(x_t) \,\middle|\, x_0 = x\right]$$
exists and the average cost is independent of the initial state of the system.
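The independence of the average cost from the initial state can be checked numerically on a small example. The sketch below is only an illustration: the two-state chain `P_u` and the cost vector `c_u` are made-up numbers standing in for a fixed policy $u$, and the function name is hypothetical. It propagates the state distribution forward and averages the expected cost over $T$ steps.

```python
def long_run_average_cost(P, c, start, T=20000):
    """Cesaro average (1/T) * sum_{t<T} E[c(x_t) | x_0 = start] for a fixed policy."""
    n = len(P)
    dist = [0.0] * n
    dist[start] = 1.0                      # distribution of x_0
    total = 0.0
    for _ in range(T):
        total += sum(dist[x] * c[x] for x in range(n))   # E[c(x_t)]
        # one step of the Markov chain: dist_{t+1} = dist_t * P
        dist = [sum(dist[x] * P[x][y] for x in range(n)) for y in range(n)]
    return total / T

# Hypothetical irreducible two-state chain under some fixed policy u.
P_u = [[0.9, 0.1],
       [0.5, 0.5]]
c_u = [0.0, 1.0]

g_from_0 = long_run_average_cost(P_u, c_u, start=0)
g_from_1 = long_run_average_cost(P_u, c_u, start=1)
print(g_from_0, g_from_1)   # both close to 1/6, the stationary probability of state 1
```

For this chain the stationary distribution is (5/6, 1/6), so both starting states give an average cost near 1/6; the small remaining gap shrinks as `T` grows.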
Denote the optimal average cost by $g^* = \min_u g_u$. The optimal policy $u^*$ under the average cost criterion can be derived from the solution of Bellman's equation,
$$g + V(x) = \min_{a \in A_x} \Big\{ c(x, a) + \sum_{y} p(x, a, y)\, V(y) \Big\}, \tag{2.1}$$
where $V(\cdot)$ denotes the value function. The interpretation of $V(x)$ is the difference in accrued cost when starting the process in state $x$ relative to a reference state. Bellman's equation can be formulated in terms of matrices as follows:
$$g\,e + V = c_{u^*} + P_{u^*} V, \tag{2.2}$$
where $e$ is a vector with all entries equal to 1, and $c_{u^*}$ and $P_{u^*}$ are the cost vector and the transition probability matrix under the optimal policy, respectively.
Denote the solution of Bellman's equation by the pair $(g^*, V^*)$. An alternative method for deriving $u^*$ is the so-called policy iteration. This algorithm starts from an initial policy $u$. The corresponding $(g, V)$ can be obtained by solving Bellman's equation for this fixed policy:
$$g + V(x) = c(x, u(x)) + \sum_{y} p(x, u(x), y)\, V(y). \tag{2.3}$$
To improve the policy $u$, take
$$u'(x) = \arg\min_{a \in A_x} \Big\{ c(x, a) + \sum_{y} p(x, a, y)\, V(y) \Big\} \tag{2.4}$$
in each state. The corresponding $(g', V')$ can again be obtained by solving Bellman's equation (2.3) for $u'$. A further improvement can be obtained by solving (2.4) based on $V'$. This iteration is continued until the minimum is attained by the current policy in each state. Note that the value function for every state has to be stored in memory in every step. Therefore, the applicability of dynamic programming is severely limited. The domain of the optimal value function is the state space of the system to be controlled. This means that the number of variables (value function entries) to be stored and computed is equal to the size of the state space. When the state space is large, this dynamic programming method becomes computationally intractable. In particular, when dealing with a multi-dimensional state space, its size grows exponentially in the number of state variables. This problem is called the curse of dimensionality.
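To make the evaluation and improvement steps concrete, here is a minimal policy iteration sketch in plain Python. It is an illustration under stated assumptions, not the implementation used in this paper: the two-state, two-action MDP (`p`, `c`) is made up, the reference state is taken to be state 0 with $V(0) = 0$, and the helper names `solve`, `evaluate` and `policy_iteration` are hypothetical.

```python
def solve(A, b):
    """Solve A z = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * q for a, q in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

def evaluate(policy, p, c):
    """Solve g + V(x) = c(x,u(x)) + sum_y p(x,u(x),y) V(y) with V(0) = 0."""
    n = len(policy)
    A, b = [], []
    for x in range(n):
        a = policy[x]
        # unknowns z = (g, V(1), ..., V(n-1)); V(0) is fixed to 0
        A.append([1.0] + [(1.0 if y == x else 0.0) - p[x][a][y] for y in range(1, n)])
        b.append(c[x][a])
    z = solve(A, b)
    return z[0], [0.0] + z[1:]

def policy_iteration(p, c):
    n = len(p)
    policy = [0] * n
    while True:
        g, V = evaluate(policy, p, c)
        def q(x, a):
            return c[x][a] + sum(p[x][a][y] * V[y] for y in range(n))
        new_policy = []
        for x in range(n):
            best = min(range(len(p[x])), key=lambda a: q(x, a))
            if q(x, policy[x]) <= q(x, best) + 1e-12:
                best = policy[x]            # keep current action on ties
            new_policy.append(best)
        if new_policy == policy:
            return g, V, policy
        policy = new_policy

# Made-up MDP: p[x][a][y] and c[x][a] for 2 states and 2 actions.
p = [[[0.9, 0.1], [0.5, 0.5]],
     [[0.2, 0.8], [0.6, 0.4]]]
c = [[0.0, 0.5],
     [2.0, 3.0]]

g_opt, V_opt, u_opt = policy_iteration(p, c)
print(g_opt, V_opt, u_opt)
```

Every evaluation step stores one value per state, which is exactly the storage requirement that becomes prohibitive for large state spaces.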
2.2 ADP with a linear approximation architecture
To alleviate the curse of dimensionality, the problem is solved by finding an approximation to the value function, $\tilde{V} : S \times \mathbb{R}^K \to \mathbb{R}$, called the scoring function. The underlying assumption is that the value function has some structure such that a reasonable approximation exists.
By using the linear approximation architecture, the scoring function is generated within a parameterized class of functions. It maps the system state space to the set of real numbers. Consider a given set of basis functions $\phi_i : S \to \mathbb{R}$, $i = 1, \dots, K$. The scoring functions are represented as linear combinations of the basis functions:
$$\tilde{V}(x, r) = \sum_{i=1}^{K} r_i\, \phi_i(x). \tag{2.5}$$
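Imagine that the pre-selected basis functions are stored as the columns of a matrix $\Phi \in \mathbb{R}^{|S| \times K}$, so that each row corresponds to the basis functions evaluated at a different state $x$:
$$\Phi = \begin{bmatrix} | & & | \\ \phi_1 & \cdots & \phi_K \\ | & & | \end{bmatrix}. \tag{2.6}$$
In this notation the scoring function is the vector $\Phi r$.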
Now the problem is formulated and analyzed as an optimization problem for computing the weights $r \in \mathbb{R}^K$. Hence, it suffices to store the weights assigned to each of the basis functions in the linear combination instead of storing the value function for each state in the system. The number of variables to be stored (one per basis function) is far smaller than the number used by the value function, which has one value per state in the system.
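The storage saving can be illustrated with a small sketch. Everything specific here is an assumption made for illustration only: the state is taken to be a vector of four queue lengths truncated at 9 cars, and the basis consists of a constant, the queue lengths and their squares; the weights `r` are arbitrary.

```python
from itertools import product

# Hypothetical state space: queue lengths of 4 flows, each truncated at 9 cars.
MAX_QUEUE = 9
states = list(product(range(MAX_QUEUE + 1), repeat=4))   # 10**4 states

def basis(x):
    """Assumed basis functions: a constant, each queue length, each squared length."""
    return [1.0] + [float(q) for q in x] + [float(q * q) for q in x]

Phi = [basis(x) for x in states]   # |S| x K matrix, one row per state
K = len(Phi[0])

def score(x, r):
    """Scoring function V~(x, r) = sum_i r_i * phi_i(x)."""
    return sum(r_i * phi_i for r_i, phi_i in zip(r, basis(x)))

r = [0.0, 1.0, 1.0, 1.0, 1.0, 0.5, 0.5, 0.5, 0.5]   # illustrative weights only

print(len(states), K)              # 10000 values in a tabular V versus 9 weights
print(score((2, 0, 1, 3), r))      # 6 + 0.5 * 14 = 13.0
```

Note that the matrix `Phi` is built here only to show its shape; in practice a row can be computed on demand from `basis(x)`, so only the `K` weights need to be stored.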
3 Approximate Linear Programming for average costs
A successful use of approximate dynamic programming depends on a good choice of the basis functions and a good choice of the weights assigned to each of the basis functions in the linear combination. A study of the optimal selection of basis functions is outside the scope of this paper. Therefore, we assume that the set of basis functions is pre-specified, and the focus is on finding an appropriate parameter vector $r \in \mathbb{R}^K$, given a pre-selected set of basis functions.
It is known that the dynamic programming problem can be recast as a linear programming problem. However, this exact linear programming approach also suffers from the curse of dimensionality: the linear program has as many variables as the number of states in the system and at least the same number of constraints. Combining the exact linear programming approach with the linear approximation architecture leads to the approximate linear programming algorithm (ALP). Compared to the exact linear program, which stores the optimal value function for each state in the system, the ALP has a much smaller number of variables, since it has as many variables as the number of basis functions. In the next sections, the two-phase ALP approach for average costs is described. The first phase of the average-cost ALP prioritizes approximation of the optimal average cost, but does not necessarily give a good approximation to the value function. The second phase explicitly approximates the value function.
3.1 First phase of the average-cost ALP
Recall Bellman's equation (see Equation (2.1)). It can be solved by the average-cost Exact Linear Program (ELP):
$$\max_{g, V}\ g \quad \text{s.t.} \quad g + V(x) \le \min_{a \in A_x} \Big\{ c(x, a) + \sum_{y} p(x, a, y)\, V(y) \Big\}, \quad \forall x. \tag{3.7}$$
The problem is thus translated into a maximization of the average cost subject to inequalities of the form "$\le$", which correspond to upper bounds. Note that the constraints are non-linear: each constraint involves a minimization over the possible actions. However, each constraint can be decomposed into $|A_x|$ linear constraints. Therefore, problem (3.7) can be written as the linear program (3.8).
$$\max_{g, V}\ g \quad \text{s.t.} \quad g + V(x) \le c(x, a) + \sum_{y} p(x, a, y)\, V(y), \quad \forall x, a. \tag{3.8}$$
This results in a total of $|S|\,|A_x| + 1$ constraints, which is unmanageable if the state space is large. The combination of exact linear programming and the linear approximation architecture leads to the first phase ALP described by (3.9).
$$\max_{g, r}\ g \quad \text{s.t.} \quad g + (\Phi r)(x) \le c(x, a) + \sum_{y} p(x, a, y)\, (\Phi r)(y), \quad \forall x, a. \tag{3.9}$$
Denote the solution of the first phase ALP by $(g_1, r_1)$. Since the first phase ALP corresponds to the exact LP (3.7) with the extra constraint $V = \Phi r$, its solution is bounded: $g_1 \le g^*$ for every feasible $g_1$. Consequently, the maximization problem in (3.9) is equivalent to minimizing $|g^* - g_1|$, and the first phase ALP can be seen as an algorithm to approximate the optimal average cost. Compared with the ELP, which stores the optimal value function for each state in the system, the ALP has a much smaller number of variables, since it has as many variables as the number of basis functions plus one. However, the ALP still has as many constraints as the number of state-action pairs.
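The structure of the first phase ALP, few variables but one constraint per state-action pair, can be seen by assembling the constraint rows explicitly. The sketch below only builds and checks the constraints; it does not solve the linear program, which would require an LP solver. The toy MDP is made up, and the function names are hypothetical.

```python
def alp_constraints(p, c, Phi):
    """Rows (coeffs, rhs) meaning sum_j coeffs[j] * z[j] <= rhs, with z = (g, r_1..r_K).

    One row per state-action pair (x, a), encoding
        g + (Phi r)(x) <= c(x, a) + sum_y p(x, a, y) (Phi r)(y).
    """
    n, K = len(Phi), len(Phi[0])
    rows = []
    for x in range(n):
        for a in range(len(p[x])):
            expected = [sum(p[x][a][y] * Phi[y][i] for y in range(n)) for i in range(K)]
            coeffs = [1.0] + [Phi[x][i] - expected[i] for i in range(K)]
            rows.append((coeffs, c[x][a]))
    return rows

def is_feasible(z, rows, tol=1e-9):
    return all(sum(a * v for a, v in zip(coeffs, z)) <= rhs + tol for coeffs, rhs in rows)

# Made-up MDP with 2 states and 2 actions: p[x][a][y], c[x][a].
# Enumerating its four stationary policies gives optimal average cost g* = 3/7.
p = [[[0.9, 0.1], [0.5, 0.5]],
     [[0.2, 0.8], [0.6, 0.4]]]
c = [[0.0, 0.5],
     [2.0, 3.0]]

# A single constant basis function: every row reduces to g <= c(x, a).
rows_const = alp_constraints(p, c, [[1.0], [1.0]])
print(is_feasible([0.0, 5.0], rows_const))   # True: g = min cost = 0, r is arbitrary
print(is_feasible([0.1, 0.0], rows_const))   # False: g cannot exceed the smallest cost

# With Phi equal to the identity the ALP is the exact LP, and (g*, V*) is feasible.
rows_exact = alp_constraints(p, c, [[1.0, 0.0], [0.0, 1.0]])
print(is_feasible([3 / 7, 0.0, 30 / 7], rows_exact))         # True
print(is_feasible([3 / 7 + 0.01, 0.0, 30 / 7], rows_exact))  # False
```

The constant-basis case shows both properties discussed above: the first phase value (here 0) stays below $g^*$ (here 3/7), and the weight $r$ is left completely undetermined, so the first phase alone carries no information about the value function.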
3.2 Second phase of the average-cost ALP
It turns out, from an example given in [1], that even though the first phase ALP produces a good approximation to the optimal average cost, it can produce arbitrarily bad policies. The main problem is that the first phase ALP gives priority to approximating the optimal average cost, and does not necessarily yield a good approximation to the optimal value function. Hence, a two-phase average-cost ALP is proposed, in which the first phase is simply the first phase of the average-cost ALP introduced in Section 3.1. In the first phase, the approximation to the optimal average cost is generated, while in the second phase the focus is on the approximation to the optimal value function.
The second phase of the ALP is formulated as follows.
$$\max_{r}\ c^T \Phi r \quad \text{s.t.} \quad g_2 + (\Phi r)(x) \le c(x, a) + \sum_{y} p(x, a, y)\, (\Phi r)(y), \quad \forall x, a. \tag{3.10}$$
The parameters that have to be pre-specified are the state relevance weights $c > 0$ and $g_2$ (in the objective, the vector $c$ denotes the state relevance weights, not the cost function $c(x, a)$). Denote the optimal solution of the second phase of the average-cost ALP by $r_2$. In De Farias and Van Roy [1], a lemma and some theorems are used to explain how the state relevance weights $c$ and the estimated optimal average cost $g_2$ in the second phase of the ALP can be used to control the quality of the approximation to the optimal value function.
The following theorem interprets the second phase ALP as the minimization of a certain weighted norm of the approximation error, with weights equal to the state relevance weights.
Theorem 1 (De Farias and Van Roy [1]):
Let $r_2$ be the optimal solution to the two-phase ALP. It minimizes $\|V_{g_2} - \Phi r\|_{1,c}$ over the feasible region of the two-phase ALP.
Proof: The norm $\|\cdot\|_{1,c}$ is defined by
$$\|V\|_{1,c} = \sum_{x \in S} c(x)\, |V(x)|.$$
Maximizing $c^T \Phi r$ is equivalent to minimizing $c^T (V_{g_2} - \Phi r)$. It is well known that every $V$ that satisfies the constraints of (3.10), with $V$ in place of $\Phi r$, also satisfies $V \le (I - P_{u^*})^{-1}(c_{u^*} - g_2 e)$; with $V_{g_2}$ denoting the right-hand side, we have $V \le V_{g_2}$. Hence, any $r$ that is a feasible solution to the two-phase ALP satisfies $\Phi r \le V_{g_2}$. It follows that
$$\|V_{g_2} - \Phi r\|_{1,c} = \sum_{x \in S} c(x)\, |V_{g_2}(x) - (\Phi r)(x)| = c^T V_{g_2} - c^T \Phi r,$$
and maximizing $c^T \Phi r$ is therefore equivalent to minimizing $\|V_{g_2} - \Phi r\|_{1,c}$.
Hence, for any fixed choice of $g_2$ that satisfies $g_2 \le g^*$, there is the bound
$$\|V^* - \Phi r_2\|_{1,c} \le \|V_{g_2} - \Phi r_2\|_{1,c} + (g^* - g_2)\, c^T (I - P_{u^*})^{-1} e. \tag{3.11}$$
The two-phase ALP minimizes this upper bound on the norm $\|V^* - \Phi r_2\|_{1,c}$ of the error in the approximated value function. The state relevance weights $c$ determine how errors over different regions of the state space are weighted when approximating the optimal value function, and can be used to specify the trade-off in the quality of the approximation across different states. Therefore, to generate a better approximation in a region of the state space, one can assign relatively larger weights to that region. To give some indication of how to choose appropriate state relevance weights $c$, performance bounds are provided in the next section.
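The effect of the state relevance weights on which approximation counts as "better" can be seen in a tiny numerical sketch. All numbers below are made up for illustration, and the helper names are hypothetical; both candidate approximations lie below the target, as feasible solutions of the second phase do.

```python
def weighted_l1_norm(V, c):
    """||V||_{1,c} = sum_x c(x) * |V(x)|."""
    return sum(w * abs(v) for w, v in zip(c, V))

def dot(c, V):
    return sum(w * v for w, v in zip(c, V))

# Made-up target function and two approximations lying below it.
V_target = [0.0, 2.0, 5.0, 9.0]
approx_a = [0.0, 1.0, 5.0, 6.0]    # accurate in state 2, poor in state 3
approx_b = [0.0, 1.0, 2.0, 8.0]    # poor in state 2, accurate in state 3

def error(approx, c):
    return weighted_l1_norm([v - w for v, w in zip(V_target, approx)], c)

uniform = [0.25, 0.25, 0.25, 0.25]
tail_heavy = [0.1, 0.1, 0.1, 0.7]   # emphasises state 3

print(error(approx_a, uniform), error(approx_b, uniform))        # a has the smaller error
print(error(approx_a, tail_heavy), error(approx_b, tail_heavy))  # b has the smaller error

# Because approx <= V_target everywhere, the norm equals c^T V - c^T approx,
# so a larger c^T approx means a smaller weighted error.
print(dot(uniform, V_target) - dot(uniform, approx_a))
```

Shifting weight towards state 3 reverses the ranking of the two approximations, which is the trade-off the state relevance weights are meant to control.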
3.3 State Relevance Weights
A bound on the performance of greedy policies associated with approximate value functions was presented in De Farias and Van Roy [1]; it provides some guidance on choosing appropriate state relevance weights. The bound is described in Theorem 2.
Theorem 2 (De Farias and Van Roy [1]):
For all $V$, let $g_V$ and $\pi_V$ denote the average cost and the stationary state distribution of the greedy policy associated with $V$. Then, for all $V$ such that $V \le V^*$,
$$g_V \le g^* + \|V^* - V\|_{1,\pi_V}.$$
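Proof:
Note that the average cost associated with $V$ is given by $g_V = \pi_V^T c_V$, and that $\pi_V^T = \pi_V^T P_V$ holds for the stationary state distribution. Here $c_V$ and $P_V$ denote the costs and transition probabilities associated with the greedy policy with respect to $V$. Hence $g_V$ can be written as $g_V = \pi_V^T c_V = \pi_V^T (c_V + P_V V - V)$. Now if $V \le V^*$, then
$$\begin{aligned} g_V &= \pi_V^T (c_V + P_V V - V) \\ &\le \pi_V^T (c_{u^*} + P_{u^*} V^* - V) \\ &= \pi_V^T (g^* e + V^* - V) \\ &= g^* + \pi_V^T (V^* - V) \\ &= g^* + \|V^* - V\|_{1,\pi_V}. \end{aligned}$$
The inequality uses that the greedy policy minimizes $c_u + P_u V$ over all policies $u$ and that $V \le V^*$; the equality that follows is Bellman's equation (2.2).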
The performance bound in Theorem 2 suggests a way of selecting state relevance weights. One approach is to set the state relevance weights equal to the stationary state distribution associated with the greedy policy. This seems logical: the aim is to have a good approximation to the value function, and the states that are visited more often need to be approximated better. One difficulty is that obtaining this stationary state distribution requires knowing the policy beforehand, while finding the policy is the very problem to be solved. This suggests an iterative scheme that uses in each iteration the weights corresponding to the stationary state distribution associated with the policy generated by the previous iteration.
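One step of such an iterative scheme can be sketched as follows: given a current approximation $V$, compute its greedy policy and the stationary distribution of that policy, which then serves as the state relevance weights for the next second-phase LP. The sketch stops before the LP solve, which is omitted. The toy MDP, the approximation `V_approx` and the function names are assumptions for illustration, and power iteration is used here as one simple way to obtain the stationary distribution (it assumes an irreducible, aperiodic chain).

```python
def greedy_policy(V, p, c):
    """Action minimising c(x,a) + sum_y p(x,a,y) V(y) in every state x."""
    n = len(p)
    return [min(range(len(p[x])),
                key=lambda a: c[x][a] + sum(p[x][a][y] * V[y] for y in range(n)))
            for x in range(n)]

def stationary_distribution(P, iters=2000):
    """Power iteration for pi^T = pi^T P (assumes an irreducible, aperiodic chain)."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(pi[x] * P[x][y] for x in range(n)) for y in range(n)]
    return pi

# Made-up MDP with 2 states and 2 actions: p[x][a][y], c[x][a].
p = [[[0.9, 0.1], [0.5, 0.5]],
     [[0.2, 0.8], [0.6, 0.4]]]
c = [[0.0, 0.5],
     [2.0, 3.0]]

V_approx = [0.0, 4.0]                        # some approximation, e.g. Phi r
u = greedy_policy(V_approx, p, c)            # greedy policy with respect to V_approx
P_u = [p[x][u[x]] for x in range(len(p))]    # transition matrix under that policy
weights = stationary_distribution(P_u)       # candidate state relevance weights

print(u, weights)   # [0, 1] and approximately [6/7, 1/7]
```

In a full iteration, `weights` would replace $c$ in the objective of (3.10), the LP would be solved again, and the process would repeat with the new approximation.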
4 Traffic light control
The aim of this paper is to solve the traffic light control problem with the ALP approach described in Chapter 3. In this chapter, the basic notation and the modelling assumptions are introduced. Furthermore, the problem is formulated as a Markov decision problem.
4.1 Basic notation and the modelling assumptions
Consider a simple intersection of two two-way streets, F4C2, which is illustrated in Figure 4.1. The flows are numbered clockwise. Cars that arrive at one of the lanes either go straight across the intersection or make a left turn. The set of 4 flows is partitioned into 2 disjoint subsets, $C_1$ and $C_2$. A subset of flows is called a combination. Two compatible flows can safely cross the intersection simultaneously; otherwise they are called antagonistic. The combinations are fixed, and they are chosen such that the intersection is conflict-free. Flows 1 and 3 constitute $C_1$ and flows 2 and 4 constitute $C_2$. Flows in the same combination always have the same light indication at the same time. When one combination has a green or yellow indication, the other combination has a red indication.
Figure 4.1: Intersection of two two-way streets, F4C2, which serves 4 flows in 2 symmetric combinations.
For the sake of smpcty, the probem s formuated n dscrete
tme. Tme s dvded nto sots. Ths tme unt 'slot) s taken to be
C
2
C
-
C
'
C
'
6
-
'
7
the tme a car needs to cross the ntersecton when the ght s
green or yeow. Ha|ema and Van der Wa |4/ assumed ths tme
unt as beng two seconds.
To avoid interference between antagonistic streams in consecutive slots when switching from a green indication for one combination to a green indication for a different combination, a switching time is necessary. The switching time is fixed and takes 3 slots: 2 slots of yellow and 1 slot in which all flows have a red indication. As an example, a cycle of the light indications for the flows is shown in Figure 4.2. Flows 1 and 3 get a green indication during 3 slots and flows 2 and 4 get a green indication during 1 slot. Each of the two switching times takes 3 slots. Hence, the duration of the cycle is 3 + 3 + 1 + 3 = 10 slots.
Figure 4.2: Example of lights indication diagram.
The arrivals in different flows and in different slots are independent. It is reasonable to assume that the number of car arrivals in one slot is either 0 or 1 per flow. We denote the arrival rate in one slot by $q_f$ for flow $f$.
In each flow that has right of way (a green or a yellow indication), exactly one car can pass the stopping line in one slot. A car that arrives at an empty queue that has right of way passes the stopping line without delay.
The state of the process is observed in each slot. The decision epochs are as follows. New arrivals take place at the beginning of the slot, which is after the observation of the state of the process. Departures take place at the end of the slot, prior to the observation of the new state. Hence, when a flow has right of way and a car arrives in a certain slot, the state of the flow remains the same for the next slot.
4.2 Markov decision problem formulation
As stated above, the problem is considered as a discrete-time stochastic control problem. The Markov decision problem formulation consists of the specification of the state space, the decision space (in each state there are several actions from which the decision must be chosen), the transition probabilities, and the cost function.
4.2.1 States
The state of the system is represented by two vectors: one represents the state of the traffic and the other represents the state of the lights. The state of the traffic is fully described by a vector $k = (k_1, k_2, k_3, k_4)$, with $k_f$ the number of cars in flow $f$ present at the beginning of a slot. Further, the state of the lights is described by a vector $x = (l, i)$, with $l \in \{1, 2\}$ the combination which has a green ($i = 0$), a first yellow ($i = 1$), a second yellow ($i = 2$), or a red ($i = 3$) light. The state of the lights is fully described by $x$, because when one combination of flows has right of way (a green or a yellow indication), all other flows have a red indication. Hence, the states are denoted by the vector $(k, x) = ((k_1, k_2, k_3, k_4), (l, i))$, in total a 6-dimensional vector.
4.2.2 Decisions
In each state there are several actions from which the decision must be chosen. The decisions depend on the state of the traffic lights but also on the lengths of the queues. The possible decisions in the various situations are as follows.
- If all lights are red, the possible decisions are to keep all lights red, or to give a green indication to one of the combinations.
- If the lights are green for one combination, there are two possible decisions: keep the lights as they are or change from green to first yellow.
- At the end of a first yellow slot there is only one decision: continue to the second yellow slot.
- After the second yellow, the only decision is to change to red for all flows.
Hence the decision space, denoted by $A(k, (l, i))$, is
$$ A(k,(l,i)) = \begin{cases} \{(l,0),\, (l,1)\} & \text{if } i = 0, \\ \{(l, i+1)\} & \text{if } i = 1, 2, \\ \{(l,3),\, (l',0)\} & \text{if } i = 3, \end{cases} \tag{4-12} $$
with $l'$ the next non-empty combination. Decisions are taken at the beginning of a slot and executed instantaneously. Thus, if a combination has right of way, cars of that combination can leave in the very same slot.
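The decision rules above can be sketched as a small function. This is an illustration only: the function name is hypothetical, and the "next non-empty combination" is simplified to "the other combination" unless the caller supplies it.

```python
def actions(l, i, next_combination=None):
    """Return the light states reachable from (l, i).

    l in {1, 2}; i: 0 = green, 1 = first yellow, 2 = second yellow, 3 = all red.
    """
    if next_combination is None:
        # Assumption: with two combinations, the next one is simply the other one.
        next_combination = 2 if l == 1 else 1
    if i == 0:
        return [(l, 0), (l, 1)]            # stay green or start the first yellow
    if i in (1, 2):
        return [(l, i + 1)]                # yellow phases advance automatically
    if i == 3:
        return [(l, 3), (next_combination, 0)]  # stay all-red or give green
    raise ValueError("i must be 0, 1, 2 or 3")


# Total number of decisions summed over the 8 light states.
total_actions = sum(len(actions(l, i)) for l in (1, 2) for i in range(4))
```

Summed over the eight light states this gives 12 decisions, so every traffic state $k$ contributes 12 state-action pairs.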
4.2.3 Transition probabilities
Given a state $(k, x)$, the chosen action $a$ implies an instantaneous change of the lights from state $x$ into state $a$, because the chosen action is part of the state. Hence, the transition probability from state $(k, x)$ to state $(k', x')$ is 0 unless $x' = a$. The transition probabilities, denoted by $p(k, x; k', a)$, are best described by considering each flow separately. Let $p_f(k_f, a, k_f')$ denote the transition probability for the number of cars in flow $f$ when action $a$ is taken. Since the flows are independent, the transition probabilities are simply the product of the transition probabilities for each flow:
$$ p(k, x; k', a) = \prod_{f=1}^{4} p_f(k_f, a, k_f'). \tag{4-13} $$
If by action $a$ flow $f$ has right of way during the coming slot, the transition probabilities per flow are given by
$$ p_f(k_f, a, k_f - 1) = 1 - q_f; \quad p_f(k_f, a, k_f) = q_f, \quad k_f > 0, $$
$$ p_f(0, a, 0) = 1. \tag{4-14} $$
And if action $a$ implies red for flow $f$, then
$$ p_f(k_f, a, k_f + 1) = q_f; \quad p_f(k_f, a, k_f) = 1 - q_f, \quad k_f \ge 0. \tag{4-15} $$
The transition probabilities for the number of cars in flow $f$ are illustrated in Figure 4.3 and Figure 4.4.
Figure 4.3: Illustration of the transition probability for the number of cars in flow f when flow f has right of way.
Figure 4.4: Illustration of the transition probability for the number of cars in flow f when the state of light of flow f is red.
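The per-flow transition rules and their product form can be sketched as follows. The function names are hypothetical, the queue is unbounded as in this subsection, and the dictionaries map a next queue length (or a tuple of them) to its probability.

```python
def flow_transitions(k, q, right_of_way):
    """Return {next_k: probability} for one flow with k cars and arrival rate q."""
    if right_of_way:
        if k == 0:
            return {0: 1.0}              # an arriving car passes without delay
        return {k - 1: 1.0 - q, k: q}    # one car leaves; an arrival keeps k unchanged
    return {k + 1: q, k: 1.0 - q}        # red: nobody leaves, an arrival adds a car


def joint_transitions(ks, qs, has_right_of_way):
    """Product form over the flows: {tuple of next queue lengths: probability}."""
    joint = {(): 1.0}
    for k, q, row in zip(ks, qs, has_right_of_way):
        step = flow_transitions(k, q, row)
        joint = {prev + (nk,): p * pk
                 for prev, p in joint.items()
                 for nk, pk in step.items()}
    return joint


# Example: combination C1 (flows 1 and 3) has right of way, all arrival rates 0.2.
dist = joint_transitions((1, 0, 2, 0), (0.2,) * 4, (True, False, True, False))
```

Each flow contributes two possible outcomes in this example, so `dist` has 16 entries whose probabilities sum to 1.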
4.2.4 Costs
The aim is to minimize the overall average waiting time per car. By Little's Law, this corresponds to minimizing the average number of cars waiting in the queues. Therefore, a linear cost function is considered in which one unit of cost is incurred for every car present at the beginning of a slot. Hence, the cost function, denoted by $c(k)$, is given by
$$ c(k) = \sum_{f=1}^{4} k_f. \tag{4-16} $$
4.2.5 Countable state spaces
To compute the performance measures (and the optimal policy) easily, the state space has to be countable. The state space can be reduced to a finite one by limiting the number of cars that can be present in each flow. The maximum state $N$ in each flow becomes one of the parameters of the system. Thus, arrivals to a queue which is full will be rejected. This stochastic control problem involves a finite state space $S$ of cardinality $|S| = 2 \cdot 4 \cdot (N+1)^4$ (each $k_f$ takes the values $0, \dots, N$), where the state of the system is a 6-dimensional vector. The transition probabilities should be changed with respect to $q_f$ such that there are no transitions from state $N$ to $N + 1$. The transition probabilities per flow in the finite state space are defined as follows. If during the coming slot flow $f$ has right of way, the transition probability is defined as
$$ p_f(k_f, a, k_f - 1) = 1 - q_f; \quad p_f(k_f, a, k_f) = q_f, \quad 0 < k_f \le N, $$
$$ p_f(0, a, 0) = 1. \tag{4-17} $$
If during the coming slot flow $f$ gets a red indication, the transition probability in the case of countable state spaces is defined as
$$ p_f(k_f, a, k_f + 1) = q_f; \quad p_f(k_f, a, k_f) = 1 - q_f, \quad k_f < N, $$
$$ p_f(N, a, N) = 1. \tag{4-18} $$
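The size of the truncated model can be checked with a few lines of arithmetic. This sketch assumes that each queue length takes the values 0 to N, so that a flow has N + 1 states, and that the number of decisions per light phase is 2, 1, 1, 2 for green, first yellow, second yellow and all-red; the variable names are illustrative.

```python
N = 4                           # largest state per flow
traffic_states = (N + 1) ** 4   # each k_f takes the values 0..N
light_states = 2 * 4            # l in {1, 2}, i in {0, 1, 2, 3}
num_states = traffic_states * light_states

# Number of decisions per light phase i (green, yellow 1, yellow 2, all red).
actions_per_phase = {0: 2, 1: 1, 2: 1, 3: 2}
pairs_per_traffic_state = 2 * sum(actions_per_phase.values())  # two combinations
num_state_action_pairs = traffic_states * pairs_per_traffic_state

print(num_states)               # 5000
print(num_state_action_pairs)   # 7500
```

The number of state-action pairs grows with the fourth power of N + 1, which is why the cap on the queue length matters for any formulation that has one constraint per pair.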
5 The two-phase ALP approach for F4C2
After defining the model, the ALP formulation for this problem can be determined. The control strategy used in this paper is cyclic. The approach is implemented in the following steps.
1. Find the optimal average cost based on the first phase of the two-phase ALP, denoted by $g^1$. The greedy policy $u^1$ associated with $g^1$ can then be obtained. The first phase of the two-phase ALP is carried out with the predefined basis functions.
2. Evaluate the policy $u^1$ by simulating the state of each flow for a number of slots, say 50,000, and determining the state of the lights based on the greedy policy $u^1$. The average cost, denoted by $g^{1*}$, is then computed as the average over the slots of the total number of cars in all flows. It is expected that $g^{1*} \ge g^1$, because the first phase of the two-phase ALP does not necessarily give a good approximation to the value function, which leads to a non-optimal policy $u^1$.
3. Improve the policy $u^1$ by using the second phase of the two-phase ALP approach, given the approximation $g^1$ of the optimal average cost, the predefined basis functions, and the state relevance weights. The new policy obtained is denoted by $u^2$ and its average cost by $g^{2*}$.
4. Evaluate the new policy $u^2$ by simulation as in step 2. It is expected that $g^{1*} \ge g^{2*} \ge g^1$.
The steps are further discussed in the next section. The arrival rates will be varied to compare the approach with the other strategies. The results obtained are reported in Section 5.2.
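The simulation used in steps 2 and 4 can be sketched as follows. This is a minimal illustration under the modelling assumptions of Chapter 4, not the code used for the experiments: the function names are hypothetical, and the exhaustive policy below is only a stand-in for the greedy ALP policy, which would be plugged in as `policy`.

```python
import random


def simulate_average_cost(policy, q, slots=50_000, N=4, seed=0):
    """Estimate the average number of cars per slot under `policy`.

    policy(k, x) must return the next light state a = (l, i).
    """
    rng = random.Random(seed)
    k = [0, 0, 0, 0]
    x = (1, 0)
    total = 0
    for _ in range(slots):
        total += sum(k)                     # cost: cars present at the start of the slot
        x = policy(tuple(k), x)             # the decision takes effect immediately
        l, i = x
        for f in range(4):
            combination = 1 if f in (0, 2) else 2    # flows 1, 3 -> C1; flows 2, 4 -> C2
            right_of_way = i in (0, 1, 2) and combination == l
            arrival = rng.random() < q[f]
            if right_of_way:
                if k[f] > 0 and not arrival:
                    k[f] -= 1               # one departure, no arrival
            elif arrival and k[f] < N:
                k[f] += 1                   # arrivals to a full queue are rejected
    return total / slots


def exhaustive_policy(k, x):
    """Stand-in policy (not the ALP policy): serve a combination until it is empty."""
    l, i = x
    if i == 0:
        own = (k[0] + k[2]) if l == 1 else (k[1] + k[3])
        return (l, 0) if own > 0 else (l, 1)
    if i in (1, 2):
        return (l, i + 1)
    return (2 if l == 1 else 1, 0)          # after all-red, switch to the other combination


estimate = simulate_average_cost(exhaustive_policy, [0.2] * 4, slots=20_000)
```

Fixing the seed makes two policies comparable on the same arrival stream, which reduces the noise in a comparison of their average costs.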
5.1 The two-phase ALP formulation
To ensure uniqueness, state $(0,(1,0))$ acts as a reference state by taking $V(0,(1,0)) = 0$. The first-phase ALP is given by
$$ \max_{g,\,r} \; g \quad \text{s.t.} \quad g + (\Phi r)(k,x) \le c(k) + \sum_{k'} p(k,x;k',a)\,(\Phi r)(k',a), \quad \forall (k,x),\, a. \tag{5-19} $$
Denote the solution by $(g^1, r^1)$. The basis functions that are used are
$$ \phi_0 = 1, \quad \phi_a = k_a, \quad \phi_{ab} = k_a k_b; \quad a, b \in \{1,2,3,4\},\; a \le b, \tag{5-20} $$
where $k_b$ is the number of cars in flow $b$. The value function is approximated by a second-order polynomial that includes terms that correlate different flows with each other, with different weights for each state of the lights $(l,i)$. Hence, the value function $V(k,(l,i))$ is approximated by $(\Phi r)(k,(l,i))$, which is
$$ (\Phi r)(k,(l,i)) = r_0^{(l,i)} + r_1^{(l,i)} k_1 + \dots + r_4^{(l,i)} k_4 + r_{11}^{(l,i)} k_1^2 + \dots + r_{44}^{(l,i)} k_4^2 + r_{12}^{(l,i)} k_1 k_2 + r_{13}^{(l,i)} k_1 k_3 + r_{14}^{(l,i)} k_1 k_4 + r_{23}^{(l,i)} k_2 k_3 + r_{24}^{(l,i)} k_2 k_4 + r_{34}^{(l,i)} k_3 k_4. \tag{5-21} $$
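The quadratic architecture can be sketched in a few lines. The function names are hypothetical, and the ordering of the product terms below follows `itertools.combinations_with_replacement` rather than the ordering used in the coefficient tables.

```python
from itertools import combinations_with_replacement


def basis_features(k):
    """Return [1, k_1..k_4, k_a * k_b for a <= b] for a queue-length vector k of length 4."""
    linear = list(k)
    # Index pairs in the order (1,1), (1,2), (1,3), (1,4), (2,2), ..., (4,4).
    quadratic = [k[a] * k[b] for a, b in combinations_with_replacement(range(4), 2)]
    return [1] + linear + quadratic


def approximate_value(k, r):
    """Evaluate the linear architecture for one light state, given its 15 weights r."""
    return sum(w * phi for w, phi in zip(r, basis_features(k)))


features = basis_features((1, 2, 0, 3))
weights_per_light_state = len(features)          # 15 basis functions
total_weights = weights_per_light_state * 8      # one weight vector per light state (l, i)
```

With one weight vector per light state the approximation has 15 × 8 = 120 weights, regardless of how large the queues are allowed to grow.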
The constant term $r_0^{(1,0)} = 0$, since $V(0,(1,0)) = 0$ is chosen. There are in total 15 × 8 = 120 variables. In this case, the number of variables does not change as the number of states grows. The linear program is solved using the Premium Solver Platform for Excel, which is able to solve linear programming problems with up to 8,000 variables. While the ALP may involve a manageable number of variables, the number of constraints is still as large as the number of state-action pairs. Therefore, to be able to include all state-action constraints when solving the LP, the largest state on each flow has to be limited. The maximum of the largest state on each flow is 4, corresponding to 7,500 constraints based on the MDP formulation. The policy $u^1$ associated with $g^1$ can be generated by taking
$$ u^1(k,x) = \arg\min_{a \in A(k,(l,i))} \sum_{k'} p(k,x;k',a)\,(\Phi r^1)(k',a). \tag{5-22} $$
In order to check whether the first phase of the two-phase ALP yields a good approximation to the value function, the states are simulated for 500,000 slots. The second phase of the two-phase ALP is carried out by solving the linear program (5-23):
$$ \max_{r} \; c^T \Phi r \quad \text{s.t.} \quad g^2 + (\Phi r)(k,x) \le c(k) + \sum_{k'} p(k,x;k',a)\,(\Phi r)(k',a), \quad \forall (k,x),\, a. \tag{5-23} $$
An obvious choice for $g^2$ is $g^1$, the estimate of the optimal average cost obtained from the first-phase ALP. The solution to the second-phase ALP is $r^2$. The second phase aims to control the accuracy of the approximation to the value function over different portions of the state space. As described in Theorem 1, maximizing $c^T \Phi r$ is equivalent to minimizing $c^T (V_{g^2} - \Phi r)$, which is the sum of the errors of the approximation weighted by $c$ to balance the accuracy of the approximation over different states. Based on Theorem 2, the state relevance weights can be chosen according to the stationary state distribution. In some cases it may suffice to use rough guesses of the stationary state distribution. Thus, the state relevance weights $c$ are chosen in the form
[Equation (5-24), not recoverable from the source: a piecewise definition of $c(k,(l,i))$, with one case for $l = 1$ and one for $l = 2$, in terms of three parameters with subscripts 1, 2 and 3, the sums $k_1 + k_3$ and $k_2 + k_4$, the light phase $i$, and the normalizing constant $a$.]
where
$$ a = \sum_{k,(l,i)} c(k,(l,i)). \tag{5-25} $$
To make sure that the state relevance weights sum to 1, $c(k,(l,i))$ is multiplied by $1/a$.
5.2 Results and evaluation
5.2.1 Symmetric arrival rates
In this section, the results for varying arrival rates at a symmetric F4C2 intersection are presented, in which all flows have identical arrival rates per slot. The average cost $g^1$ resulting from the first-phase ALP and the average costs $g^{1*}$ and $g^{2*}$ yielded by the greedy policies with respect to $r^1$ and $r^2$, respectively, are given in Table 5-1 below. The second-phase ALP is solved for the state relevance weights $c$ with all parameters in (5-24) equal to 0.99. As observed, the results of the second-phase ALP do not differ much from those of the first-phase ALP.
Table 5-1: The average cost in slots for the symmetric F4C2.
Largest state   4      4      4
q_f             0.2    0.3    0.4
g^1             1.86   3.22   4.96
g^{1*}          2.00   4.00   6.86
g^{2*}          2.00   3.99   6.85
The coefficients of the linear combination of the basis functions for approximating the value function are given below. One remarkable observation regarding the coefficients is that $r_{13}$ and $r_{24}$ are zero for all $(l,i)$. Since flows 1 and 3 form $C_1$ and flows 2 and 4 form $C_2$, this means that the product of the numbers of cars in two flows of the same combination adds no value to the approximation of the value function.
Table 5-2: The weights of the linear combination of the basis functions for the symmetric F4C2, with q_f = 0.2 for all f.
Coefficient   l=1                           l=2
              i=0    i=1    i=2    i=3      i=0    i=1    i=2    i=3
r_0           0.00   0.34   1.17   0.13     0.00   0.34   1.17   0.13
r_1           0.76   1.00   6.99   5.76     4.70   3.35   2.01   0.62
r_2           4.67   3.35   2.01   0.62     0.78   0.98   7.06   5.84
r_3           0.63   0.98   7.06   5.85     4.67   3.36   2.02   0.61
r_4           4.70   3.36   2.02   0.61     0.61   1.00   6.99   5.76
r_11          0.64   1.41   0.00   0.13     0.26   0.44   0.55   0.66
r_22          0.29   0.44   0.56   0.67     0.63   1.45   0.04   0.17
r_33          0.65   1.44   0.04   0.17     0.29   0.43   0.55   0.66
r_44          0.26   0.43   0.54   0.65     0.65   1.41   0.00   0.13
r_12          0.09   0.00   0.00   0.00     0.00   0.02   0.09   0.09
r_13          0.00   0.00   0.00   0.00     0.00   0.00   0.00   0.00
r_14          0.01   0.04   0.21   0.22     0.22   0.00   0.01   0.01
r_23          0.00   0.01   0.07   0.07     0.07   0.00   0.00   0.00
r_24          0.00   0.00   0.00   0.00     0.00   0.00   0.00   0.00
r_34          0.20   0.00   0.01   0.01     0.01   0.04   0.19   0.20
Table 5-3: The weights of the linear combination of the basis functions for the symmetric F4C2, with q_f = 0.3 for all f.
Coefficient   l=1                           l=2
              i=0    i=1    i=2    i=3      i=0    i=1    i=2    i=3
r_0           0.00   0.43   1.68   0.00     0.00   0.43   1.68   0.00
r_1           0.98   1.91   7.21   6.60     5.79   4.38   2.81   0.98
r_2           5.54   4.39   2.82   0.98     0.98   1.89   6.98   6.30
r_3           0.67   1.87   6.83   6.13     5.39   4.11   2.53   0.67
r_4           5.64   4.09   2.52   0.67     0.67   1.90   7.06   6.44
r_11          0.82   1.28   0.00   0.00     0.13   0.40   0.60   0.82
r_22          0.14   0.40   0.61   0.82     0.82   1.25   0.00   0.00
r_33          0.83   1.22   0.00   0.00     0.14   0.42   0.62   0.83
r_44          0.13   0.42   0.62   0.83     0.83   1.25   0.00   0.00
r_12          0.41   0.02   0.06   0.07     0.08   0.09   0.32   0.38
r_13          0.00   0.00   0.00   0.00     0.00   0.00   0.00   0.00
r_14          0.06   0.07   0.26   0.30     0.32   0.01   0.05   0.06
r_23          0.20   0.09   0.33   0.38     0.41   0.04   0.16   0.18
r_24          0.00   0.00   0.00   0.00     0.00   0.00   0.00   0.00
r_34          0.35   0.04   0.16   0.19     0.20   0.08   0.28   0.33
Table 5-4: The weights of the linear combination of the basis functions for the symmetric F4C2, with q_f = 0.4 for all f.
Coefficient   l=1                           l=2
              i=0    i=1    i=2    i=3      i=0    i=1    i=2    i=3
r_0           0.00   0.60   2.40   0.14     0.00   0.60   2.40   0.14
r_1           1.20   2.26   7.19   6.75     6.60   5.00   3.21   1.03
r_2           6.79   5.20   3.39   1.16     1.25   2.32   7.34   6.93
r_3           1.11   2.34   7.38   6.96     6.83   5.22   3.40   1.11
r_4           6.64   5.03   3.22   0.98     1.07   2.27   7.22   6.79
r_11          0.92   1.24   0.00   0.00     0.00   0.33   0.63   0.95
r_22          0.00   0.33   0.63   0.96     0.95   1.25   0.00   0.00
r_33          0.98   1.25   0.00   0.00     0.00   0.34   0.64   0.98
r_44          0.00   0.34   0.64   0.97     0.95   1.24   0.00   0.00
r_12          0.10   0.16   0.21   0.26     0.29   0.06   0.07   0.09
r_13          0.00   0.00   0.00   0.00     0.00   0.00   0.00   0.00
r_14          0.15   0.04   0.05   0.07     0.07   0.08   0.11   0.14
r_23          0.24   0.03   0.04   0.05     0.05   0.13   0.18   0.22
r_24          0.00   0.00   0.00   0.00     0.00   0.00   0.00   0.00
r_34          0.21   0.16   0.21   0.26     0.28   0.11   0.15   0.18
In order to evaluate the results obtained from the ALP approach, a comparison is made with the average cost of several control strategies, obtained by simulating the chain. First, the ALP approach is compared to the FC strategy, in which both the order in which the combinations are served and the duration of the green periods are fixed. Further, the ALP approach is compared to the exhaustive control strategy (XHC), in which the order of the served combinations is fixed and a green period lasts until all flows in the combination that has right of way are empty. Two alternative exhaustive controls are also included in the evaluation: XHC(1) and XHC(2), which anticipate departures during 1 and 2 yellow slots, respectively. In other words, the green period lasts until the number of cars in each flow of the combination that has right of way is at most one in XHC(1) and at most two in XHC(2).
Based on Little's Law, the average waiting time per car can be derived from the average number of cars waiting in the queue. Denote the average waiting time in seconds for flow $f$ by $E(W_f)$. Then,
$$ E(W_f) = \frac{2\, g_f}{q_f}, \tag{5-26} $$
where $g_f$ and $q_f$ are the average number of cars waiting at flow $f$ and the arrival rate, respectively; the factor 2 converts slots into seconds. The overall average waiting time is weighted by the average arrivals at the queues. Thus, the overall average waiting time in seconds is given by
$$ E(W) = \sum_f \frac{q_f}{c}\, E(W_f) = \sum_f \frac{q_f}{c} \cdot \frac{2\, g_f}{q_f} = \frac{2}{c} \sum_f g_f = \frac{2g}{c}, \tag{5-27} $$
where $c = \sum_f q_f$.
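The conversion from average queue lengths to waiting times can be sketched as follows. The function names are hypothetical, and the even split of the total of 2.00 cars over the four flows in the example is an assumption made for the symmetric case with arrival rate 0.2.

```python
SLOT_SECONDS = 2.0  # slot length assumed in the text


def waiting_time_per_flow(g_f, q_f):
    """Little's Law: mean waiting time in seconds for one flow."""
    return SLOT_SECONDS * g_f / q_f


def overall_waiting_time(g_flows, q_flows):
    """Arrival-weighted mean waiting time in seconds over all flows."""
    c = sum(q_flows)
    return sum(q / c * waiting_time_per_flow(g, q) for g, q in zip(g_flows, q_flows))


# Symmetric case: arrival rate 0.2 per flow and 2.00 cars in total.
w = overall_waiting_time([0.5] * 4, [0.2] * 4)   # approximately 5.0 seconds
```

Because the arrival rates cancel in the weighted sum, the overall waiting time depends only on the total number of cars and the total arrival rate: 2 × 2.00 / 0.8 = 5.0 seconds.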
Table 5-5 presents the overall average waiting time in seconds for varying arrival rates at a symmetric F4C2 intersection, which means that all flows have identical arrival rates per slot of $q_f$.
Table 5-5: Overall average waiting time in seconds for the symmetric F4C2. Percentages give the difference relative to the two-phase ALP.
Rule                                       q_f = 0.2     q_f = 0.3     q_f = 0.4
Two-phase ALP                              5.00          6.65          8.57
FC                                         5.37   7%     7.30  10%     8.92   4%
XHC                                        5.66  13%     7.54  13%     8.65   1%
XHC(1)                                     5.06   1%     6.69   1%     8.37  -2%
XHC(2)                                     5.16   3%     6.97   5%     8.99   5%
FC green periods in slots for C1, C2       1, 1          3, 3          8, 8
For this simple intersection, the overall average waiting time obtained with the two-phase ALP approach is lower than that of the other strategies, except for XHC(1) at $q_f = 0.4$, which is 2% lower. The results obtained by the two-phase ALP approach are close to those of the anticipating exhaustive strategy XHC(1).
5.2.2 Asymmetric arrival rates
In the case of asymmetric arrival rates, two situations are considered for F4C2. In the first situation, the arrival rates are 0.15 for flows 1 and 3, and 0.45 for flows 2 and 4. Hence, the flows within the same combination have identical arrival rates, but C2 is three times as busy as C1. In the second situation, the arrival rate for flow 1 is 0.10, and the arrival rates for the other flows are 0.30. The coefficients of the linear combination of the basis functions obtained from the two-phase ALP approach for approximating the value function are given below.
Table 5-6: The weights of the linear combination of the basis functions for the asymmetric F4C2, with q = (0.15, 0.45, 0.15, 0.45).
Coefficient   l=1                           l=2
              i=0    i=1    i=2    i=3      i=0    i=1    i=2    i=3
r_0           0.00   0.44   0.00   0.00     0.00   1.25   2.97   0.00
r_1           2.25   1.22   9.01   8.32     7.61   5.97   4.28   2.25
r_2           6.30   6.13   4.61   0.91     0.91   3.10   6.76   6.44
r_3           2.64   1.07   9.67   8.70     8.29   6.59   4.81   2.64
r_4           6.19   5.66   4.03   0.91     0.91   3.07   6.65   6.30
r_11          0.62   1.76   0.00   0.00     0.00   0.19   0.39   0.62
r_22          0.00   0.00   0.30   0.95     0.95   0.95   0.00   0.00
r_33          0.68   1.97   0.00   0.00     0.00   0.21   0.43   0.68
r_44          0.00   0.10   0.42   0.95     0.95   0.94   0.00   0.00
r_12          0.08   0.01   0.01   0.01     0.01   0.03   0.06   0.07
r_13          0.00   0.00   0.00   0.00     0.00   0.00   0.00   0.00
r_14          0.03   0.00   0.00   0.00     0.00   0.01   0.02   0.03
r_23          0.12   0.17   0.21   0.25     0.26   0.04   0.09   0.11
r_24          0.00   0.00   0.00   0.00     0.00   0.00   0.00   0.00
r_34          0.22   0.18   0.21   0.26     0.27   0.07   0.16   0.20
Table 5-7: The weights of the linear combination of the basis functions for the asymmetric F4C2, with q = (0.1, 0.3, 0.3, 0.3).
Coefficient   l=1                           l=2
              i=0    i=1    i=2    i=3      i=0    i=1    i=2    i=3
r_0           0.00   0.31   1.05   0.00     0.00   0.47   1.44   0.03
r_1           0.55   0.00   6.22   4.98     4.18   2.87   1.71   0.55
r_2           4.93   4.04   2.46   0.68     0.68   1.81   6.33   5.52
r_3           1.42   1.66   7.82   7.23     7.08   5.57   4.02   1.41
r_4           5.26   4.02   2.43   0.69     0.69   1.88   6.96   6.36
r_11          0.56   1.54   0.10   0.17     0.23   0.46   0.51   0.56
r_22          0.10   0.36   0.59   0.79     0.79   1.13   0.00   0.00
r_33          0.91   1.38   0.00   0.00     0.00   0.32   0.55   0.91
r_44          0.16   0.34   0.58   0.78     0.78   1.23   0.00   0.00
r_12          0.00   0.01   0.15   0.16     0.16   0.00   0.00   0.00
r_13          0.00   0.00   0.00   0.00     0.00   0.00   0.00   0.00
r_14          0.02   0.05   0.27   0.29     0.29   0.00   0.00   0.00
r_23          0.66   0.21   0.24   0.28     0.30   0.14   0.51   0.61
r_24          0.00   0.00   0.00   0.00     0.00   0.00   0.00   0.00
r_34          0.22   0.24   0.15   0.18     0.19   0.05   0.17   0.20
The results are given in Table 5-8. The overall mean waiting time is denoted by $E(W)$ and the average waiting time for flow $f$ is denoted by $E(W_f)$. The lengths of the green periods in the FC control strategy for C1 and C2 are 1 and 5 slots, respectively, in the first situation. In the second situation, the length of the green periods is 3 slots for both combinations.
Table 5-8: Mean waiting times in seconds for two asymmetric F4C2 cases. Percentages give the difference in E(W) relative to the two-phase ALP.
Rule             E(W)          E(W_1)   E(W_2)   E(W_3)   E(W_4)
q = (0.15, 0.45, 0.15, 0.45)
Two-phase ALP    5.95          8.47     5.16     8.41     5.07
FC               6.32   6%     10.53    4.91     10.55    4.91
XHC              6.72  13%     9.86     5.68     9.88     5.67
XHC(1)           6.37   7%     8.12     5.79     8.11     5.79
XHC(2)           7.32  23%     7.07     7.42     7.08     7.39
q = (0.10, 0.30, 0.30, 0.30)
Two-phase ALP    6.22          4.93     5.43     8.25     5.40
FC               7.09  14%     5.19     7.30     7.31     7.30
XHC              6.96  12%     6.42     6.66     7.72     6.67
XHC(1)           6.22   0%     5.25     6.05     6.88     6.06
XHC(2)           6.61   6%     4.63     6.65     7.19     6.65
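As a consistency check on the table above, the overall mean waiting time should equal the arrival-weighted mean of the per-flow waiting times. The sketch below recomputes it for the two-phase ALP rows; the function name is hypothetical and the inputs are the rounded table values.

```python
def weighted_mean_wait(q, per_flow_wait):
    """Arrival-weighted mean of the per-flow waiting times."""
    return sum(qf * wf for qf, wf in zip(q, per_flow_wait)) / sum(q)


# Two-phase ALP rows of the table above.
case_1 = weighted_mean_wait([0.15, 0.45, 0.15, 0.45], [8.47, 5.16, 8.41, 5.07])
case_2 = weighted_mean_wait([0.10, 0.30, 0.30, 0.30], [4.93, 5.43, 8.25, 5.40])

print(round(case_1, 2))  # 5.95
print(round(case_2, 2))  # 6.22
```

Both values agree with the reported overall means of 5.95 and 6.22 seconds.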
As observed, the two-phase ALP approach results in the lowest overall mean waiting time in the first situation; in the second situation it ties with XHC(1) at 6.22 seconds and is lower than the remaining strategies. Further, it is observed that the flows that are part of the busier combination have a lower waiting time. This suggests that a busier combination gets priority. If a busy flow and a less busy flow are grouped together in one combination, then the less busy flow takes advantage of the priority and the busier flow suffers a bit.
5.3 Reduced Linear Program
Although the dimension of the problem is reduced in the approximate version (ALP), the number of constraints is still as large as the number of state-action pairs, which can be very large. The two-phase ALP solves only part of the curse of dimensionality problem. Therefore, the largest state on each flow is limited to 4. In practice, this is certainly not the case. Van Roy [2] suggested a constraint sampling approach for approximating solutions to optimization problems when the number of constraints is intractable, the Reduced Linear Program (RLP). The idea is to define a probability distribution $\psi$ over the set of constraints and to include only a sampled subset of the constraints when solving the problem. Two properties were proven: if a reasonable number of constraints is sampled according to this distribution, then almost all other constraints will be satisfied, and the constraints that are not satisfied do not distort the solution too much.
The RLP for the traffic control problem is characterized as follows. The largest state on each flow is set to 10. The constraint sample size is 20,000. The subset of constraints was sampled based on the probability measure $\psi(k) = (1-p)^4 p^{|k|}$, $p = 0.99$. In the case of small arrival rates, say 0.2, the RLP gives a good approximation to the value function, which results in an average cost similar to the average cost yielded by the two-phase ALP. However, when the load is high, the results do not meet the expectations, because the sampled constraints do not represent "almost all" other constraints, and the constraints that are not satisfied distort the solution considerably. It is proven that the results of the RLP rely on an idealized choice of $\psi$. The underlying thought is similar to that of choosing the state relevance weights (see Section 3.3). The constraints that represent the states that are visited more often should be included. The chosen probability measure $\psi(k)$ has an exponential form, which gives high probabilities to the states that are close to the situation in which no cars are present in any flow, i.e., the states with a small number of cars present in the flows. This probability measure is an acceptable choice of $\psi$ when the arrival rates are small. However, when the arrival rates are higher, high probability has to be assigned to the states where more cars are present in the flows. Therefore, the exponential form is not an idealized choice of the probability measure $\psi$.
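The sampling step of the RLP can be sketched as follows. Several details are assumptions that the text does not specify: $|k|$ is read as the total number of cars over the four flows, the geometric weights are renormalized on the truncated range 0 to N, and the light state and the action of each sampled constraint are drawn uniformly. All function names are hypothetical.

```python
import random


def sample_queue_lengths(p, N, rng):
    """Draw k = (k_1..k_4) with probability proportional to p ** (k_1 + ... + k_4) on {0..N}^4."""
    weights = [p ** j for j in range(N + 1)]
    # Independent draws per flow give the product form of the measure.
    return tuple(rng.choices(range(N + 1), weights=weights, k=4))


def sample_constraints(num_samples, p=0.99, N=10, seed=0):
    """Sample (k, x, a) triples; each triple indexes one constraint of the LP."""
    rng = random.Random(seed)
    samples = set()
    for _ in range(num_samples):
        k = sample_queue_lengths(p, N, rng)
        l, i = rng.choice((1, 2)), rng.randrange(4)    # assumption: uniform light state
        if i == 0:
            options = [(l, 0), (l, 1)]
        elif i in (1, 2):
            options = [(l, i + 1)]
        else:
            options = [(l, 3), (3 - l, 0)]
        samples.add((k, (l, i), rng.choice(options)))  # assumption: uniform action
    return samples


constraints = sample_constraints(20_000)
```

Because duplicates are merged in the set, the number of distinct constraints can be smaller than the number of draws; the reduced LP would then contain only the constraints indexed by `constraints`.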
6 Conclusion
The two-phase ALP approach for average cost dynamic programming is applied to traffic control at isolated signalized intersections. As an illustration, an intersection of two two-way streets with controllable traffic lights on each corner is considered. Based on the chosen basis functions and state relevance weights, it is observed that the switching schemes obtained by the ALP approach were better than the other strategies in nearly all cases considered, where there are at most 4 cars in each flow.
In order to deal with the intractable state space, the ALP approach provides an algorithm for finding a good approximation to the optimal value function by fitting a parameterized linear function. Hence, the number of variables that has to be stored in the ALP approach is equal to the number of pre-specified basis functions, instead of storing the optimal value function itself for each state of the system. However, the ALP approach still has as many constraints as the exact linear programming formulation. Therefore, the largest state on each flow is limited to 4. In practice, this is certainly not the case. Van Roy [2] suggested a constraint sampling approach for approximating solutions to optimization problems when the number of constraints is intractable. The idea is to define a probability distribution $\psi$ over the set of constraints and to include only a sampled subset of the constraints when solving the problem.
References
[1] Farias, D.P. de, Benjamin van Roy, "Approximate Linear Programming for Average-Cost Dynamic Programming".
[2] Farias, D.P. de, Benjamin van Roy, "On Constraint Sampling in the Linear Programming Approach to Approximate Dynamic Programming". Mathematics of Operations Research, Vol. 29, No. 3, August 2004, pp. 462-478.
[3] Farias, D.P. de (June 2002), "The Linear Programming Approach to Approximate Dynamic Programming: Theory and Application".
[4] Haijema, R., Jan van der Wal (3rd November 2006), "An MDP decomposition approach for traffic control at isolated signalized intersections".
[5] Koole, G. (9th September 2005), "Lecture notes Stochastic Optimization", Lecture notes VU Amsterdam.
[6] http://en.wikipedia.org/wiki/Traffic_light#Introduction
[7] "Intersection Safety: Myth Versus Reality", Intersection Safety Brief, U.S. Department of Transportation, Federal Highway Administration.
