6, NOVEMBER 2006
I. INTRODUCTION
Control System (UTCS). A common form of traffic signal control is preset timings. These preset timings are optimized offline for the traffic patterns at a particular time of day. At a predetermined time, a new set of timings is downloaded to each
traffic signal in the network. However, these preset timings may
not work well for a complex traffic network with continuously
changing traffic volumes throughout the day and between days.
This presents the need to implement a control system that can
perform real-time updates of the traffic signals in the traffic network
based on changes in the traffic volume. Such an idea is possible only if the local controllers can adapt to the changing dynamics of the traffic network. For the case where individual
local controllers control the traffic signals for an indefinite amount of time after they are installed into the traffic
network, the problem of real-time traffic signal control can be
said to take the form of an infinite horizon distributed control
problem. Hence, for effective traffic signal control, such controllers need to adapt themselves continuously.
Different techniques exist for designing these real-time traffic
signal controllers. Examples include the use of fuzzy logic and
fuzzy sets [8], [9], as well as the use of genetic algorithms and reinforcement learning [10]. Most of these ideas are based on the
distributed approach where a local controller is assigned to update the traffic signals of a single intersection based on the traffic
flow in all the approaches of that intersection. Recent research
works also include the use of NN-based controllers [11]–[13].
Some of these approaches such as [11] and [13] use a simplified traffic network model consisting of a single intersection.
Hence, it is unclear if the neural controller can effectively control a large-scale traffic network with multiple intersections.
C. Objectives
NN-based local controllers implemented for the infinite
horizon distributed control problem have to constantly adapt
CHOY et al.: NEURAL NETWORKS FOR CONTINUOUS ONLINE LEARNING AND CONTROL
$$\text{total mean delay} = \frac{\text{total delay}}{\text{total number of vehicles}} \tag{1}$$
where the total delay is the total amount of delay faced by all vehicles that entered and left the traffic network during the time when the measurement was taken, and the total number of vehicles counts the vehicles that entered and left the traffic network during the time when the measurement was taken.
Similarly, the total stoppage time for each vehicle was also stored in memory to facilitate the calculation of the mean stoppage time. The equation for calculation of the total mean stoppage time is given as follows:
$$\text{total mean stoppage time} = \frac{\text{total stoppage time}}{\text{total number of vehicles}} \tag{2}$$
where the total stoppage time is the total amount of stoppage time faced by all vehicles that entered and left the traffic network during the time when the measurement was taken, and the total number of vehicles counts the vehicles that entered and left the traffic network during the time when the measurement was taken.
Finally, the current vehicle mean speed is the average speed of all the vehicles that are currently in the traffic network. These three performance measures are reflective of the overall traffic condition in the traffic network. For an overcongested traffic network, the vehicles are likely to suffer from high stoppage time as they pass from one street to another and, consequently, the delay will be high and their current mean speed will be low.
C. NNs for Distributed Control of a Traffic Network
A large-scale control problem such as controlling the traffic signals in a traffic network can be divided into subproblems, each of which is handled by a local controller.
For such a distributed approach, each local controller will generate its own control variables based on the local information it receives. Also, information can be exchanged among the local controllers either laterally (for controllers in the same hierarchical level) or vertically (between lower level controllers and higher level controllers). This is a form of cooperation and it can be used to affect the generation of control variables. An analytical approach to control problems of this nature is presented in [15], which models the traffic network as a graph consisting of nodes and links. Based on the approach in [15], the following are defined:
directed graph with a set of nodes and a set of links describing the traffic network;
local controller acting at each node;
total number of temporal stages at a given time;
local input information vector of each local controller at each stage;
weight vector of each local controller at each stage;
function of each local controller at each stage;
optimal neural control of each local controller at each stage.
Given the aforementioned, the problem now arises concerning the following:
1) the approximating ability of the NNs that are involved;
2) the ability of each local controller to derive a good approximation of the optimal solution in a timely manner.
The first issue is mainly structural and concerns the
layout of the NNs. Various proofs concerning
the approximating properties of NNs with a single hidden layer
for solving the distributed control problem are supplied in [15]. For other NNs such
as fuzzy-NNs with two or more hidden layers, there
exist proofs showing that those NNs can work as universal
approximators under certain conditions, e.g., [17].
The second issue, however, may not be easily solved even if
the first issue has been resolved (recall that [15] did not give
the rate of convergence for its propositions). The difficulty in
obtaining a reasonably good approximation of the optimal solution for
each local controller at a given stage (or any other future state) is
due to computational limitations as well as the limitations of
various existing parameter update algorithms. Computational
limitations refer to the limited number of stages that can be considered for
each local controller at any particular stage, owing to
finite memory storage space as well as the computational speed of
the processor. The limitations of existing parameter update algorithms are mainly due to the fact that most parameter update algorithms for connectionist networks are designed for finite horizon learning processes. Hence, if these algorithms are
employed for an online, infinite horizon learning process, problems such as solutions getting stuck in a local minimum and inadequate stochastic exploration can become very severe. As such,
the focus of this paper is to develop a new hybrid NN model as
well as a new, continuously updated SPSA-NN model that are
suited for such a problem and to compare their performances.
III. NN MODELS FOR TRAFFIC SIGNAL CONTROL
Several NN-based traffic signal control models have been presented in previous research works. Three representative examples taken from [11], [12], and [13] will be discussed in this
section.
The research works in [11] and [13] involve designing an
NN-based controller for updating the traffic signal of an isolated
traffic intersection. Both [11] and [13] incorporate the concept
of fuzzy logic and their fuzzy-NNs are of the five-layer type
(inputs, fuzzification, inference, consequence, defuzzification).
The inputs for the fuzzy-NNs that are implemented in [11] and
[13] consist mainly of two types as follows:
1) the number of vehicles that pass through the different approaches of the intersection;
2) the number of vehicles waiting in the various queues.
The outputs of the fuzzy-NN controllers are various traffic
signal plans that involve adjusting certain components of the
traffic signal (refer to the problem description section in Section I-A). For example, [13] uses several sets of fuzzy-neural rule
bases to generate different types of green-split adjustments based
on the inputs. In [13], reinforcement learning and gradient
descent methods are used to adjust the shape of the fuzzy membership functions (through updating the weights of the fuzzy-NNs).
There are several limitations to the approach adopted in [13]
as reported by its authors. First, the neural learning is not effective under certain circumstances due to the lack of stochastic
The five-layered fuzzy NN shown in Fig. 2 follows the popular convention used in the literature and it can be used for a
wide variety of applications. The choice of the operators between the fuzzification layer (second layer) and the implication
layer (third layer) is taken to be -norm. The choice of the operator between the implication layer (third layer) and the consequent layer (fourth layer) is taken to be -norm. Membership
functions of the terms of the fuzzy output are singletons since
complicated membership functions and complex algorithms for
defuzzification may affect the real-time performance of the NN
controller with no significant improvement in its behavior. Section IV gives a brief introduction to stochastic approximation and SPSA and describes their advantages. It then
describes how SPSA can be applied to update the
weights of an NN.
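To make the five-layer structure described above concrete, the following minimal Python sketch runs one forward pass through such a network with singleton output terms. It is an illustration under stated assumptions, not the authors' implementation: Gaussian membership functions, `min` as the T-norm between the fuzzification and implication layers, `max` as the S-norm between the implication and consequent layers, and all names (`membership`, `fuzzy_forward`, `terms`, `rules`, `singletons`) are hypothetical choices made here.

```python
import math


def membership(x, center, width):
    """Layer 2 (fuzzification): Gaussian degree of membership of x in one term.

    The Gaussian shape is an assumption made for this sketch.
    """
    return math.exp(-0.5 * ((x - center) / width) ** 2)


def fuzzy_forward(inputs, terms, rules, singletons):
    """Forward pass of a five-layer fuzzy NN with singleton outputs.

    inputs     : crisp input values (layer 1)
    terms      : terms[i] is a list of (center, width) pairs for input i
    rules      : list of (antecedent, output_term); antecedent holds one
                 term index per input
    singletons : crisp value attached to each output term
    """
    # Layer 2: fuzzify every input against each of its linguistic terms.
    mu = [[membership(x, c, w) for (c, w) in terms[i]]
          for i, x in enumerate(inputs)]
    # Layer 3: rule firing strength, using min as the (assumed) T-norm.
    firing = [min(mu[i][t] for i, t in enumerate(antecedent))
              for antecedent, _ in rules]
    # Layer 4: merge rules sharing an output term, using max as the (assumed) S-norm.
    strength = [0.0] * len(singletons)
    for (_, out_term), f in zip(rules, firing):
        strength[out_term] = max(strength[out_term], f)
    # Layer 5: with singleton outputs, defuzzification is a weighted average.
    total = sum(strength)
    if total == 0.0:
        return sum(singletons) / len(singletons)
    return sum(s * v for s, v in zip(strength, singletons)) / total


# Two inputs, two terms each, three rules, two output singletons.
example_terms = [[(0.0, 1.0), (1.0, 1.0)], [(0.0, 1.0), (1.0, 1.0)]]
example_rules = [((0, 0), 0), ((1, 1), 1), ((0, 1), 1)]
example_output = fuzzy_forward([0.0, 1.0], example_terms, example_rules, [10.0, 30.0])
```

Because the output terms are singletons, the last layer reduces to a single weighted average, which is the property the text cites as keeping the controller fast enough for real-time use.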
at stage
at stage
for each
for each
(3)
(4)
Based on that, several gradient approximation methods have
been developed, including a popular finite difference approximation method by Kiefer–Wolfowitz [24]. Spall [22] adopted
another approach (SPSA) using the idea of simultaneous perturbation to estimate the gradient. The formal proof of convergence of SPSA and the asymptotic normality of the estimate can be found
in [22] and will not be presented in this paper. However, some
of the expressions of the SPSA algorithm are given as follows
(5)
where is the stochastic perturbation that is applied to during stage and represents measurement noise terms, which must satisfy the following condition resembling a martingale difference:
(6) a.s.
The estimate of the gradient is given by
(7)
(8)
(1) can be rewritten in a more generalized form
(9)
where is the bias in the gradient estimate (unbiasedness or near unbiasedness holds if the loss function is sufficiently smooth, the measurement noise satisfies (6), and the conditions for the perturbation are met).
Strong convergence of (9) to the optimal solution has been proven in [22] if
five other assumptions are satisfied (refer to [22]). As such, the
iterative form presented in (3) and (9) can be used to model the
iterative weight update process in an NN controller.
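The iterative weight update described above can be illustrated with the standard SPSA recursion from Spall's formulation. The Python sketch below is a minimal version under stated assumptions: the perturbation components are drawn as Bernoulli ±1 values, the gain sequences use the commonly quoted exponents 0.602 and 0.101, and the function name `spsa_minimize` and its default constants are hypothetical. It is not claimed to match the settings or notation used in this paper.

```python
import random


def spsa_minimize(loss, theta, iterations=2000, a=0.1, c=0.1, A=100.0,
                  alpha=0.602, gamma=0.101, seed=0):
    """Minimise `loss` with simultaneous perturbation stochastic approximation.

    Every iteration uses only two loss measurements, whatever the number of
    parameters, which is what makes SPSA attractive for NN weight updates.
    All default constants are illustrative assumptions.
    """
    rng = random.Random(seed)
    theta = list(theta)
    for k in range(iterations):
        a_k = a / (k + 1 + A) ** alpha   # decaying step size
        c_k = c / (k + 1) ** gamma       # decaying perturbation size
        # Perturb all parameters at once with independent +/-1 draws.
        delta = [rng.choice((-1.0, 1.0)) for _ in theta]
        plus = [t + c_k * d for t, d in zip(theta, delta)]
        minus = [t - c_k * d for t, d in zip(theta, delta)]
        diff = loss(plus) - loss(minus)
        # Gradient estimate: the same difference divided by each perturbation component.
        theta = [t - a_k * diff / (2.0 * c_k * d) for t, d in zip(theta, delta)]
    return theta


# Toy usage: recover the minimiser of a quadratic loss (made-up target values).
toy_target = [1.0, -2.0]


def toy_loss(weights):
    return sum((w - g) ** 2 for w, g in zip(weights, toy_target))


toy_weights = spsa_minimize(toy_loss, [0.0, 0.0])
```

By contrast, a two-sided finite difference scheme of the Kiefer–Wolfowitz type would need two loss measurements per parameter in every iteration.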
B. The Structure of a Local Controller Using Continuously
Updated SPSA-NN
As mentioned in the beginning, the continuously updated
SPSA-NN model used in this study strives to avoid the two
limitations of [12]. The structure of each local controller is formulated in
such a way that each local controller can ideally be left on its own once
it has been implemented. The five-layered fuzzy NN in Fig. 2
is used as the basic building block for each component in the local controller.
The structure of a local controller is shown in Fig. 3.
As can be seen in Fig. 3, the local controller consists of several decision makers, a single state estimator as well as a delay estimator
(details of which will be given later). The local controller takes in traffic
parameters as its inputs and generates a set of signal plans as
the output via a two-stage process. The state estimation stage
generates the estimated current state of the traffic network. The
number of decision makers used in the decision-making stage
will depend on the complexity of the problem. Based on the estimated current state of the traffic network, the appropriate decision-maker will be selected to generate a set of signal plans.
The decision-making and learning processes of each local controller are
enabled by the NNs and, at each stage, they are as follows.
1) Weights belonging to the state estimator (SE) are perturbed
randomly. This follows the use of a stochastic
perturbation as shown in (5).
2) The SE takes in traffic parameters as inputs and estimates
the current state of the intersection.
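The two-stage process described above (state estimation followed by selection of a decision maker) can be pictured as a simple dispatch. The Python sketch below is purely illustrative: in the paper both stages are fuzzy NNs, whereas here they are replaced by hand-written stand-ins. The input names `occupancy` and `flow`, the thresholds, the state labels, and the plan indices are all hypothetical.

```python
def estimate_state(occupancy, flow):
    """Stand-in for the state estimator (SE).

    Both inputs are assumed to be normalised to [0, 1]; the thresholds are
    arbitrary illustrative values.
    """
    load = 0.5 * occupancy + 0.5 * flow
    if load < 0.3:
        return "light"
    if load < 0.7:
        return "moderate"
    return "heavy"


# One stand-in decision maker per estimated state. Each returns the index
# of a signal plan (hypothetical numbering).
DECISION_MAKERS = {
    "light": lambda occupancy, flow: 1,
    "moderate": lambda occupancy, flow: 4 if flow < 0.5 else 5,
    "heavy": lambda occupancy, flow: 8,
}


def select_signal_plan(occupancy, flow):
    """Stage 1 estimates the state; stage 2 lets the matching decision maker act."""
    state = estimate_state(occupancy, flow)
    decision_maker = DECISION_MAKERS[state]
    return state, decision_maker(occupancy, flow)
```

Keeping one decision maker per state means that each one only has to behave well within a narrow range of traffic conditions.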
(10)
is as follows:
(11)
(12)
where is the state change sensitivity constant (determined empirically) and is the best state value. For there to be a positive reinforcement, it is necessary that and . Note that if , the reinforcement will be equal to zero if . This implies
that no reinforcement will be sent for the current stage if the best
state has already been achieved in the previous stage.
(13)
(14)
(15)
(16)
where
(17)
as a function of d .
TABLE I
PARAMETERS FOR EAFRG
defuzzification process in the fourth layer, -norm fuzzy operator in the third layer, and -norm fuzzy operator in the second
layer.
Forgetting Mechanism: A forgetting mechanism is implemented in the fuzzy NNs as well as the ORL
module to affect the weight adjustment process. The principle
behind the forgetting mechanism is to enable the decision
module to search through the solution space in an explorative
manner rather than a purely exploitative manner [30] in order to
reduce the number of instances whereby the search is trapped
in a local minimum. This is similar to the concept of simulated
annealing, whose objective is to find the global minimum of a
cost function that characterizes large and complex systems. In
doing so, simulated annealing proposes that instead of going
downhill all the while to favor low-energy-ordered states, it
is good to go downhill most of the time. In other words, an
uphill search is needed at certain times. Results in [30] have shown
that this new approach provides a robust framework
for reinforcement learning in a changing problem domain
where the improvised algorithm with the forgetting mechanism
(19)
where is the forgetting term and its value is , and
is a positive constant to be determined empirically. Using
this approach, the search for the optimal solution does not get
stuck in a local minimum since the transition out of it is always
possible. The forgetting mechanism for each neural weight is
activated after a prespecified number of negative reinforcements
are received.
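The activation rule stated above (forgetting starts only after a prespecified number of negative reinforcements) can be sketched as follows. This Python example is an assumption-laden illustration, not a reproduction of (19): the multiplicative decay `value *= 1 - forgetting_term`, the cumulative (rather than consecutive) counting of negative reinforcements, and the names `ForgettingWeight` and `patience` are all hypothetical.

```python
class ForgettingWeight:
    """One neural weight with a counter-triggered forgetting step."""

    def __init__(self, value, forgetting_term, patience):
        self.value = value
        self.forgetting_term = forgetting_term  # assumed to lie in (0, 1)
        self.patience = patience                # negative reinforcements tolerated
        self.negative_count = 0

    def update(self, delta, reinforcement):
        """Apply the usual learning step, then forget if the counter is full.

        Returns True when the forgetting step was activated.
        """
        self.value += delta
        if reinforcement < 0:
            self.negative_count += 1
        if self.negative_count >= self.patience:
            # Assumed form of forgetting: shrink the weight towards zero so
            # the search can move away from the current solution.
            self.value *= 1.0 - self.forgetting_term
            self.negative_count = 0
            return True
        return False
```

Shrinking a weight that keeps attracting negative reinforcement plays the same role as the occasional uphill move in simulated annealing: it gives the search a way out of a poor local solution.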
Following the learning rate and weight adjustment process,
the last stage of the multistage online learning process involves
using an evolutionary algorithm for adjustment of fuzzy relation
(20)
where is the current eligibility of the antecedent-implication
relation, is the measure of whether the relation is generally
satisfying, and is the sensitivity factor for (determined empirically). Hence, as can be seen, the eligibility and the satisfaction measure have a counterbalancing influence on each other. A relation may be generally satisfying, having a high satisfaction value, but due to the changing system
dynamics, it may not be eligible.
Hence, a low eligibility value will result. Adding them up will produce the overall fitness of the relation. The eligibility is further defined as
follows:
(21)
where is the eligibility sensitivity factor (determined empirically), denotes the eligibility trace at stage , and it is computed as follows:
(22)
where is the decay constant (determined empirically), is the
reinforcement, and is the activation value, which is zero (0) if
the rule is not activated and one (1) if activated. The function
denotes taking the difference between two chromosome
vectors. In this case, the first chromosome vector is ,
the current chromosome used by the fuzzy NN and the second
TABLE II
NUMBER OF PEAK PERIODS FOR DIFFERENT SIMULATION RUNS
chromosome vector is
by EAFRG. For this research,
Fig. 10. Current number of vehicles for typical scenario with morning peak (3 h).
Fig. 11. Current number of vehicles for typical scenario with morning and evening peaks (24 h).
Fig. 12. Current number of vehicles for extreme scenario with multiple peaks (24 h).
Fig. 13. Screenshot of the installation of inductive loop detectors in the simulated traffic network (right-hand drive).
Fig. 14. Overall interaction diagram of the local controllers and the traffic network.
fore they are being decoded by the signal plans interpreter and
implemented into the traffic network. Each local controller has
eight different types of signal plans that it can use to control the traffic signals at its intersection. These eight different
types of signal plans are designed to cater for different amounts
TABLE III
TOTAL MEAN DELAY FOR TYPICAL SCENARIO WITH MORNING PEAK (3 h)
troller depends on the NN model that is being used. As mentioned in Sections IV and V, the structure of the local controller
for the continuously updated SPSA-NN model is shown in Fig. 3
and the structure of the local controller for the hybrid NN is
shown in Fig. 4.
The sampling rate for the local controllers can be coordinated
in order to make sure that the agents make timely responses
to the dynamically changing traffic network. For this study, the
local controllers are tuned to sample the traffic network for the
traffic parameters once every 10 s (simulation time).
C. Using GLIDE for Benchmarking
It is difficult to find a good benchmark for this large-scale
traffic signal control problem given the following factors.
1) Existing algorithms or control methodologies have been
developed for controlling the traffic networks of other cities
[11]–[13] with different traffic patterns and, hence, the results obtained from those works cannot be applied directly
for this problem.
2) Some of the existing algorithms [11], [13] are developed
for simplified scenarios and, hence, they are not suitable
for benchmarking.
3) Commercial traffic signal control programs which are
known to have worked well are not easily available due to
proprietary reasons.
In all the experiments for this research, the signal settings
used for benchmarking are derived from the actual signal plans
implemented by LTA's GLIDE traffic signal control system.
GLIDE is the local name of the Sydney Coordinated Adaptive Traffic
System (SCATS), which is one of the state-of-the-art adaptive traffic
signal control systems [32] and is currently used in over 70
urban traffic centers in 15 countries worldwide. As such, for
simulation scenarios without the local NN-based controllers,
the signal plans selected and executed by GLIDE are implemented in the traffic network at the respective intersections as
the traffic loading at each intersection changes with time. The
traffic loading is derived from GLIDE's traffic counts from the
loop detectors.
VII. RESULTS
The results for the three types of simulations are presented as
follows.
A. Typical Scenario With Morning Peak (3 h)
For the typical scenario with morning peak (3 h), six separate
simulation runs using different random seeds were carried out
for each control technique (continuously updated SPSA-NN,
TABLE IV
TOTAL MEAN DELAY FOR THE TYPICAL SCENARIO WITH MORNING AND EVENING PEAKS (24 h)
Fig. 15. Current vehicle mean speed for typical scenario with morning and evening peaks (24 h).
Fig. 16. Signal plans selected by the continuously updated SPSA-NN-based local controllers for a particular simulation run.
TABLE V
TOTAL MEAN DELAY FOR THE EXTREME SCENARIO WITH MULTIPLE PEAKS (24 h)
(where the traffic network at the end of the sixth hour is free of
congestion) can be seen in Fig. 16.
As shown in Fig. 16, the sets of signal plans selected by the
continuously updated SPSA-NN-based controllers over the
last 50 rounds of the simulation run are largely similar. This
suggests that some form of convergence to a set of signal plans
has been achieved for that successful simulation run. Given the
short duration of the simulation, it is not known if this set of
signal plans is indeed the optimal set.
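One simple way to quantify "largely similar" selections over the last 50 rounds is to measure how often a controller picks its most frequent plan within that window. The Python sketch below illustrates this idea only; the function name `plan_agreement`, the agreement measure, and the example history are assumptions and not the analysis performed in the paper.

```python
from collections import Counter


def plan_agreement(selected_plans, window=50):
    """Return the dominant plan in the last `window` rounds and its share.

    selected_plans : sequence of plan identifiers chosen by one local
                     controller, one entry per round (hypothetical format).
    """
    recent = list(selected_plans)[-window:]
    if not recent:
        raise ValueError("no plan selections recorded")
    plan, count = Counter(recent).most_common(1)[0]
    return plan, count / len(recent)


# Made-up history: early exploration followed by a mostly stable choice.
example_history = [1, 2, 3, 4] * 10 + [5] * 45 + [6] * 5
example_plan, example_share = plan_agreement(example_history)
```

A share close to 1 for every controller would indicate that the selections have settled, although, as noted above, a settled set of plans is not necessarily the optimal one.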
In contrast, the signal plan selections by the hybrid NN and
GLIDE are not found to achieve any form of convergence for
the various simulation scenarios. Hence, the 3-D plot is not presented for their cases.
D. Extreme Scenario With Multiple Peaks (24 h)
For the extreme scenario with multiple peaks (24 h), five separate runs are carried out using different random seeds for each
of the three control techniques. It has been observed that the
variance across the simulation runs performed for a
single control technique is small. Hence, taking the mean of
the values gives a good representation of a typical outcome
for that particular control technique. For this scenario, two performance measures are shown. Table V shows the total mean
delay at the end of selected time periods (averaged over five
separate runs for each technique) for the three different control
techniques. Table VI shows the total mean stoppage time at the
end of selected time periods (averaged over five separate runs for
each technique) for the three different control techniques. Note
that the simulation run for the extreme scenario ends after the
eighth peak period (as shown in Fig. 12).
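The averaging over runs described above can be sketched in a few lines of Python. The function name `summarize_runs` is hypothetical, and the numbers in the example are made-up placeholders, not values from Table V or Table VI.

```python
import statistics


def summarize_runs(values):
    """Return the mean and the sample variance of one performance measure
    recorded at the same time period across several simulation runs."""
    return statistics.mean(values), statistics.variance(values)


# Placeholder delays (in seconds) from five runs with different random seeds.
placeholder_delays = [10.0, 12.0, 11.0, 9.0, 13.0]
mean_delay, delay_variance = summarize_runs(placeholder_delays)
```

Reporting only the mean is reasonable when the variance across seeds is small relative to the differences between control techniques, which is the justification given in the text.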
From Tables V and VI, it can be seen that the hybrid NN
controllers manage to achieve the best performance for the extreme scenario with multiple peaks that lasts for 24 h. Using the
GLIDE signal plans, the total mean delay and the total mean
stoppage time for the vehicles increase steadily after the fifth
peak period. The continuously updated SPSA-NN-based controllers obtain a better performance compared to the GLIDE
signal plans despite the fact that the total mean delay and total
mean stoppage time for the vehicles increase steadily after the
sixth peak period. Compared to that, the total mean delay and
total mean stoppage time are increasing at a significantly slower
rate when the hybrid NN controllers are used. Overall, at the
TABLE VI
TOTAL MEAN STOPPAGE TIME FOR THE EXTREME SCENARIO WITH MULTIPLE PEAKS (24 h)
Fig. 17. Traffic network controlled by continuously updated SPSA-NN-based controllers after 24 h (extreme scenario).
Fig. 18. Traffic network controlled by hybrid NN controllers after 24 h (extreme scenario).
TABLE VII
PERCENTAGE REDUCTION IN TOTAL MEAN DELAY COMPARED TO GLIDE
have to be tuned offline before the neural controllers are implemented into the traffic network. In this respect, the continuously
updated SPSA-NN has a smaller number of control parameters
to be tuned compared to the hybrid NN model given its simpler
weight update algorithm. The tuning of these parameters is done
empirically given the difficulty in doing it analytically. For example, Fig. 19 shows how the performance (measured in terms
of total mean delay of the vehicles) of the hybrid NN varies with
different values of the forgetting term of (19). The value of the forgetting term
can thus be chosen based on the results shown in the graph.
Once these parameters are set offline, they will not be further
adjusted during the simulations.
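The empirical tuning procedure described above amounts to sweeping candidate values offline and keeping the one with the lowest total mean delay. The Python sketch below illustrates that procedure under stated assumptions: `tune_parameter` and `stand_in_delay` are hypothetical names, and the stand-in delay function replaces what would in practice be a full simulation run for each candidate value.

```python
def tune_parameter(candidates, evaluate):
    """Evaluate each candidate value offline and return the best one.

    candidates : iterable of parameter values to try
    evaluate   : callable returning the total mean delay for one value
                 (lower is better)
    """
    scores = {value: evaluate(value) for value in candidates}
    best = min(scores, key=scores.get)
    return best, scores


def stand_in_delay(forgetting_term):
    """Made-up response curve used only to make the example runnable."""
    return 50.0 + 100.0 * (forgetting_term - 0.3) ** 2


best_value, all_scores = tune_parameter([0.1, 0.2, 0.3, 0.4], stand_in_delay)
```

The selected value is then frozen, in line with the statement that the parameters are not adjusted further during the simulations.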
IX. CONCLUSION
A new hybrid NN model has been successfully developed
to solve the infinite horizon distributed control problem. The
model involves a novel multistage online learning process that
incorporates various techniques such as reinforcement learning
and evolutionary algorithms. An approximated version of the infinite horizon distributed control problem is implemented in the
form of distributed traffic signal control of a large-scale traffic
network. For this problem, the NN-based local controllers have
to learn continuously for an indefinite amount of time after they
are implemented into the system. Real-world traffic data used
for modeling the traffic network is obtained from LTA and the
traffic network model is built using the PARAMICS modeler. In the
experiments, the traffic signals in the traffic network are controlled by three different control techniques (the hybrid NN model,
the continuously updated SPSA-NN model, and GLIDE) in separate
simulation runs. Results from the experiments showed that the
hybrid NN controllers and the continuously updated SPSA-NN-based controllers achieved an overall better performance compared to GLIDE for the 3 and 24 h (typical) simulation runs.
For the extreme scenario with multiple peaks (24 h), experimental results showed that the hybrid NN controllers outperform the continuously updated SPSA-NN-based controllers as
well as GLIDE. From the tables of results as well as the screenshots of the traffic network, it can be inferred that the hybrid
NN controllers can provide effective control of the large-scale
traffic network even as the complexity of the simulation increases substantially. This research has extended the application