
Adaptive Signal-Vehicle Cooperative Controlling System

Tung Le, Chen Cai, and Toby Walsh


Abstract: Reducing traveling delay by optimizing traffic signal schedules at intersections has long been under research. Similarly, many vehicle technologies have been applied to reduce fuel consumption and emissions. One major source of fuel consumption and emissions is unnecessary stop-then-start cycles. In this paper, we propose the Adaptive Signal-Vehicle Cooperative control system (ASVC), which produces both optimal traffic signal schedules and optimal vehicle speed advice to minimize both delay and stop-then-start cycles. Statistical results show that ASVC reduces delay by more than 53% and stops by more than 54%, compared to the baseline system TRANSYT.

I. INTRODUCTION

Most well-established traffic control systems (such as SCOOT [12], SCATS [7], OPAC [6], and RHODES [10]) are now capable of producing real-time schedules for traffic signals. Most of these systems aim at reducing the traveling time of all participating vehicles. Besides saving traveling time, reducing fuel consumption and pollutant emissions is another goal of traffic control systems. In [13], it is reported that 10% of fuel consumption occurs in the idling state, when vehicles stop at traffic lights or in jams. However, all the aforementioned systems do little to actively reduce unnecessary stop-then-start cycles, which are a major source of fuel consumption and pollutant emissions.

Recent developments in V2I and I2V communication [14], [4] have made such a scheme feasible, and some work has been done to address the problem. For example, in [5], a simple warning system used on a highway to control speed yielded an improvement in pollutant emissions. In [9], a single intelligent car using 7-second look-ahead traffic information is reported to achieve 20% fuel savings under ideal conditions.

The goals of our signal-vehicle cooperative control system (called the controller for short) are twofold: produce an optimal schedule for the traffic light, and suggest an optimal speed for incoming cars to minimize stops. Our proposal is similar to the recently announced Travolution project at Audi [1]. However, to the best of our knowledge, the Travolution project is still at an early prototype stage and has not published any related results. One noteworthy work is that of Mandava [8], in which drivers are given dynamic advice to avoid stops completely at junctions on a corridor in order to reduce emissions; the advice is produced by solving an optimization problem aimed at getting the
Tung Le is a PhD student at NICTA and UNSW, 223 Anzac Pde, NSW 2052, Australia (tungl@cse.unsw.edu.au).
Chen Cai is a researcher at NICTA, 223 Anzac Pde, NSW 2052, Australia (Chen.Cai@nicta.com.au).
Toby Walsh is a senior researcher at NICTA, 223 Anzac Pde, NSW 2052, Australia (Toby.Walsh@nicta.com.au).

car through the downstream intersection during the green period. However, since that optimization considers only the vehicle side and does not affect the signals, such an approach is limited and only works under light-demand scenarios. By integrating both the vehicles and the traffic signals into the model, we can find a more effective solution.

This paper is organized as follows: in Section II, we formulate the problem as a Markov Decision Process and describe how a Dynamic Programming (DP) algorithm solves it in principle. In Section III, Approximate Dynamic Programming is used to actually find the optimal solution. Section IV provides the experimental results. Finally, Section V concludes the paper and describes our future work.

II. Method

A. Assumptions, Terms and Notations

For the scheme to work, we make the following assumptions:
- V2I, I2V capability: all smart vehicles are equipped with two-way communication capability with the controller at the immediate downstream intersection. Non-smart vehicles have neither capability. It is also assumed that CCTV cameras, V2I and I2V are used to detect which lane a vehicle is on. This approach was used in the ADEPT project [3] as early as 1996.
- Discretized time: time is discretized into seconds.
- Driver compliance: drivers of smart cars follow the speed-adjustment instructions given by the downstream controller.
- Double threshold for stops: in counting stops, to filter out the movement of cars waiting in queues, we apply two speed thresholds. A car is considered to make a stop only when its speed changes from above an upper threshold to below a lower threshold. An illustration of this concept is in Fig. 1. Using a definition of stop based on a double threshold, rather than a single threshold or another definition, does not favour our system.
- Non-smart vehicle position: without V2I capability, the exact positions of non-smart vehicles cannot be known. However, by putting a detector upstream and assuming free-flow speed on a link, we can approximate the position of a non-smart car.
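The double-threshold definition of a stop described above can be sketched as a small hysteresis counter. The threshold values below are illustrative assumptions, not values taken from the paper:

```python
# Double-threshold stop counter: a vehicle registers a stop only when its
# speed falls from above V_UPPER to below V_LOWER (hysteresis filters out
# the creeping of queued vehicles). Threshold values are assumed here.
V_UPPER = 5.0  # m/s, upper threshold
V_LOWER = 1.0  # m/s, lower threshold

def count_stops(speed_trace):
    """Count stops in a per-second speed trace (m/s) of one vehicle."""
    stops = 0
    armed = False  # True once speed has been above the upper threshold
    for v in speed_trace:
        if v > V_UPPER:
            armed = True
        elif v < V_LOWER and armed:
            stops += 1
            armed = False  # must exceed V_UPPER again before the next stop
    return stops

# A queued car creeping between 0 and 2 m/s never crosses V_UPPER,
# so its movements are not counted as stops.
print(count_stops([10, 8, 0.5, 0.2, 3, 0.4, 12, 9, 0.1]))  # 2
```

This is why a queue discharging and re-compressing in front of a red light does not inflate the stop count: only a genuine cruise-to-halt transition crosses both thresholds.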

- Head of platoon (HOP): vehicles have a tendency to travel in platoons, so to reduce the computational demand on the controller, we only control the speed of the head of a platoon (the first car of a platoon). A distance $d_{gap}$ separates one platoon from another, $d_{gap} = t_{gap} \cdot S_{gap}$, where $t_{gap}$ takes an empirical value (e.g. 3), and $S_{gap}$ is the speed (in meters/second) of the HOP of the following platoon. Overtaking is not considered.

B. Problem formulation - Traffic light scheduling and speed advising as a Markov Decision Process (MDP)

1) State space: The model we use is a discrete finite MDP. The controller is a single-intersection controller, so we only consider a car once it comes onto a lane leading to the intersection, and discard it when it goes beyond the stop line. We use a vector to capture information about all cars in the system, $v = \langle v_1, v_2, \ldots, v_X \rangle$. A car $v_i$ is modeled as a vector $v_i = \langle \nu, m, a, d \rangle$, where $\nu$ is the current speed (in meters/second), $m$ is the current distance to the stop line (in meters), $a$ is the acceleration (m/s$^2$), and $d$ is the deceleration (m/s$^2$). We also need to capture the current signal phase applied at the intersection. If an intersection has $N$ inbound lanes, $p = \langle p^1, p^2, \ldots, p^N \rangle$, and each component has a binary value, with 0 denoting a red light and 1 a green light; the yellow (amber) light is not considered. With that setting, a state is a vector

$$s = \langle v, p \rangle \qquad (1)$$
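The platoon-splitting rule above can be sketched as follows. The list-of-tuples input format is an assumption for illustration; $t_{gap} = 3$ s is the empirical value mentioned in the text:

```python
# Head-of-platoon (HOP) selection: vehicles on a lane, ordered by distance
# to the stop line, are split into platoons wherever the spacing to the car
# ahead exceeds d_gap = t_gap * S_gap, with S_gap the follower's speed.
T_GAP = 3.0  # seconds, empirical value from the text

def find_hops(vehicles):
    """vehicles: list of (distance_to_stop_line_m, speed_mps) sorted by
    increasing distance. Returns indices of the platoon heads."""
    if not vehicles:
        return []
    hops = [0]  # the closest vehicle always heads the first platoon
    for i in range(1, len(vehicles)):
        pos, speed = vehicles[i]
        prev_pos, _ = vehicles[i - 1]
        d_gap = T_GAP * speed  # gap threshold scales with follower speed
        if pos - prev_pos > d_gap:
            hops.append(i)  # large gap: vehicle i starts a new platoon
    return hops

# Three cars: the third trails 100 m behind at 15 m/s (d_gap = 45 m),
# so it heads its own platoon.
print(find_hops([(20, 10), (50, 12), (150, 15)]))  # [0, 2]
```

Only the vehicles at the returned indices receive speed advice; the rest are assumed to follow their platoon leader.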

We will use the subscripted form when time is mentioned: $s_t = \langle v_t, p_t \rangle$ is the state of the system at time $t$. It should be brought to the reader's attention that the subscript of $p$ is used for the time step, and the superscript for the vector component. For example, $p^k_j$ denotes the signal value of lane $k$ at time step $j$.

2) Control: At each time step, a decision on the signal phase must be produced, as well as a suggested action for every vehicle in the system: $u = \langle p, a \rangle$, where $p$ is the phase chosen at time $t$, and $a = \langle a_1, a_2, \ldots, a_X \rangle$ represents the suggested actions for cars $1, 2, \ldots, X$, respectively. A suggested action takes a value in the range $[-D, A]$, where $D$ is the maximum deceleration rate and $A$ is the maximum acceleration rate. All $a_i$ corresponding to non-smart cars have value 0, since the controller cannot control the speed of such cars.

3) Dynamics: At the beginning of each time step $t$, the following events happen: the signal control is applied, speed advice is given, vehicles (both smart and non-smart) move to new positions or exit the system, and new vehicles enter the system. The major source of randomness comes from new vehicles, which enter the system according to a Poisson process with a known arrival rate. Another source of randomness is the unrestricted action of non-smart vehicles. So the transition from one state to the next can be captured by $s_{t+1} = f(s_t, u_t, \omega_t(s_t, u_t))$, where $\omega_t(\cdot, \cdot)$ is a function capturing the randomness. While we could describe in detail how the transitions of the signal phase and the vehicle states happen, such a description is not necessary for our treatment of the problem. If the transition happens under a policy $w$, we write $f^w$. To shorten notation, when there is no ambiguity, we write $\omega_t$ for $\omega_t(s_t, u_t)$.

4) Step cost: The cost incurred during each time step is given by

$$g(s_t, f(s_t, u_t, \omega_t)) = (X_t - \xi_{t+1} + \eta_{t+1})\,\Delta t + k Z_{t+1} \qquad (2)$$

where $X_t$ is the total number of vehicles at time $t$; the numbers of exiting and entering vehicles at time $t+1$ are $\xi_{t+1}$ and $\eta_{t+1}$, respectively; $k$ is a constant; and $Z_{t+1}$ is the number of stops made by all vehicles at time $t+1$.

C. Optimization objective

The objective of the Dynamic Programming algorithm is to find a policy that minimizes the expected cost over the horizon $T$. Let $w$ represent a policy and $w^*$ the optimal one. The objective is to find

$$w^* = \arg\min_{w \in W} E^w \left[ \sum_{t=0}^{T-1} \gamma^t\, g(s_t, f(s_t, u_t, \omega_t)) \,\middle|\, s_0 = s \right] \qquad (3)$$

where $\gamma$ is a discount factor. The DP algorithm finds the solution by recursively computing the Bellman equation

$$J(s_t) = \min_{u_t \in U} E \left\{ g(s_t, f(s_t, u_t, \omega_t)) + \gamma J(s_{t+1}) \right\} \qquad (4)$$

for $t = 0, \ldots, T-1$. The optimal decision at each step can be found by

$$u^*_t = \arg\min_{u_t \in U} E \left\{ g(s_t, f(s_t, u_t, \omega_t)) + \gamma J(s_{t+1}) \right\} \qquad (5)$$
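The step cost of equation (2) is simple enough to sketch directly. The parameter names are chosen for readability; $k = 20$ matches the objective weight used in the experimental tables, and $\Delta t = 1$ second matches the discretized time step:

```python
# Step cost of equation (2): vehicles present during the step contribute
# delay, and each stop adds a penalty k.
def step_cost(x_t, exits_next, entries_next, stops_next, k=20, dt=1):
    """x_t: vehicles in the system at time t; exits_next/entries_next:
    vehicles leaving/entering at t+1; stops_next: stops made at t+1."""
    return (x_t - exits_next + entries_next) * dt + k * stops_next

# 12 cars in the system, 2 exit, 1 enters, 1 stop, with k = 20:
print(step_cost(12, 2, 1, 1))  # (12 - 2 + 1) * 1 + 20 * 1 = 31
```

Summing this quantity over the horizon is what the policy search in (3) minimizes.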

for $t = 0, \ldots, T-1$. The optimal policy $w^*$ can be found by storing the decisions $u^*_t$ over the trajectory.

III. Approximate Dynamic Programming (ADP)

A. ADP and Reinforcement Learning (RL)

To avoid the curse of dimensionality, instead of computing the value of $J(s_t)$ recursively, ADP uses a mapping function to approximate it, yielding $\tilde{J}(s_t)$. There are several ways to achieve this. We can either use a look-up table to store the values of all states of the system, or compute the approximate value from the state. We choose the latter method, since filling the look-up table with good approximate values is tantamount to computing the value of $J(s)$ directly. Meanwhile, we can still capture the key features of the state space and evaluate the value from those. The approximation architecture we choose is a linear one, which has the form

$$\tilde{J}(s, r) = \sum_{n=1}^{N} \phi_n(s)\, r(n). \qquad (6)$$

That is, a mapping function $\phi : X \to \mathbb{R}^N$ and a parametric vector $r \in \mathbb{R}^N$ are used to approximate the value of $J(s)$. As long as the architecture remains linear in $\phi(\cdot)$, with $\phi(s)$ being linear or non-linear in $s$, the control policy will converge with probability 1 [2]. The ADP now proceeds as DP, replacing $J(s)$ by $\tilde{J}(s)$, and (4) becomes

$$\tilde{J}(s_t) = \min_{u_t \in U} E \left\{ g(s_t, s_{t+1}) + \gamma \tilde{J}(s_{t+1}) \right\} \qquad (7)$$

for $t = 0, \ldots, T-1$. To compute the values of the parametric vector $r$, we use Temporal-Difference (TD) learning; more details on TD learning can be found in [15]. The principle is that TD constantly tracks the error between the estimated value and the observed value, and propagates the error signal back to the parametric structure, so that

$$r^* = \arg\min_r \sum_{s \in S} \left( J(s) - \tilde{J}(s, r) \right)^2.$$

Let the temporal difference $d_t$ be defined as

$$d_t = g(s_t, s_{t+1}) + \gamma \tilde{J}(s_{t+1}, r_t) - \tilde{J}(s_t, r_t). \qquad (8)$$

For $t = 0, 1, \ldots, T-1$, the TD method updates $r_t$ according to the formula

$$r_{t+1} = r_t + \alpha_t d_t \sum_{k=0}^{t} (\gamma\lambda)^{t-k}\, \nabla_r \tilde{J}(s_k, r_t).$$

Applying (6) and taking the derivatives, we have

$$r_{t+1} = r_t + \alpha_t d_t \sum_{k=0}^{t} (\gamma\lambda)^{t-k}\, \phi(s_k), \qquad (9)$$

where $\alpha_t$ is a sequence of scalar step sizes that satisfies the following conditions for convergence:

$$\sum_{t=0}^{\infty} \alpha_t = \infty \quad \text{and} \quad \sum_{t=0}^{\infty} \alpha_t^2 < \infty, \qquad (10)$$

and $\lambda$ is known as the eligibility trace factor, which takes a value in $[0, 1]$.

B. Approximation feature

Recall from equation (1) that each state is a vector $s = \langle v, p \rangle$, where $v$ is the vehicles in the system and $p$ is the currently applied phase. In order to compute the approximate value $\tilde{J}(s, r)$, we aggregate states into groups of vehicles on lanes. There are two types of queues: the static queue and the moving queue. The static queue is the number of cars stopped and waiting for the green light in front of the stop line, represented by $q$. The moving queue is the number of vehicles moving on a lane, regardless of their positions, represented by $m$. Although vehicles are grouped into queues for the purpose of approximating the state value, suggested actions are still computed for individual vehicles. For an intersection with $N$ inbound lanes, we have

$$\phi(s) = \langle \phi(1), \phi(2), \ldots, \phi(N), p^1, p^2, \ldots, p^N \rangle.$$

Each component $\phi(n)$ is a row vector, defined as

$$\phi(n) = \begin{cases} \langle q, 0, m, 0 \rangle & \text{if } p^n = 1 \text{ (green light)} \\ \langle 0, q, 0, m \rangle & \text{if } p^n = 0 \text{ (red light)} \end{cases} \qquad (11)$$

By setting the features as above, we separate the values of queues under the green-light situation from those under red light. The linear approximation correspondingly is

$$\tilde{J}(s, r) = \sum_{n=1}^{N} \phi(n)\, r(n), \qquad (12)$$

where each $r(n)$ is a column vector

$$r(n) = \begin{pmatrix} r_1(n) \\ r_2(n) \\ r_3(n) \\ r_4(n) \end{pmatrix}. \qquad (13)$$

By this setting, $r_1$ and $r_3$ are assigned to the static queue and the moving queue, respectively, if lane $n$ receives a green signal. Similarly, $r_2$ and $r_4$ are assigned to the static queue and the moving queue, respectively, if lane $n$ receives a red signal.

In order to compute the temporal difference, a Monte Carlo method is used to generate a set of policies $\tilde{W}$ over a horizon $M$. The approximate optimal policy is then the optimal one among the generated policies $\tilde{W}$:

$$\tilde{w}^* = \arg\min_{w \in \tilde{W}} E^w \left[ \sum_{t=0}^{M-1} \gamma^t\, g(s_t, f^w(s_t, u_t, \omega_t)) \,\middle|\, s_0 = s \right]. \qquad (14)$$

Similar to (5), the optimal decision at each step can now be found by

$$u^*_t = \arg\min_{u_t \in U} E \left[ \sum_{k=t}^{t+M-1} \gamma^{k-t}\, g(s_k, f(s_t, u_t, \omega_t)) + \gamma^M \tilde{J}_{t-1}(s_{t+M-1}, r_{t-1}) \right]. \qquad (15)$$

A new observation is obtained after $M$ steps, and the value of that state is backed up to the current state as

$$J_M(s_t) = \min_{u_t \in U} E \left[ \sum_{k=t}^{t+M-1} \gamma^{k-t}\, g(s_k, s_{k+1}) + \gamma^M \tilde{J}_{t-1}(s_{t+M-1}, r_{t-1}) \right]. \qquad (16)$$

The M-step temporal difference is

$$d^M_t = J_M(s_t) - \tilde{J}(s_t, r_t) = \sum_{k=t}^{t+M-1} \gamma^{k-t}\, d_k(s_k, s_{k+1}). \qquad (17)$$

The parameters are updated by

$$r_{t+1} = r_t + \alpha_t\, \phi(s_t) \sum_{k=t}^{t+M-1} \gamma^{k-t}\, d_k(s_k, s_{k+1}), \quad t = 0, 1, \ldots \qquad (18)$$

where $\alpha_t$ is the learning rate.
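A minimal sketch of the linear architecture of equation (6) with the green/red feature split of equation (11), plus one TD(lambda) update step per equations (8) and (9). The value $\gamma = 0.95$ matches the paper's implementation; $\lambda$ and the step size are illustrative assumptions:

```python
import numpy as np

GAMMA, LAM = 0.95, 0.5  # gamma from the paper; lambda assumed

def phi(lanes, signals):
    """Feature vector per equation (11): per lane, static queue q and
    moving queue m fill different slots depending on the light.
    lanes: list of (static_queue, moving_queue); signals: 0/1 per lane."""
    feats = []
    for (q, m), p in zip(lanes, signals):
        feats += [q, 0, m, 0] if p == 1 else [0, q, 0, m]
    return np.array(feats + list(signals), dtype=float)

def td_lambda_step(r, trace, s_feat, s_next_feat, cost, alpha=0.01):
    """One update of equations (8)-(9): temporal difference plus an
    eligibility trace accumulating (gamma * lambda)^(t-k) * phi(s_k)."""
    d_t = cost + GAMMA * s_next_feat @ r - s_feat @ r  # eq. (8)
    trace = GAMMA * LAM * trace + s_feat               # discounted trace
    return r + alpha * d_t * trace, trace              # eq. (9)

# Two lanes, lane 0 green with a 4-car static queue and 2 moving cars:
f = phi([(4, 2), (3, 1)], [1, 0])
print(f)  # [4. 0. 2. 0. 0. 3. 0. 1. 1. 0.]
```

Note how the same queue counts land in different slots of the feature vector depending on the signal, which is exactly what lets $r_1, r_3$ and $r_2, r_4$ learn separate green/red valuations.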

C. Suggested action

Recall that a decision is made for both the traffic signal and all vehicles: $u = \langle p, a \rangle$. Under a randomly generated policy $w$, $p$ is known, i.e., for the simulated $M$ steps ahead, $p_1, p_2, \ldots, p_M$ are fixed. The control objective is to find the unknown vector $a$. A vehicle loop is employed to achieve this goal. The controller first decides on a set of head-of-platoon vehicles $H$. Each vehicle loop is performed for a HOP vehicle $h \in H$. Let $e_h$ and $l_h$ be the earliest and latest points of time (in seconds), respectively, at which vehicle $h$ can arrive at the stop line. Let $d_h$ be the distance from the position of $h$ to the stop line (in meters), $\nu_{max}$ the maximum speed allowed on the link, and $\nu_{min}$ a small speed value greater than zero (because vehicles should be kept moving). The values of $e_h$ and $l_h$ are computed as $e_h = d_h / \nu_{max}$ and $l_h = d_h / \nu_{min}$. Each vehicle loop simulates the movement of all the vehicles ahead of the current HOP up to the horizon $l_h$, then decides the HOP's action based on the following principle: if there is a moment within the arrival window (i.e., $[e_h, l_h]$) at which the queue is empty and the traffic light is green, the HOP should tune its speed so as to get through the intersection at that moment. Procedure 1 formally describes the action of a single HOP. It should be remarked that all vehicles in front of $h$ are viewed by $h$ as non-smart, since non-smart vehicles have a tendency to travel at the maximum allowed speed to reach the intersection as early as possible. Therefore, the smartness of the front vehicles does not affect the future queue state at the stop line.
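The arrival-window search at the heart of the vehicle loop can be sketched as follows. The queue and signal predictions are passed in as precomputed per-second lists, which is a simplification of the front-car simulation described above; the numeric values in the example are assumptions:

```python
import math

def choose_arrival(d_h, v_max, v_min, queue_empty, green):
    """Pick the target arrival time J for one HOP.
    d_h: distance to the stop line (m); queue_empty/green: per-second
    predictions indexed by arrival time. Returns the chosen second J."""
    e_h = math.ceil(d_h / v_max)   # earliest possible arrival (s)
    l_h = math.floor(d_h / v_min)  # latest arrival while still moving (s)
    for j in range(e_h, l_h + 1):
        if queue_empty[j] and green[j]:
            return j  # tune speed to cross the stop line at second j
    return l_h  # no free-flow slot: arrive as late as possible

# 300 m to go, v_max = 16.7 m/s, v_min = 2 m/s -> window [18, 150];
# the queue clears at t = 40 and the light is green from t = 30,
# so the first feasible crossing second inside the window is chosen.
q = [False] * 40 + [True] * 120
g = [False] * 30 + [True] * 130
print(choose_arrival(300, 16.7, 2.0, q, g))  # 40
```

Once $J$ is chosen, equation (19) or (20) converts it into an acceleration or deceleration for the HOP.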

Procedure 1 Vehicle loop - simulating HOP action
1: find the earliest arrival time $e_h$
2: find the latest arrival time $l_h$
3: simulate all front cars' behaviors up to $l_h$, treating them as non-smart
4: find $J \in [e_h, l_h]$ such that queue = 0 and light = green
5: if $J$ found then
6:   find the acceleration $a_h$ by equation (19)
7: else
8:   $J \leftarrow l_h$
9:   find the deceleration $a_h$ by equation (20)
10: end if

Fig. 1. Example of making a stop - only vehicle 1 is considered to make a stop, since its speed changes from above the upper threshold to below the lower threshold.

The equation used to find the acceleration is

$$d_h = \nu_h J_0 + \tfrac{1}{2} a_h J_0^2 + \nu_{max} J_1 \qquad (19)$$

where $\nu_h$ is the current speed of $h$, $J_0$ is the time for accelerating to $\nu_{max}$, $J_1$ is the time for constant-speed traveling, and $J_0 + J_1 = J$. Similarly, the equation used to find the deceleration is

$$d_h = \nu_h J_0 + \tfrac{1}{2} a_h J_0^2 + \nu_{min} J_1 \qquad (20)$$

where $J_0$ is the time for decelerating to $\nu_{min}$, $J_1$ is the time for constant-speed traveling at $\nu_{min}$, and $J_0 + J_1 = J$. The acceleration and deceleration are within the physical capability of a generic passenger car. In our simulated environment, the maximum acceleration rate $A$ is 3 m/s$^2$, and the maximum deceleration rate $D$ is 6 m/s$^2$.

D. Control Algorithm

The whole signal and speed control scheme is summarized in Procedure 2. Regarding implementation, the discount factor $\gamma$ used in equation (4) has the value 0.95. The number of look-ahead steps $M$ is 40.

Procedure 2 Control procedure
Step 0: Initializing
  0.1 Choose an initial state $s_0$
  0.2 Initialize the functional parameter $r_0$
  0.3 Initiate the learning rate $\alpha_0$
  0.4 Set the time index $t = 0$
Step 1: Update traffic state
  1.1 Set the time index $t = t + 1$
  1.2 Update $v$, the vehicle state
  1.3 Update $p$, the traffic signal
Step 2: Evaluate control decisions
  2.1 If a signal change is not admissible, set $p_t = p_{t-1}$
  2.2 If a signal change is admissible, for the planning period of $M$ steps:
    2.2.1 Decide the HOP set $H$
    2.2.2 Generate the random policy set $\tilde{W}$
    2.2.3 For each $w \in \tilde{W}$, find the suggested actions $a$ using Procedure 1
    2.2.4 Find the optimal decision $u^*_t$ using (15)
Step 3: Update the approximation function
  3.1 Calculate the new observation $J_M(s_t)$ using (16)
  3.2 Calculate the current approximation $\tilde{J}_{t-1}(s_t, r_{t-1})$ using (12)
  3.3 Calculate the M-step temporal difference using (17)
  3.4 Update the functional parameter vector $r_{t-1}$ using (18)
Step 4:
  4.1 Implement the optimal decision $u^*_t$ for the first second
  4.2 Transfer the signal state accordingly
Step 5: Stopping
  If $t < T$, go back to Step 1. Otherwise, stop.

IV. Experimental results

A. Traffic model

The controllers are tested on a modified version of the Green Light District simulator [17]. Each configuration is simulated 50 times. The car movement model used in the simulator is a cellular automaton [11]. To make the model more realistic, we employ a higher resolution, in which the cell size is 1.5 meters. The time unit is 1 second. Each car occupies 5 cells, and the maximum increase in speed is 2 cells/second. Let the velocity of a vehicle be $\nu$, its maximum velocity be $\nu_{max}$, and the safe distance to the next car ahead be $d_{safe}$. The velocity update is done through Procedure 3.

Procedure 3 Modified cellular-automaton model
Acceleration: $\nu \leftarrow \min\{\nu_{max}, \nu + 2\}$
Slowing down (avoid collision): $\nu \leftarrow \min\{\nu, d_{safe}\}$
Randomization: with probability $p$, $\nu \leftarrow \max\{\nu - 1, 0\}$

B. Network topologies

Each road is two-way with three lanes in each direction. The kerbside lane is for the right turn and straight movement. The centre-side lane is for the left turn only. The middle lane is for straight movement only. The maximum speed allowed on all roads is 60 km/h, which is common in urban areas. This value translates into 11 cells per second in the simulator. The distance from each origin to its closest intersection is 500 meters. The link length between any two junctions is also 500 meters. The network (Fig. 2) is composed of 8 origins/destinations (labeled 1 to 8) and 4 junctions (labeled 9 to 12). When a car is first released into the system, its speed is 60 km/h. The origins/destinations can thus be viewed as gateways to this smaller part of a bigger network.
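The per-vehicle velocity update of Procedure 3 can be sketched as follows. The cell-based units and the 2-cells/s cap follow the text; the slowdown probability value is an assumption, since the paper does not state it:

```python
import random

# One step of the modified cellular-automaton update of Procedure 3 for a
# single vehicle. Speeds are in cells/second (cell size 1.5 m).
def update_velocity(v, v_max, d_safe, p=0.1, rng=random.random):
    """v: current speed; d_safe: free cells to the car ahead;
    p: random slowdown probability (assumed value)."""
    v = min(v_max, v + 2)  # accelerate by at most 2 cells/s
    v = min(v, d_safe)     # never close more than the safe gap
    if rng() < p:
        v = max(v - 1, 0)  # random slowdown
    return v

# At 9 cells/s with v_max = 11 and only 8 free cells ahead, and the random
# slowdown suppressed, the car is capped by the gap:
print(update_velocity(9, 11, 8, rng=lambda: 1.0))  # 8
```

Injecting the random source as a parameter keeps the rule deterministic for testing while preserving the stochastic slowdown in simulation.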

Fig. 2. Infrastructure network

Fig. 3. Four main signal phases

C. Traffic signal settings

At a local intersection, the signal schedule consists of 8 phases: 4 main phases and 4 intergreen phases. The main phases are illustrated in Fig. 3. Each intergreen phase lies between two main phases. Each main phase has a minimum length of 10 seconds and a maximum length of 80 seconds. Each intergreen phase has an exact length of 5 seconds.

D. Base-line system: TRANSYT [16]

TRANSYT is an offline signal controller whose optimization objective measure is delay + k·stops, which means it does take the number of stops into account. The version used is TRANSYT 12.0. We use TRANSYT to generate the fixed-time schedules for the base-line system.

E. Traffic demand

Vehicles are released into the system over a simulated period of 1 hour. The system keeps operating until all vehicles arrive at their destinations. The spawning frequency is 0.2 vehicles/origin/second, so in one hour the total number of vehicles released is 5760.

F. System performance and analysis

To quantify the effect of smart cars on the total performance of the system, we varied the fraction of smart cars from 0 to 1. The results are given in Table III, Table IV, and Table V. The performance of the base-line system is given in Table II. From these results, some key points can be derived. Firstly, the family of ASVC controllers (with various smart-car rates) outperforms the base-line system significantly. TRANSYT employs longer phases for straight-ahead lanes, and the phase lengths are fixed, as is the nature of TRANSYT. Under TRANSYT control, vehicles often form long queues before being discharged during the long green periods. ASVC, on the other hand, benefits from producing flexible phase lengths, better adapted to the oncoming traffic. Secondly, the more smart cars there are in the system, the more saving (in terms of delay and stops) is achieved. However, the rate of additional saving is not proportional to the rate of smart-car increase, because the network has its own capacity bottleneck.

V. CONCLUSIONS AND FUTURE WORK

In this paper, we have proposed an Adaptive Signal-Vehicle Cooperative control system. The schedules are found by Approximate Dynamic Programming, and the speed advice is found by simulation and classical mechanics. ASVC and TRANSYT were tested in a microscopic simulated environment. ASVC is statistically shown to be significantly more effective than TRANSYT, in terms

TABLE I
Lengths of main phases output by TRANSYT (cycle length = 100 seconds)

Node | Phase 1 | Phase 2 | Phase 3 | Phase 4
  9  |   22    |   17    |   12    |   29
 10  |   33    |   10    |   21    |   16
 11  |   33    |   10    |   19    |   18
 12  |   20    |   18    |   32    |   10

TABLE V
ASVC's average objective (objective value = avg delay + 20 · avg stops), improvement compared to TRANSYT (in percent)

Smart-car rate | Avg. obj | Std. deviation | Improvement
     0         |  141.7   |     3.33       |   53.43
     0.25      |  135.49  |     2.34       |   54.33
     0.5       |  127.7   |     2.6        |   55.63
     0.75      |  124.11  |     2.17       |   56.39
     1         |  123.15  |     2.59       |   57.65

TABLE II
Performance of TRANSYT (objective value = avg delay + 20 · avg stops)

Avg. delay | Std. dev. of delay | Avg. stops | Std. dev. of stops | Avg. obj | Std. dev. of obj
  210.33   |       14.2         |    5.29    |        0.38        |  316.25  |      21.93
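The objective value reported in Tables II and V combines the two measures with a weight of 20 on stops. A minimal sketch of computing it from per-vehicle records (the record values below are made-up illustrations, not data from the tables):

```python
# Objective = average delay + k * average stops, with k = 20 as in the
# tables. delays: seconds of delay per vehicle; stops: stops per vehicle.
def objective(delays, stops, k=20):
    n = len(delays)
    return sum(delays) / n + k * sum(stops) / n

# Three hypothetical vehicles averaging 100 s of delay and 2 stops each:
print(objective([100, 90, 110], [2, 3, 1]))  # 100 + 20 * 2.0 = 140.0
```

Note that the tabulated averages are taken over 50 simulation runs, so the per-column averages in Table II need not combine exactly into the tabulated objective.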

of saving delay and stops. In future work, we will investigate in greater depth how much pollutant emission (such as CO2 and NOx) and fuel consumption is saved by our system.

References
[1] Audi. Travolution project - http://www.travolutioningolstadt.de/, accessed June 2011.
[2] Dimitri P. Bertsekas. Dynamic Programming and Optimal Control. Athena Scientific, 2007.
[3] P. Blythe. RFID for road tolling, road-use pricing and vehicle access control. In RFID Technology (Ref. No. 1999/123), IEE Colloquium on, pages 8/1-8/16, 1999.
[4] A. Bohm and M. Jonsson. Position-based data traffic prioritization in safety-critical, real-time vehicle-to-infrastructure communication. In Proc. IEEE Int. Conf. Communications Workshops (ICC Workshops 2009), pages 1-6, 2009.

TABLE III
ASVC's average delay (in seconds/vehicle), improvement compared to TRANSYT (in percent)

Smart-car rate | Avg. delay | Std. deviation | Improvement
     0         |   98.67    |     2.94       |   53.08
     0.25      |   96.2     |     2.1        |   54.2
     0.5       |   93.3     |     2.0        |   55.64
     0.75      |   91.89    |     1.99       |   56.31
     1         |   88.83    |     2.01       |   58.00

[5] Margarida C. Coelho, Tiago L. Farias, and Nagui M. Rouphail. Impact of speed control traffic signals on pollutant emissions. Transportation Research Part D: Transport and Environment, 10(4):323-340, 2005.
[6] N.H. Gartner. Optimized policies for adaptive control (OPAC). Technical report, Transportation Research Board, Washington D.C., January 2001.
[7] P.R. Lowrie. The Sydney co-ordinated adaptive traffic system - principles, methodology, algorithms. In International Conference on Road Traffic Signalling, Institution of Electrical Engineers, pages 67-70, 1982.
[8] S. Mandava, K. Boriboonsomsin, and M. Barth. Arterial velocity planning based on traffic signal information under light traffic conditions. In Proc. 12th Int. IEEE Conf. Intelligent Transportation Systems (ITSC '09), pages 1-6, 2009.
[9] Chris Manzie, Harry Watson, and Saman Halgamuge. Fuel economy improvements for urban driving: Hybrid vs. intelligent vehicles. Transportation Research Part C, 15(1):1-16, 2007.
[10] Pitu Mirchandani and Larry Head. A real-time traffic signal control system: architecture, algorithms, and analysis. Transportation Research Part C: Emerging Technologies, 9:415-432, December 2001.
[11] K. Nagel and M. Schreckenberg. A cellular automaton model for freeway traffic. Journal de Physique I, 2, 1992.
[12] Dennis I. Robertson and R. David Bretherton. Optimizing networks of traffic signals in real time - the SCOOT method. IEEE Transactions on Vehicular Technology, 40(1), 1991.
[13] J. Rueger. Clean diesel - real life fuel economy and environmental performance. Presentation, SAE Government/Industry Meeting, Washington, D.C., 2008.
[14] T. Sukuvaara and P. Nurmi. Wireless traffic service platform for combined vehicle-to-vehicle and vehicle-to-infrastructure communications. IEEE Wireless Communications, 16(6):54-61, 2009.
[15] Richard S. Sutton and Andrew G. Barto. Introduction to Reinforcement Learning. MIT Press, Cambridge, MA, USA, 1st edition, 1998.
[16] R.A. Vincent, A.I. Mitchell, and D.I. Robertson. User guide to TRANSYT version 8. Technical report, Transport and Road Research Laboratory, Crowthorne, Berkshire, UK, 1980.
[17] M. Wiering, J. van Veenen, J. Vreeken, and A. Koopman. Intelligent traffic light control. Technical report UU-CS-2004-029, Utrecht University, 2004.

TABLE IV
ASVC's average stops (in stops/vehicle), improvement compared to TRANSYT (in percent)

Smart-car rate | Avg. stops | Std. deviation | Improvement
     0         |   2.43     |     0.02       |   54.06
     0.25      |   2.41     |     0.02       |   54.44
     0.5       |   2.35     |     0.03       |   55.57
     0.75      |   2.30     |     0.02       |   56.52
     1         |   2.28     |     0.03       |   56.89
