Li Weigang
Department of Computer Science,
University of Brasilia,
Caixa Postal 4466, 70919-970 Brasilia, DF, Brazil
Fax: (061) 32733589
E-mail: weigang@unb.br
Introduction
air traffic flow management (ATFM). Recent literature on the topic has discussed the
use of several forms of AI in ATM, such as automata theory (Bayen et al., 2005),
intelligent agents (Wolf, 2007), and reinforcement learning combined with multi-agent
theory (Alves et al., 2008).
Under normal operational conditions, the demand for air traffic services is satisfied
without delays imposed on aircraft. However, occasional operational constraints such as
bad weather or equipment and telecommunication malfunctions can create an imbalance
between demand and ATC capacity, resulting in overloads at one or more ATC sectors.
Such overloads require the adoption of traffic flow restrictions, known as tactical ATFM.
These restrictions can include both ground and airborne holding delays.
Figure 1
SISCONFLUX-MAAD architecture
In 2007, the Brazilian Centre for Air Navigation Management (CGNA) started to develop
a modular system to assist air traffic flow managers in the application and management of
air traffic flow control measures. This system, called SISCONFLUX (the Portuguese
abbreviation for 'decision support system for tactical traffic flow management'), is
capable of stipulating intentional delays for aircraft departing from certain terminal
manoeuvring areas (TMA) in order to avoid overloads in en route ATC sectors due to
occasional, temporary imbalances between demand and capacity. Figure 1 illustrates the
system, which comprises three modules:
At first, the actions are generated by maximum flow algorithms in MBF, representing
solutions that will not saturate the ATC sectors. From a certain point onwards, however,
the suggested actions are generated by the reinforcement learning algorithm in MAAD.
SISCONFLUX was developed with the following goals:
1  to develop a decision support system that can provide CGNA's flow managers with
traffic flow control measures according to pre-established criteria such as total
system delay and ATC workload.
As part of the SISCONFLUX system development, two prototypes were built for the
decision evaluation and support module (MAAD). The first prototype suggests traffic
flow control measures without taking the human agents' experience into account. The
second is a decision support system that incorporates the experience of CGNA's flow
managers. This paper describes these two prototypes and discusses the results of their
application to scenarios based on real demand.
Literature review
Several techniques are used in the existing literature for ATFM. The most commonly
used are mixed integer linear programming (Ball et al., 2003), dynamic programming
(Zhang et al., 2005), and AI techniques such as expert systems (Weigang et al., 1997) and
reinforcement learning (Alves et al., 2008; Agogino and Tumer, 2009). In the application
of AI, ATFM is a complex decision-making process that involves several entities, where
intelligent software agents can be used both in simulations and real operations, assisting
human operators (Wolf, 2007).
In this context, agents can be conceived to collect, process and disseminate relevant
information about the ATFM environment, such as demand forecasts, projected impacts
and variations in airspace capacity. Ultimately, intelligent agents can be constructed to
generate ATFM actions/measures from the analysis of several scenarios, acting directly
in the decision-making process.
Ball et al. (2003) used a mixed integer linear programming approach to ATFM.
They proposed a generalisation of the classic network flow model by replacing the
deterministic demand with a stochastic one. Although such generalisation destroys the
network's original structure, the authors show that the matrix underlying the
stochastic model constitutes a dual network. Therefore, the integer programming problem
associated with the stochastic model can be solved efficiently. This model was applied to
solve the ground holding problem (GHP) within ATFM. Simply put, the authors
presented an objective function that minimises the linear relation between supply and
demand in a classic network flow model, based on origin and destination node sets,
where the nodes represent the airports.
Ma et al. (2004) proposed a model based on graph theory where the airport and
ATC sector network is represented by nodes connected by arcs. The computing time to
reach the optimal solution, however, was prohibitively long. To address this problem,
Zhang et al. (2005) used dynamic programming and focused on minimising congestion
at the ATC sectors, as opposed to congestion at the airports, using both ground and
airspace holdings. The network is represented by a three-dimensional directed graph.
Each airport, transfer point and navigational aid is represented by a vertex, whereas the
air route segments and approach corridors are represented by directed arcs. An aircraft's
flight plan is modelled as a sequence of nodes on the graph. The objective function
includes fuel consumption and ground and airborne holding delay costs.
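This graph representation can be sketched directly: vertices for airports, waypoints and navaids, directed arcs with costs, and a flight plan as a node sequence. The node names and arc costs below are invented for illustration, not taken from Zhang et al. (2005).

```python
# Airports, transfer points and navigational aids are vertices; air route
# segments and approach corridors are directed arcs, each carrying a cost
# (standing in for fuel consumption and holding delay costs).
graph = {
    "SBGR": [("WPT1", 3.0)],                 # departure airport -> waypoint
    "WPT1": [("NAV1", 2.5), ("NAV2", 4.0)],  # two alternative route segments
    "NAV1": [("SBBR", 5.0)],
    "NAV2": [("SBBR", 3.0)],
    "SBBR": [],                              # destination airport
}

def plan_cost(graph, plan):
    """Cost of a flight plan modelled as a sequence of nodes on the graph."""
    cost = 0.0
    for a, b in zip(plan, plan[1:]):
        arc_cost = dict(graph[a]).get(b)
        if arc_cost is None:
            raise ValueError(f"no arc {a} -> {b}")
        cost += arc_cost
    return cost
```

An optimiser over this structure would search among alternative node sequences for the plan with the lowest total cost.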
Recent research in ATFM combines reinforcement learning and multi-agent
techniques. Agogino and Tumer (2009) use this combination to incorporate the human
agents' experience into automated ATM processes. ATC domains are too complex to be
completely automated and are directly impacted by tactical ATFM. Figure 2 illustrates
the flow of information and actions in a system where the agent is fully
automated.
Figure 2
Fully automated system
In the fully automated process, the control measure suggested by the ATFM agent is
applied directly, modifying the traffic flow. The modified scenario is then assessed
according to pre-established criteria and the system computes a reward for the agent.
This reward is then used to modify the control policy that the agent adopts, which may
change its next recommendations. In ATFM, however, the most adequate configuration
suggests a decision-making process in which a software agent offers recommendations
to a human agent, who acts as a filter by either accepting or rejecting the software
agent's suggestion. Figure 3 illustrates this concept.
Figure 3
Semi-automated system
Agogino and Tumer (2009) developed metrics to evaluate the efficiency of traffic flow
control measures generated by reinforcement learning agents. As a result, it was possible
to improve the actions taken by an agent conceived to simulate the behaviour of air traffic
controllers, given the suggestions made by the ATFM agent and the extent to which such
suggestions were accepted by the air traffic controller. Figure 4 illustrates this model.
Figure 4
Semi-automated system
It should be noted that, in the system proposed by Agogino and Tumer (2009), the ATFM
agent is a reinforcement learning algorithm that seeks to maximise the reward derived
from the generated scenarios, which are evaluated based on the total delays generated
and the number of aircraft in the ATC sectors, both resulting from the flow control
policy. In turn, the air traffic controller is characterised as a reward-maximising
reinforcement learning agent that seeks only to minimise the number of aircraft in
the ATC sectors. This assumed behaviour for the air traffic controller is not entirely valid,
as the minimisation of traffic in the ATC sectors is not ATC's only goal. In reality,
ATC seeks a balance between safety, which benefits from a low number of aircraft in
the ATC sectors, and operational efficiency, which depends on the maximisation of
flows. This assumed behaviour is, therefore, a significant limitation of the model.
In the process illustrated in Figure 4, the ATFM agent suggests a control measure.
The air traffic controller has a choice between accepting the measure and rejecting it.
The air traffic controller has a reward scheme conceived to encourage acceptance of the
suggestions. When the measure is adopted, the new scenario is projected and the total
delay and number of aircraft in the ATC sectors are computed. Based on these values, the
system determines the value of the reward, which will affect the policy used in future
suggestions. The simulations performed with real data showed a reduction in air traffic
congestion of about 20% compared to when the air traffic controller acts alone.
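The acceptance-filtered reward cycle just described can be sketched as follows. The scenario projection, the reward weights and the toy agent are placeholder assumptions for illustration, not the authors' implementation.

```python
def semi_automated_step(agent, controller_accepts, project_scenario,
                        w_delay=0.5, w_count=0.5):
    """One cycle of the semi-automated loop: the ATFM agent suggests a
    measure, the human controller accepts or rejects it, the new scenario
    is projected, and a reward is computed and fed back to the agent."""
    measure = agent.suggest()
    if not controller_accepts(measure):
        return None                      # rejected: traffic flow unchanged
    total_delay, aircraft_in_sectors = project_scenario(measure)
    # the reward penalises both evaluation criteria
    reward = -(w_delay * total_delay + w_count * aircraft_in_sectors)
    agent.update(measure, reward)        # may change future suggestions
    return reward

class AveragingAgent:
    """Toy agent that suggests the measure with the best average reward."""
    def __init__(self, measures):
        self.values = {m: 0.0 for m in measures}
        self.counts = {m: 0 for m in measures}

    def suggest(self):
        return max(self.values, key=self.values.get)

    def update(self, measure, reward):
        self.counts[measure] += 1
        self.values[measure] += (reward - self.values[measure]) / self.counts[measure]
```

The human agent enters the loop only as the `controller_accepts` filter, which is what distinguishes this configuration from the fully automated one in Figure 2.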
Methodology
MAAD was developed with the use of AI techniques. As the air traffic system can be
modelled as a Markov decision process, an agent was developed based on multi-attribute
utility theory, decision theory and reinforcement learning (Russell and Norvig, 2003).
The first two were the basis for the development of a state evaluation function for the
assessment of ATFM scenarios. The results of this evaluation are used to produce the
rewards awarded to the agent.
Reinforcement learning was used in the implementation of a software agent for
decision support in ATFM, with the ability to incorporate the experience of decision
makers. Hence, a device was developed that is capable of generating flow adjustment
policies, i.e., sets of traffic flow control measures that are suggested to the flow
managers (the human agents).
2010. Such actions were extracted from the CGNA's daily status reports. These action
sets will be referred to as flow adjustment measures, or simply measures.
In the definition of the measures, all records from the daily status reports between
January 2008 and March 2010 were surveyed. In general, a measure consists of delays
(ground holdings) of 5, 7, 10, 15, 20 and 25 minutes imposed on aircraft departure times
from the TMAs. These delays can be applied either indiscriminately to all aircraft
departing from a TMA or selectively to certain aircraft with specific destinations. Among
the 37 TMAs in the system, only the 19 featuring an annual traffic of at least 20,000
movements were included in the model. These 19 TMAs represent approximately 93%
of the total traffic at the airports operated by INFRAERO, the Brazilian state-owned
airport operator (INFRAERO, 2009).
Table 1 shows two examples of measures contained in the action sets developed.
Table 1  Examples of flow adjustment measures (delays of 10 to 25 minutes assigned per TMA, e.g., SP, BH and AN, with scope 'X' or destination codes such as RE, BR and CY)
As can be seen in Table 1, an action consists of a set of delays specific to each terminal.
The 'X' represents delays applied to all aircraft leaving the TMA. If, for a certain
TMA, the delay is to be applied only to airplanes with a specific destination, a
two-letter code representing that destination is used.
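Following this convention ('X' for all departures, a two-letter code for a specific destination), a measure can be represented as a mapping from TMA to delay and scope. The dictionary layout and the sample entries below are illustrative, not values taken from Table 1.

```python
# A flow adjustment measure: for each TMA, a delay in minutes and a scope.
# Scope "X" applies to all departures; a two-letter code restricts the
# delay to aircraft bound for that destination.
measure_1 = {
    "SP": {"delay": 10, "scope": "X"},   # 10 min for all departures from SP
    "BH": {"delay": 15, "scope": "X"},
    "AN": {"delay": 25, "scope": "RE"},  # 25 min only for AN departures to RE
}

def applies_to(measure, tma, destination):
    """Return the delay (minutes) the measure imposes on a flight departing
    `tma` for `destination`, or 0 if no delay applies."""
    entry = measure.get(tma)
    if entry is None:
        return 0
    if entry["scope"] == "X" or entry["scope"] == destination:
        return entry["delay"]
    return 0
```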
Table 2  Projected occupancy of ATC sectors 1 to 12 at selected times (00:15, 02:15, 10:30, 12:40, 14:10, 15:25, 21:00, 23:00 and 23:25)
Table 3  Projected occupancy of ATC sectors 1 to 12 at the same selected times
The evaluation of a scenario combines the total delay and the total ATC load factor,

TC(e) = α · D(e) + β · C(e)    (1)

where

e       ATFM scenario (state)
TC(e)   total cost of state e
D(e)    total delay
C(e)    total ATC load factor
α, β    normalisation coefficients, α + β = 1.

The total delay is the sum of the delays computed for each TMA,

D(e) = ∑_{t∈T} Dt(e)    (2)

where

Dt(e)   delay for TMA t
T       set of TMAs.

The delay for each TMA is the product of the number of aircraft set to depart in the next
2 hours and the suggested action (delay in minutes per aircraft), i.e.,

Dt(e) = mt · pt    (3)

where

mt      number of aircraft in TMA t to which the measure is applied
pt      delay in minutes per aircraft.

The load factor of an ATC sector indicates whether its projected occupancy reaches its
capacity,

Cs(e) = 10 if ms ≥ cs, 0 if ms < cs    (4)

where

Cs(e)   load factor of sector s
ms      number of aircraft in sector s
cs      capacity of sector s.

The total ATC load factor used in the scenario evaluation function is the sum of the
factors calculated for all sectors,

C(e) = ∑_{s∈S} Cs(e)    (5)

where

S       set of ATC sectors.
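Equations (1) to (5) translate directly into code. The traffic figures used below are made-up inputs; only the formulas themselves come from the text.

```python
def total_delay(departures, measure):
    """Equations (2)-(3): D(e) is the sum over TMAs of m_t * p_t, where m_t
    is the number of aircraft departing TMA t in the next 2 hours to which
    the measure applies, and p_t the delay in minutes per aircraft."""
    return sum(departures[t] * p for t, p in measure.items())

def sector_load(occupancy, capacity):
    """Equation (4): C_s(e) = 10 if the sector occupancy reaches its
    capacity, and 0 otherwise."""
    return {s: (10 if occupancy[s] >= capacity[s] else 0) for s in occupancy}

def evaluate(departures, measure, occupancy, capacity, alpha, beta):
    """Equation (1): TC(e) = alpha * D(e) + beta * C(e), with C(e) the sum
    of the sector load factors (equation (5))."""
    d = total_delay(departures, measure)
    c = sum(sector_load(occupancy, capacity).values())
    return alpha * d + beta * c
```

The reward awarded to the agent is then derived from this evaluation: lower total cost means a better scenario.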
In summary, MAAD was built as a software agent with the goal of choosing the measure
that produces the best evaluation of the scenario, i.e., the highest reward. The agent seeks
to maximise the rewards using the Q-learning algorithm.
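A minimal tabular Q-learning step of the kind MAAD relies on is sketched below. The state encoding is a placeholder; the learning rate and temporal discount defaults match the values reported later in Table 4.

```python
import random
from collections import defaultdict

def q_update(Q, state, action, reward, next_state, actions, alpha=0.2, gamma=0.8):
    """One tabular Q-learning step:
    Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)).
    alpha and gamma correspond to the learning rate and temporal discount."""
    best_next = max(Q[(next_state, a)] for a in actions) if actions else 0.0
    Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
    return Q[(state, action)]

def epsilon_greedy(Q, state, actions, epsilon=0.1):
    """Exploration/exploitation: a random measure with probability epsilon,
    otherwise the measure with the highest current Q-value."""
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])

Q = defaultdict(float)   # Q-table over (state, measure) pairs
```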
This research was preceded by several other studies on the application of reinforcement
learning agents to ATFM. Such studies have demonstrated that the use of an agent system
for ATM requires intensive knowledge (Wolf, 2007). Some of this knowledge can be
obtained from experts in real time, but this is not always practical, especially when
the experts do not have all the knowledge required. In such cases, knowledge acquisition
can be done in alternative ways, such as through historical data. These data can, in turn,
consist of a set of actions applied by human experts in past scenarios. This mode of
knowledge acquisition is defined as historical data learning (Wolf, 2007).
In historical data learning, the data, when available, can serve as the basis for the
development of human behaviour models, either manually or through machine learning
methods. The scenario configuration and the inputs from human experts form the
attribute set from which the correct decisions are learned.
Figure 5
Proposed learning cycle
It should be noted that, in the studies on the application of software agents to ATFM
found in the literature, the behaviour of air traffic controllers and traffic flow managers
was simulated (e.g., Agogino and Tumer, 2009). As a consequence, the agents did not
learn from real human agents. This research proposes a novel methodology in which the
experience of human ATFM experts is incorporated by the software agent using historical
data, extracted from ATFM measures applied by CGNA flow managers between 2008
and 2010. Figure 5 illustrates the proposed learning cycle.
In Figure 5, it can be seen that the ATFM agent is fed with ATFM measures
previously taken by CGNA flow managers in real situations. These measures, together
with those generated by the flow balancing module (MBF) in SISCONFLUX, make
up the agent's action set. From this action set, the agent suggests measures and the
acting/learning cycle proceeds as normal. In summary, the learning cycle effectively
incorporates the human agents' experience. This concept represents a clear advantage of
this approach over previous works, as it allows the evaluation, through comparative
analysis, of typical actions by both human and software agents.
Since the ATFM agent developed in this research has the ability to suggest actions
typically taken by human agents, it became possible to address an extremely relevant
problem in ATFM: the need to apply flow control measures in a more localised, selective
manner, affecting only aircraft with specific destinations. This application aims to restrict
the impact of those measures to the areas more directly related to the imbalance
between demand and capacity. Such measures could not have been generated by a system
equipped only with maximum network flow algorithms, such as the ones used in MBF.
Case study
In the case study, two agent prototypes were used: MAAD and MAAD*. The difference
between them lies in the action set used. MAAD uses an action set generated exclusively
by the flow balancing module (MBF). MAAD*, on the other hand, uses an action set
containing a subset comprised of historical actions taken by CGNA traffic flow
managers. The use of an action set that includes historical actions taken by human flow
managers is one of the main contributions of this work.
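The distinction between the two prototypes reduces to the action set each one searches over, which can be sketched as below; the measure labels are placeholders.

```python
def build_action_set(mbf_actions, historical_actions=None):
    """MAAD uses only the measures generated by the flow balancing module
    (MBF); MAAD* additionally includes the historical measures taken by
    CGNA flow managers."""
    actions = list(mbf_actions)
    if historical_actions:
        # deduplicate while keeping the MBF-generated measures first
        seen = set(actions)
        actions += [a for a in historical_actions if a not in seen]
    return actions

maad_actions = build_action_set(["mbf_1", "mbf_2"])
maad_star_actions = build_action_set(["mbf_1", "mbf_2"], ["cgna_1", "mbf_2"])
```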
The scenarios submitted to the prototypes were defined using traffic flows predicted
by CGNA. The prototypes were trained with the use of congested scenarios projected
from the demand forecasts supplied by CGNA. To accelerate the learning process, a
scenario configuration that reduces airspace capacity and consequently increases the
number of saturation points was used.
•  Collection of data on air traffic demand for the Brazilian airspace between January
2008 and March 2010.
•  Collection of data on air traffic flow control measures taken by CGNA traffic flow
managers in real situations over the same period. These measures were added to the
action set used in MAAD*.
•  Insertion of the flights into the system's database, allowing the occupancy of the
ATC sectors to be projected.
•  Initial tests for validation of the prototypes, using the scenario for 6 March 2008.
•  Verification that the Q-learning algorithm used in MAAD converges, using the
scenario for 18 March 2010.
•  Verification that the Q-learning algorithm used in MAAD* converges, using the
scenario for 18 March 2010.
with respect to total traffic. To account for this non-scheduled demand, the demands of
several TMAs in the case study were increased by 5% to 20%.
Table 4  Simulation parameters

Parameter                            Value
Normalisation constants              α = 0.0136985, β = 0.9863015
Temporal discount                    0.8
Learning rate                        0.2
Exploration/exploitation criteria
In Table 4, the normalisation constants were set such that each factor, in the worst-case
scenario, contributes approximately 50% of the reinforcement value. The temporal
discount and the learning rate were set using the values of Ribeiro et al. (2006) for
evaluating the efficiency of the Q-learning algorithm. Finally, the exploration/exploitation
criterion was set empirically.
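The normalisation constants follow from the stated design rule: each factor contributes about 50% of the reinforcement in the worst case, i.e., α · D_max = β · C_max with α + β = 1. The worst-case values below are back-computed assumptions that approximately reproduce the constants in Table 4, not figures reported in the paper.

```python
def normalisation_constants(d_max, c_max):
    """Solve alpha * d_max = beta * c_max with alpha + beta = 1, so that
    each factor contributes equally to the evaluation in the worst-case
    scenario."""
    alpha = c_max / (c_max + d_max)
    beta = d_max / (c_max + d_max)
    return alpha, beta

# Assumed worst-case ratio D_max / C_max = 72 (e.g., 8640 delay minutes
# against a total load factor of 120), close to the Table 4 constants:
alpha, beta = normalisation_constants(d_max=8640.0, c_max=120.0)
```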
Table 5  Measures suggested by MAAD and MAAD* at successive times (09:55 to 12:25): delays in minutes per TMA (SP, BR, BH, AN, UL, RJ, CT, PA and FL), with entries such as 5SP denoting a 5-minute delay restricted to departures bound for SP
Figure 6
An important finding from the observation of the values in Table 5 and the evolution of
Q(s, a) from the 26th iteration onwards is that, as the scenario becomes less complex,
MAAD* tends to suggest actions from the subset of actions extracted from the CGNA
flow managers' experience. This confirms the assertion that human expertise becomes
more efficient than the machine's as the scenario complexity decreases. Indeed, in the
last iterations, the actions typically taken by human experts resulted in larger values of
Q(s, a) when compared with the actions suggested by the MAAD prototype, which has
no built-in human experience. The results of this case study indicate that the prototypes
developed can be of great value in ATFM decision support, especially when the
measures are more localised and applied to departures in a more selective fashion.
Conclusions
ATFM is of crucial importance to the airspace control system. The use of reinforcement
learning in ATFM applications is relatively recent. This paper presented an application of
a reinforcement learning agent that suggests actions to be taken by traffic flow managers,
in the form of delays imposed on departing aircraft through ground holding. The
objective of these actions is to avoid congestion at the ATC sectors.
In the case study, comparisons were made between actions defined exclusively by
computer algorithms and actions that included the experience of human managers. The
results indicate that the incorporation of the human experience improves the actions
suggested by the algorithm, especially when the complexity of the scenario is low.
Acknowledgements
This research was partially supported by the Brazilian National Council for Scientific
and Technological Development, CNPq (Procs. 306065/2004-5 and 485940/07-8), and
ATECH Tecnologias Críticas, Brazil.
References
Agogino, A. and Tumer, K. (2009) 'Learning indirect actions in complex domains: action suggestions for air traffic control', Advances in Complex Systems, Vol. 12, Nos. 4-5, pp.493-512, World Scientific Company.
Alves, D.P., Weigang, L. and Souza, B.B. (2008) 'Reinforcement learning to support meta-level control in air traffic management', in Weber, C., Elshaw, M. and Mayer, N. (Eds.): Reinforcement Learning: Theory and Applications, ARS, Vienna.
Ball, M.O., Hoffman, R., Odoni, A. and Rifkin, R. (2003) 'A stochastic integer program with dual network structure and its application to the ground holding problem', Operations Research, Vol. 51, No. 1, pp.167-171, Institute for Operations Research and the Management Sciences (INFORMS), Linthicum, Maryland, USA, ISSN: 0030-364X.
Bayen, A.M., Grieder, P., Meyer, G. and Tomlin, C.J. (2005) 'Lagrangian delay predictive model for sector-based air traffic flow', AIAA Journal of Guidance, Control, and Dynamics, Vol. 28, No. 5, pp.1015-1026.
INFRAERO (2009) Movimentos nos Aeroportos, available at http://www.infraero.gov.br (accessed on July 2009).
Ma, Z., Cui, D. and Cheng, P. (2004) 'Dynamic network flow model for short-term air traffic flow management', IEEE Transactions on Systems, Man and Cybernetics, Part A: Systems and Humans, Vol. 34, No. 3, pp.351-358.
Ribeiro, R., Koerich, A.L. and Enembreck, F. (2006) 'Uma nova metodologia para avaliação de desempenho de algoritmos baseados em aprendizagem por reforço', XXXIII Seminário Integrado de Software e Hardware, Anais do XXVI Congresso da Sociedade Brasileira de Computação.
Russell, S. and Norvig, P. (2003) Artificial Intelligence: A Modern Approach, 2nd ed., Pearson Education, Inc., New Jersey.
Timoszczuk, A.P., Pizzo, W.N., Staniscia, G.F. and Siewerdt, E. (2009) 'The SYNCROMAX solution for air traffic flow management in Brazil', in Weigang, L., de Barros, A. and de Oliveira, I.R. (Eds.): Computational Models, Software Engineering, and Advanced Technologies in Air Transportation: Next Generation Applications, pp.23-37, IGI Global, Hershey.
Watkins, C.J.C.H. and Dayan, P. (1992) 'Q-learning', Machine Learning, Vol. 8, No. 3, pp.279-292.
Weigang, L., Alves, C.J.P. and Omar, N. (1997) 'An expert system for air traffic flow management', Journal of Advanced Transportation, Vol. 31, No. 3, pp.343-361, ISSN: 0197-6729.
Wolf, S.R. (2007) 'Supporting air traffic flow management with agents', American Association for Artificial Intelligence Spring Symposium: Interaction Challenges for Intelligent Assistants.
Zhang, Z., Gao, W. and Wang, L. (2005) 'Short-term flow management based on dynamic flow programming network', Journal of the Eastern Asia Society for Transportation Studies, Vol. 6, pp.640-647.