Methods For Truck Dispatching in Open-Pit Mining

Thesis presented to the Faculty of the Department of Graduate Studies of the Aeronautics Institute of Technology, in partial fulfillment of the
requirements for the Degree of Doctor in Science in the Program of Electronic Engineering and Computer Science, Field Computer Science.
Guilherme Sousa Bastos
METHODS FOR TRUCK DISPATCHING IN OPEN-PIT MINING
Thesis approved in its final version by signatories below:
Prof.Dr. Carlos Henrique Costa Ribeiro Advisor
Prof.Dr Luiz Edival de Souza Co-advisor
Prof. Celso Massaki Hirata Head of the Faculty of the Department of Graduate Studies
Campo Montenegro, São José dos Campos, SP - Brazil, 2010
Cataloging-in-Publication Data
Documentation and Information Division
Sousa Bastos, Guilherme
Methods for Truck Dispatching in Open-Pit Mining / Guilherme Sousa Bastos. São José dos Campos, 2010. 140f.
Thesis of Doctor in Science, Course of Electronic Engineering and Computer Science, Area of Computer Science, Aeronautics Institute of Technology, 2010. Advisor: Prof. Dr. Carlos Henrique Costa Ribeiro. Co-advisor: Prof. Dr. Luiz Edival de Souza.
1. Mathematical programming. 2. Goods distribution. 3. Genetic algorithms. 4. Applied mathematics. 5. Routes. 6. Trucks. 7. Mathematics. I. Aeronautics Institute of Technology. II. Title.
BIBLIOGRAPHIC REFERENCE
SOUSA BASTOS, Guilherme. Methods for Truck Dispatching in Open-Pit Mining. 2010. 140f. Thesis of Doctor in Science, Aeronautics Institute of Technology, São José dos Campos.
CESSION OF RIGHTS
AUTHOR NAME: Guilherme Sousa Bastos
PUBLICATION TITLE: Methods for Truck Dispatching in Open-Pit Mining
PUBLICATION KIND/YEAR: Thesis / 2010
Permission is granted to the Aeronautics Institute of Technology to reproduce copies of this thesis and to loan or sell copies only for academic and scientific purposes. The author reserves other publication rights, and no part of this thesis may be reproduced without the authorization of the author.
Guilherme Sousa Bastos
Rua Oscar Rennó, 309. Costa II
CEP 37500-433, Itajubá-MG
METHODS FOR TRUCK DISPATCHING IN OPEN-PIT MINING
Thesis Committee Composition:
Prof. Cairo Lúcio Nascimento Júnior - Chairperson - ITA
Prof. Dr. Carlos Henrique Costa Ribeiro - Advisor - ITA
Prof. Dr. Luiz Edival de Souza - Co-advisor - UNIFEI
Prof. Rodrigo Arnaldo Scarpel - Member - ITA
Dra. Leliane Nunes de Barros - External Member - IME-USP
Dr. Marcone Jamilson Freitas Souza - External Member - UFOP
ITA
To Karina, for her love and patience.
Acknowledgments
Thank you God for writing straight on crooked lines... My entire life has been guided by this wise saying, and now, after a really hard journey, I'm here finishing my most important work so far.

I would like to express my gratitude to my advisor, Prof. Carlos Henrique Costa Ribeiro. Your supervision style was essential in pointing the way for this research: never handing me the correct paths, but always steering me away from the wrong ones. Because of this, I can affirm that I am now a researcher. Thank you very much!

Thanks to my co-advisor and colleague Prof. Luiz Edival de Souza. Your presence next to my office was fundamental to my progress, as you were always available to answer my infinite questions and to help me. This is the end point of your supervision of my research, which began when I was still an engineering student; however, it is the starting point of our future research projects. Thanks a lot!

Another special thanks goes to my supervisor at the Australian Centre for Field Robotics (ACFR), Dr. Fábio Ramos. Thank you for receiving me at ACFR and supervising my work during my six-month stay in Sydney. This period was what set my work apart, and it will certainly drive my future research to a higher level of quality. Cheers mate! Thanks to CAPES for granting the scholarship that was essential for my studies at ACFR.

Continuing in the Oz Land, I must thank the people who helped me in my work and, mainly, in my life abroad. Thanks to the ACFR staff, and mainly to Vitor, Sildomar, Guilherme, Tim, Adrian, Paco, Simon, Gabriel, Surya, and Pablo Chilean. A special thanks goes to Pablo Peruvian: you were my first friend in Sydney! Thanks for guiding me (a newbie) across the great pubs in the city! Another special thanks to my other friends in Sydney, whom I can count as my brothers: Alex Cowboy, Du, Elton, Leandro, and Pablo Chilean. My stay in Sydney can be divided into before and after knowing yous! Another thanks to my great friend Andy and his wife Joanna; thanks a lot for buying Possante, I am sure that it will bring you happiness!

Many thanks to Karina Valdivia for teaching me the crazy Factored MDPs. I am sure that we can form a partnership in the near future to study and develop new trends in the decision making area.

So many times along this long road I had the understanding of two special people at UNIFEI, who allowed my research work at ITA and adjusted my schedule whenever I needed; thank you Prof. Carlos Augusto Ayres and Prof. Carlos Alberto Pinheiro.

Thanks to my mom and dad for their constant encouragement of my studies since I was a kid. I really could not have achieved this position without your help. I love you two.

A really special thanks to my wife Karina. Only you know the difficulties that we have been through together during these years of study... That's the past; from now on we will harvest the fruits that we started planting five years ago! Thanks for everything my love! Eu te amo!!!
Logic takes you from a to b. Imagination takes you everywhere. Albert Einstein
Resumo
Material transportation is one of the most important aspects of the operations carried out in open-pit mines. This problem usually involves a truck dispatching system, which allocates trucks in real time. Given the importance of this problem, several decision systems have been developed in recent years, increasing productivity and reducing operating costs. As in many other real-world applications, correct modeling of the uncertainties present in the problem is crucial for the dispatching system to work well. Examples of such uncertainties are equipment failures, weather conditions, and human errors, which can result in truck queues and idle shovels. However, uncertainties are not considered in most commercial dispatching systems, which can lead to results far from those expected. In this thesis, new truck dispatching systems are introduced, bringing current systems closer to a stochastic decision methodology. First, a stochastic method using a Time-Dependent Markov Decision Process (TiMDP) applied to the truck dispatching problem is presented. In this model, truck travel times are represented as probability density functions, time windows can be inserted to represent the availability of existing routes, and time-based utility can be used as a priority parameter. In order to mitigate the well-known curse of dimensionality, to which multi-agent problems are subject when discrete-state modeling is considered, the system is modeled using the introduced concept of single-dependent-agents. Still based on this concept, the Genetic TiMDP (G-TiMDP) method is presented for application to the truck dispatching problem. This method is a hybridization of the TiMDP model and Genetic Algorithms (GA), the latter also being used to solve the dispatching problem. Finally, in order to test and compare the results of the introduced methods, Monte Carlo simulations are run on a heterogeneous mine composed of 15 trucks, 3 shovels, and 1 ore processing point. The uncertainty in the problem is represented by the choice of route between the ore processing point and the shovels, which is made by the truck driver and is independent of the dispatching system. The results are compared to classical dispatching systems (Greedy Heuristic and Minimization of Truck Cycle Times, MTCT) using Student's t-test, confirming the efficiency of the proposed truck dispatching methods.
Abstract
Material transportation is one of the most important aspects of open-pit mine operations. The problem usually involves a truck dispatching system in which decisions on truck assignments and destinations are taken in real time. Due to its significance, several decision systems for this problem have been developed in the last few years, improving productivity and reducing operating costs. As in many other real-world applications, the assessment and correct modeling of uncertainty is a crucial requirement, as the unpredictability originating from equipment faults, weather conditions, and human mistakes can often result in truck queues or idle shovels. However, uncertainty is not considered in most commercial dispatching systems. In this thesis, we introduce novel truck dispatching systems as a starting point to modify current practices with a statistically principled decision making methodology. First, we present a stochastic method using a Time-Dependent Markov Decision Process (TiMDP) applied to the truck dispatching problem. In the TiMDP model, travel times are represented as probability density functions (pdfs), time windows can be inserted for path availability, and time-dependent utility can be used as a priority parameter. In order to mitigate the well-known curse of dimensionality, to which multi-agent problems are subject when considering discrete state models, the system is modeled based on the introduced single-dependent-agents. Also based on the single-dependent-agents concept, we introduce the Genetic TiMDP (G-TiMDP) method applied to the truck dispatching problem. This method is a hybridization of the TiMDP model and a Genetic Algorithm (GA), which is also used on its own to solve the truck dispatching problem. Finally, in order to evaluate and compare the results of the introduced methods, we execute Monte Carlo simulations in an example heterogeneous mine composed of 15 trucks, 3 shovels, and 1 crusher. The uncertain aspect of the problem is represented by the path selection between the crusher and the shovels, which is made by the truck driver and is independent of the dispatching system. The results are compared to classical dispatching approaches (Greedy Heuristic and Minimization of Truck Cycle Times, MTCT) using Student's t-test, demonstrating the efficiency of the introduced truck dispatching methods.
List of Figures
FIGURE 2.1 Example of a MDP with 3 states.
FIGURE 2.2 Value Iteration for (a) γ = 0.9 and (b) γ = 0.3.
FIGURE 2.3 TiMDP example solved step-by-step by value iteration.
FIGURE 2.4 Sequential decision making problem using time-dependent utility.
FIGURE 2.5 Value function - V(1, t).
FIGURE 2.6 Value function - V(2, t).
FIGURE 2.7 Value function - V(3, t).
FIGURE 2.8 Policies over time.
FIGURE 2.9 Value function V(1, t), P1 = N(10, 3).
FIGURE 2.10 Policies over time - P1 = N(10, 3).
FIGURE 3.1 1-truck-for-n-shovels strategy.
FIGURE 3.2 m-trucks-for-1-shovel strategy.
FIGURE 3.3 m-trucks-for-n-shovels strategy.
FIGURE 4.1 Abstract graph of a medium-scale mine.
FIGURE 4.2 Truck cycle time.
FIGURE 4.3 Path selection outcomes. (a) Crusher-Shovel 1-Crusher; (b) Crusher-Shovel 2-Crusher; (c) Crusher-Shovel 3-Crusher.
FIGURE 4.4 Outcome likelihood functions.
FIGURE 4.5 Truck dispatching state transitions.
FIGURE 4.6 TiMDP truck dispatching states.
FIGURE 4.7 Expected tonnage production at crusher C (Truck 1 - Shovel 1 - Queue 0).
FIGURE 4.8 Expected tonnage production at crusher C (Truck 1 - Queue 0).
FIGURE 4.9 Expected tonnage production at crusher C (Truck 2 - Shovel 3).
FIGURE 4.10 Expected tonnage production at crusher C (Truck 2).
FIGURE 4.11 Comparative of expected tonnage production at crusher C (Truck 3 - Queue 0) for standard and Gauss representations.
FIGURE 4.12 Comparative of expected tonnage production at crusher C (Truck 1 - Queue 0) for standard and Gauss representations.
FIGURE 4.13 Truck dispatching GA chromosome.
FIGURE 4.14 Truck dispatching GA crossover.
FIGURE 4.15 Truck dispatching GA mutation.
FIGURE 4.16 Truck dispatching GA elitist behavior.
FIGURE 4.17 Truck dispatching GA reproduction result.
FIGURE 4.18 Auxiliary chromosome array.
FIGURE 5.1 General mine simulation environment.
FIGURE 5.2 Shovel 1 block simulation environment detail.
FIGURE 5.3 Queue 1 block simulation environment detail.
FIGURE 5.4 Paths 1 block simulation environment detail.
FIGURE 5.5 Path 1 block simulation environment detail.
FIGURE 5.6 Quantity of trucks in shovels for the Greedy Heuristic simulation.
FIGURE 5.7 Quantity of trucks in paths going to Shovel 1 for the Greedy Heuristic simulation.
FIGURE 5.8 Quantity of trucks in shovels for the MTCT Heuristic simulation.
FIGURE 5.9 Trucks on parking lot for the MTCT heuristic.
FIGURE 5.10 Quantity of trucks in shovels for TiMDP model simulation.
FIGURE 5.11 Trucks on parking lot for TiMDP model.
FIGURE 5.12 Quantity of trucks in shovels for the GA model simulation.
FIGURE 5.13 Trucks on parking lot for the GA model.
FIGURE 5.14 Quantity of trucks in shovels for the G-TiMDP simulation.
FIGURE 5.15 Trucks on parking lot for the G-TiMDP model.
FIGURE 5.16 Mean time in the queues for TiMDP model.
FIGURE B.1 Gamma distribution.
FIGURE B.2 Gaussian distribution.
List of Tables
TABLE 2.1 Transition Probabilities
TABLE 2.2 Reward
TABLE 2.3 MDP Solution (γ = 0.9)
TABLE 2.4 Q(s, a) (γ = 0.9)
TABLE 2.5 MDP Solution (γ = 0.3)
TABLE 2.6 Q(s, a) (γ = 0.3)
TABLE 4.1 Truck specifications.
TABLE 4.2 Shovel specifications.
TABLE 4.3 Mining data.
TABLE 5.1 Monte Carlo simulations of truck dispatching methods using standard representation (standard deviation equals zero for all considered times).
TABLE 5.2 Monte Carlo simulations of truck dispatching methods using Gaussian representation.
TABLE 5.3 Comparatives between truck dispatching methods using T-test.
TABLE 5.4 Comparatives between truck dispatching methods with Gaussian representations using T-test.
List of Abbreviations and Acronyms

GA        Genetic Algorithm
G-TiMDP   Genetic Time-dependent Markov Decision Process
PWC       Piecewise Constant
PWP       Piecewise Polynomial
PWL       Piecewise Linear
MDP       Markov Decision Process
MSWT      Minimizing Shovel Waiting Time
MSC       Minimizing Shovel Saturation or Coverage
MTCT      Minimizing Truck Cycle Time
MTWT      Minimizing Truck Waiting Time
pdf       probability density function
ROM       Run Of Mine
SA        Simulated Annealing
SMDP      Semi-Markov Decision Process
TiMDP     Time-dependent Markov Decision Process
Contents
1 Introduction
  1.1 Motivation
  1.2 Objectives
  1.3 Work Contributions
  1.4 Thesis Outline
2 Time Dependence in Decision Processes
  2.1 Markov Decision Processes
    2.1.1 MDP formulation
    2.1.2 MDP solution
    2.1.3 A MDP example
  2.2 Time-dependent Markov Decision Processes
    2.2.1 Discrete solution for relative time distributions by backwards convolution
    2.2.2 A TiMDP example
  2.3 Time-dependent utilities
    2.3.1 Decreasing time-dependent utility function
    2.3.2 Increasing time-dependent utility function
    2.3.3 A time-dependent utility example
3 Truck Dispatching in Open Pit Mines
  3.1 Vehicle dispatching problems
  3.2 Truck dispatching problem
    3.2.1 The 1-truck-for-n-shovels strategy
    3.2.2 The m-trucks-for-1-shovel strategy
    3.2.3 The m-trucks-for-n-shovels strategy
4 Truck Dispatching Modeling
  4.1 A model for a medium-scale mine example
    4.1.1 Mine environment
    4.1.2 Specifying trucks and shovels
    4.1.3 The truck cycle
    4.1.4 Mine uncertainties
      4.1.4.1 Stochastic path selection
      4.1.4.2 Gaussian-based truck traveling times
  4.2 Truck dispatching methods
    4.2.1 Greedy heuristic
    4.2.2 MTCT heuristic
    4.2.3 TiMDP Model
      4.2.3.1 Single-dependent-agent TiMDP modeling
      4.2.3.2 TiMDP results and analysis
    4.2.4 Genetic Algorithm (GA)
    4.2.5 G-TiMDP
5 Simulations and Analysis
  5.1 Simulation Framework
  5.2 Dispatching Methods Behavior
  5.3 Comparative Results and Analysis
6 Final Remarks
  6.1 Conclusions
  6.2 Future work
Bibliography
Appendix A Genetic Algorithm
  A.1 Methodology
    A.1.1 Population generation
    A.1.2 Selection
    A.1.3 Reproduction
      A.1.3.1 Crossover
      A.1.3.2 Mutation
    A.1.4 Termination
Appendix B Statistical Distributions
  B.1 Gamma Distribution
  B.2 Gaussian Distribution
1 Introduction
1.1 Motivation
Truck dispatching is an important issue in open-pit mining because of the cost of material transportation, which can represent up to 60% of operating expenditure in realistic settings (ALARIE; GAMACHE, 2002). Basically, truck dispatching is a combinatorial problem that consists of assigning trucks to shovels in order to optimize a specific objective while taking several constraints into account. The objective can be the maximization of the tonnage of material transported during a shift (productivity policy), the minimization of equipment inactivity, or meeting Run of Mine (ROM) requirements (quality policy). These mining objectives can be pursued independently (single-objective: only one objective at a time) or combined with each other (multi-objective: two or more objectives combined to obtain the best result of each one or of the combination). Meeting the objective is generally subject to common mine constraints, such as truck hauling and shovel loading capacities, empty and loaded truck speeds, refueling, and preventive maintenance schedules.

Recently, some dispatching systems have been developed for open-pit mining using a variety of operations research and evolutionary techniques (BRAHMA, 2007; KRAUSE; MUSINGWINI, 2007; JAOUA; GAMACHE; RIOPEL, 2009). However, none of them presents a stochastic representation of actions and environmental changes. Stochasticity stems from the inherent uncertainties that are present in most real-world problems; classical deterministic approaches do not consider this uncertain behavior, which most of the time leads to non-optimal results. Truck dispatching problems in open-pit mines are often subject to uncertain behavior, such as fuel consumption variations, unexpected equipment stoppages (faults, flat tires, emergencies, etc.), and variations in the duration of durative actions. Therefore, modeling truck dispatching with a stochastic approach becomes crucial in order to meet and optimize its specific objectives.

A stochastic truck dispatching system can be represented by a Markov Decision Process (MDP) (PUTERMAN, 1994), which is the classical approach for decision theoretic problems (LITTMAN; DEAN; KAELBLING, 1995; BOUTILIER; DEAN; HANKS, 1999). Uncertain parameters such as equipment faults, weather conditions, and route availability can be modeled from a reliable historical database. Another set of uncertain parameters, those based on time, such as truck travel and loading durations, cannot be represented by an MDP. To solve this problem, the truck dispatching model can be based on a Time-dependent MDP (TiMDP) (BOYAN; LITTMAN, 2000), which is used to model and solve sequential decision problems with stochastic state transitions and stochastic time-dependent action durations.
1.2 Objectives
The solution of a dispatching problem modeled by an MDP is represented by policies, which give the action that must be selected by the agent (truck) when it is in a specific state. Generally, for a single agent, the optimal solution can be found quickly by dynamic programming techniques. However, dispatching problems with many agents (multi-agent) generate an exponential growth of the state space, which causes a correspondingly drastic increase in the time needed to find the optimal solution. This issue is known as the curse of dimensionality (BELLMAN, 1966); it can be very serious in combinatorial problems like these and is critical for TiMDP models, in which policies also depend on the current time.

Therefore, the main objective of this thesis is to develop and study an approach that mitigates this unwanted behavior through an approximation to a single-dependent-agent problem. In this approximation, the problem is modeled for each truck type (the mine may have trucks that differ in speed and capacity) with dependent states that represent queues of different sizes at the shovels. Thus, the decision on which shovel the truck must travel to depends on the states representing the current size of the queues. In this case, the policies for a specific truck dispatch can change on the fly (real-time operating system) because of the dependence on current queue sizes; if a truck goes to a shovel and has to wait in a queue before loading, it increases the size of that queue, thereby affecting the next truck dispatching decision.

Another point that must be considered is the real-time nature of truck dispatching in an open-pit mine. The values used to model the problem behavior are not fixed and change all the time. For example, the truck travel time from one specific point to another will certainly not be exactly the same over distinct trips. The TiMDP deals very well with these characteristics, since action durations can be modeled by probability density functions (pdfs) based on a reliable historical database.

Given the presented truck dispatching characteristics, we investigate the dispatching system for an example mine based on a TiMDP model, verifying the validity of the method by comparing its simulated results to those from other methods, namely: (1) Greedy heuristic, (2) MTCT heuristic, and (3) Genetic Algorithm (GA). We then present a novel hybrid method named Genetic TiMDP (G-TiMDP), which uses the value functions given by the TiMDP model as the GA fitness function. The G-TiMDP results are also compared to the results of the previous methods. For the empirical analysis, we apply the objective function of maximization of tonnage production over the whole mining shift to all modeled and simulated methods.
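To make the curse of dimensionality discussed in this section concrete, the following minimal Python sketch compares the size of a joint multi-agent state space with the size of a set of per-truck-type models that only share queue states. It is an illustration under stated assumptions, not the state model developed in this thesis: the number of local states per truck (10), the number of truck types (2), the maximum queue length (5), and the counting formula of the second function are all hypothetical; only the fleet size (15 trucks) and the number of shovels (3) come from the example mine.

```python
def joint_state_count(states_per_truck: int, n_trucks: int) -> int:
    """Joint state space size when every truck is modeled explicitly.

    Each truck can be in any of its local states independently, so the
    joint model has states_per_truck ** n_trucks states.
    """
    return states_per_truck ** n_trucks


def per_type_state_count(
    states_per_truck: int, n_truck_types: int, n_shovels: int, max_queue: int
) -> int:
    """Rough state count for one small model per truck type.

    Hypothetical counting rule (an assumption for illustration only):
    each truck-type model has its local states plus one state per shovel
    and per possible queue size (0..max_queue).
    """
    queue_states = n_shovels * (max_queue + 1)
    return n_truck_types * (states_per_truck + queue_states)


if __name__ == "__main__":
    # 15 trucks and 3 shovels as in the example mine; other values are assumed.
    print("joint model:   ", joint_state_count(10, 15))
    print("per-type model:", per_type_state_count(10, 2, 3, 5))
```

Under these assumed numbers the joint model has 10^15 states, while the per-type models together have only a few dozen, which is why an approximation of this kind can make a time-dependent model tractable.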
1.3 Work Contributions
The contributions of this thesis are presented in what follows, in the sequence in which they appear in the text.

TiMDP solution by backwards convolution

The TiMDP solution method is presented by Boyan and Littman (2000) and in subsequent works such as Li and Littman (2005) and Rachelson, Fabiani and Garcia (2009a); however, these works are strictly mathematical and do not present a basic step-by-step solution example. We developed a method to solve a TiMDP by complete discretization and backwards convolution. The term backwards convolution denotes the step needed to solve the TiMDP model, which is performed in the reverse direction of a standard convolution. Finally, we solve a TiMDP model step by step using our proposed method.

Time-dependent utility decision making using the TiMDP model

Many common situations can be modeled with time-dependent utilities, in which specific parameters represent the gain or cost that a decision problem returns over time to the decision maker (agent). We relate time-dependent utility to the TiMDP using definitions and examples, which can be useful to solve, and better approximate, decision problems that occur in practical real-world settings.

Single-dependent-agent truck dispatching modeling

We developed an approximation for the multi-agent problem (truck dispatching) occurring in an example mine (JAOUA; GAMACHE; RIOPEL, 2009), in which the state models are built for each truck and depend on each other through a specific common state (the queueing state). We named this approximation single-dependent-agent. It reduces the size of the state space, making possible a fast approximate solution for truck dispatching using a TiMDP model.

Real-time truck dispatching using TiMDP policies

The single-dependent-agent state representation that we developed is used by the TiMDP model in the real-time truck dispatching simulation. The truck assignments (which shovel the truck must travel to) are made in real time using the corrected value functions given by the TiMDP solution (policies). The TiMDP is solved before the simulation, making the assignment decisions extremely fast.

Real-time truck dispatching using a GA

We used SimEvents™ (a package of Matlab™) to simulate truck dispatching in the example mine during a 10-hour shift, and developed a novel technique that uses a GA in real time for the shovel-truck assignments. This optimization algorithm is very fast and appears to be suitable for real-time applications. Because of the uncertain parameters present in the truck dispatching model, the sequence of trucks asking for dispatch until the end of the shift cannot be predicted. Therefore, the algorithm is executed many times during the whole shift, seeking to improve the result (maximization of the tonnage production).

Real-time truck dispatching using G-TiMDP

The truck decisions given by the TiMDP policies are in general adequate, but are degraded by the approximation made by the single-dependent-agent model. Therefore, we developed a novel technique, named G-TiMDP, that uses the corrected value functions given by the TiMDP to feed the fitness function of the previously developed GA technique. We performed a Monte Carlo simulation using this novel technique and evaluated the superiority of this method by comparing the results by means of Student's t-test.
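As a rough illustration of why the solution step mentioned in the first contribution runs "backwards" relative to a standard convolution, the sketch below computes the expected value of starting a durative action at each discrete time step, given the value of the successor state at every arrival time. It is a minimal sketch under stated assumptions and not the method developed in Chapter 2: time is assumed to be fully discretized, the duration is given as a probability mass function over whole steps, arrivals beyond the horizon are assumed to be worth zero, and the function name and example numbers are hypothetical.

```python
from typing import List


def value_before_action(
    v_next: List[float], duration_pmf: List[float], reward: float = 0.0
) -> List[float]:
    """Expected value of starting a durative action at each time step.

    v_next[t]       value of the successor state when arriving at step t
    duration_pmf[d] probability that the action lasts d steps
    reward          reward collected on arrival (assumed constant here)

    For a start time t, the arrival time is t + d, so the sum looks
    FORWARD in time into v_next. A standard convolution would instead
    combine v_next[t - d] with duration_pmf[d]. Arrivals after the last
    step are assumed to contribute nothing.
    """
    horizon = len(v_next)
    q = [0.0] * horizon
    for t in range(horizon):
        for d, prob in enumerate(duration_pmf):
            arrival = t + d
            if arrival < horizon:
                q[t] += prob * (reward + v_next[arrival])
    return q


if __name__ == "__main__":
    # Hypothetical numbers: the successor state loses value over time and
    # the action takes 1 or 2 steps with equal probability.
    print(value_before_action([3.0, 2.0, 1.0, 0.0], [0.0, 0.5, 0.5]))
```

With the numbers above, starting at step 0 is worth 0.5 * 2.0 + 0.5 * 1.0 = 1.5, and starting at step 2 or later is worth nothing, because every arrival then lands where the successor value is zero or past the horizon.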
1.4 Thesis Outline
Chapter 2 presents an introduction to the TiMDP, which is the main model used in our truck dispatching algorithm development. We present the state of the art and current research on TiMDPs, and propose a solution method that can be used in various discrete applications. We also solve an example using this method and propose the use of TiMDP models for problems with time-dependent utilities.

The truck dispatching problem in open-pit mining is presented in Chapter 3. In order to position the complexity and details of the truck dispatching problem, we first review the general vehicle dispatching problem with some variants and applied solution methods. We then present the specifics of the truck dispatching problem, such as the equipment involved, specific goals, and the dispatching strategies used in real-world truck dispatching problems. These dispatching strategies are the basis for the solution methods developed in the next chapter.

In Chapter 4 we present real-time truck dispatching methods for open-pit mines operating under a production policy (maximizing the tonnage production over the shift). We model this problem using the concept of single-dependent-agents for TiMDPs and G-TiMDPs. We also present additional techniques for truck dispatching that are used in further analysis, namely: greedy heuristic, MTCT (Minimizing Truck Cycle Time) heuristic, and GAs.

Chapter 5 presents simulation results and analysis of the developed dispatching methods. This includes details of the Monte Carlo simulations and a comparison of results using Student's t-test. Some analysis on improving the methods is also made there.

The conclusions of the work are presented in Chapter 6. We also present some recent trends in MDP modeling and make propositions for future work. For reference, we also present overviews of Genetic Algorithms in Appendix A and relevant statistical distributions in Appendix B.
2 Time Dependence in Decision Processes

Consider the following problem: an accident has occurred and three people are injured. There is only one doctor (agent), who can give medical care to a single person at a time, and the lives of the injured depend on that care. What does the doctor do? Consider also that the injury level can be different for each person and that there are uncertainties about survival after medical care. Given these parameters, it is almost obvious that the right decision on the attendance sequence could maximize the probability of saving lives. Decision theory is often claimed to be the right framework for producing the most rational choice (PARSONS; WOOLDRIDGE, 2002), and it can be the basic theory to solve this practical and common sequential decision problem.

In fact, sequential decision problems have been tackled very intensively in the last few years, and it is well known that the theoretical framework based on MDPs is the best way to model and solve them, giving optimal results in many cases (BOUTILIER; DEAN; HANKS, 1999). However, real-world problems have an additional and specific parameter, which is time dependency. MDP theory only considers fixed time steps between epochs, which can be easily understood and modeled as iteration steps. To avoid this limitation, Semi-MDPs (SMDPs) (SUTTON; PRECUP; SINGH, 2000) and, more recently, Time-dependent MDPs (TiMDPs) have been proposed.¹ In those models, the transition between states is not instantaneous, but instead takes a specific time t (durative action). In a TiMDP, time is observable, so the agent can wait for the best moment to make the decision (or execute the action in the current state). In an SMDP, the problem can be modeled over an infinite time horizon and there is a probability distribution over the duration of the durative action; that is, the agent cannot decide to wait for the best moment to execute the action. A TiMDP also has time-dependent likelihood functions that activate the action outcome for the current time, and it always models finite time horizon problems (the decisions are made between a starting and an ending clock mark). In the TiMDP model, the rewards related to the action outcomes can also be represented as time-dependent functions.

In the accident scenario, the person's lifetime, defined as a utility for decision problems (RUSSELL; NORVIG, 2009), decreases over time and can be formally understood as a time-dependent utility (HORVITZ; RUTLEDGE, 1991). This problem can be modeled as a TiMDP, in which time-dependent utilities can be directly represented by time-dependent rewards in the model. This is only one application that can be modeled as a TiMDP problem. Other instances, such as vehicle routing and scheduling problems with time window constraints (SOLOMON, 1987; ICHOUA; GENDREAU; POTVIN, 2003; JI, 2005), can also be modeled as TiMDPs. The following sections present an introduction to MDPs and TiMDPs as the technical basis for modeling the problem considered in this thesis, namely truck dispatching in an open-pit mine.
¹ We use the TiMDP representation introduced by Rachelson, Fabiani and Garcia (2009b) instead of the original one, TMDP, introduced by Boyan and Littman (2000), to avoid confusion with other representations such as tree-structured MDPs (LENGYEL; DAYAN, 2007).
2.1 Markov Decision Processes

2.1.1 MDP formulation
The Markov Decision Process (MDP) (PUTERMAN, 1994; BERTSEKAS, 1987; PELLEGRINI; WAINER, 2008) is a stochastic system modeling technique in which the transitions between states are probabilistic, the states are observable, and it is possible to interfere with the system dynamics through actions that produce state changes and rewards. A process is Markovian if it follows the Markov Property: the effect of an action depends only on the action itself and on the current state of the system. The decision aspect lies in the fact that the agent can periodically make decisions on the system, using actions. Formally, an MDP is a tuple $(S, A, T, R)$ as follows:
$S$ is the set of possible states of the system;
$A$ is a set of actions that can be executed in different decision epochs;
$T : S \times A \times S \to [0, 1]$ is a probability function for the system changing to state $s' \in S$, from state $s \in S$ and agent action $a \in A$, denoted by $T(s'|s, a)$; and
$R : S \times A \to \mathbb{R}$ is the reward for taking the decision $a \in A$ when the system is in state $s \in S$.
Considering that the system is at some state $s$ in a given decision epoch $k$, it is necessary to select which action $a$ must be executed. The action is selected following a decision rule, and the mapping of states to actions following the decision rules is the policy ($\pi$). Given a policy, we can calculate the expected utility (or the expected total reward) of the action sequence taken. The expected total reward, considering the immediate reward $r_k$ and a finite horizon $z$, is

$$E\left[\sum_{k=0}^{z-1} r_k\right]. \tag{2.1}$$

We can also define the discounted expected reward for a finite horizon $z$,

$$E\left[\sum_{k=0}^{z-1} \gamma^k r_k\right], \tag{2.2}$$

which uses a discount factor $\gamma \in\, ]0, 1[$ to ensure a bounded value for the expected total reward in the case of an infinite horizon:

$$E\left[\lim_{z \to \infty} \sum_{k=0}^{z-1} \gamma^k r_k\right]. \tag{2.3}$$
The importance of decisions taken in future epochs is governed by the discount factor $\gamma$; a value of zero gives no importance to future rewards (greedy behavior), whereas a value of one applies no discount to the cumulative expected reward. A policy is optimal ($\pi^*$) when the expected total reward for any state is maximized. The value function $V^*(s)$ gives the optimal expected total reward value for the optimal policy $\pi^*$:

$$V^*(s) = \max_{a \in A} \left[ R(s, a) + \gamma \sum_{s' \in S} T(s'|s, a)\, V^*(s') \right]. \tag{2.4}$$
The action-value function $Q^\pi(s, a)$, for a given policy $\pi$, gives the value of action $a$ in state $s$, considering the immediate reward from the execution of $a$ in $s$ and the expected total reward thereafter:

$$Q^\pi(s, a) = R(s, a) + \gamma \sum_{s' \in S} T(s'|s, a)\, V^\pi(s'). \tag{2.5}$$

For an optimal policy $\pi^*$, we can define $Q^*(s, a)$:

$$Q^*(s, a) = R(s, a) + \gamma \sum_{s' \in S} T(s'|s, a)\, V^*(s'). \tag{2.6}$$
The optimal policy produces the optimal actions that return the maximum $Q$ values for each state $s$:

$$\pi^*(s) = \arg\max_{a \in A} Q^*(s, a). \tag{2.7}$$

Notice that $V^*(s)$ can also be represented based on the maximum $Q$ value in the state $s$:

$$V^*(s) = \max_{a \in A} Q^*(s, a). \tag{2.8}$$
2.1.2 MDP solution
The solution of an MDP is an optimal policy $\pi^*$ that produces the value function $V^*(s)$ for all states. A successive approximation algorithm to solve an MDP, called Value Iteration (ALG. 1), was presented by Bellman (1966). The stopping criterion of ALG. 1 for an error $\epsilon$ is defined by the so-called Bellman Error:

$$\forall s \in S, \quad |V_i(s) - V_{i-1}(s)| \le \frac{\epsilon (1 - \gamma)}{2 \gamma}. \tag{2.9}$$

Algorithm 1: Value iteration
Input: MDP($S$, $A$, $T$, $R$)
Output: $V^*$
foreach $s \in S$ do
    $V_0(s) \leftarrow \max_{a \in A} R(s, a)$;
end
$i \leftarrow 1$;
while stop criterion not satisfied do
    foreach $s \in S$ do
        $V_i(s) \leftarrow \max_{a \in A} \left[ R(s, a) + \gamma \sum_{s' \in S} T(s'|s, a)\, V_{i-1}(s') \right]$;
    end
    $i \leftarrow i + 1$;
end
return $V$;

The policy iteration algorithm, which is more efficient than value iteration (it converges in fewer iterations), was proposed by Howard (1960). This algorithm (ALG. 2) alternates between a value determination step (current policy execution) and a policy improvement step (current policy improvement).

Algorithm 2: Policy iteration
Input: MDP($S$, $A$, $T$, $R$)
Output: $\pi^*$
Initialize $\pi'$ randomly
repeat
    $\pi \leftarrow \pi'$;
    $\forall s \in S$, $V^\pi(s) = R(s, \pi(s)) + \gamma \sum_{s' \in S} T(s, \pi(s), s')\, V^\pi(s')$;
    foreach $s \in S$ do
        $\forall a \in A$, $Q^\pi(s, a) \leftarrow R(s, a) + \gamma \sum_{s' \in S} T(s, a, s')\, V^\pi(s')$;
    end
    foreach $s \in S$ do
        $\pi'(s) \leftarrow \arg\max_{a \in A} Q^\pi(s, a)$;
    end
until $\pi = \pi'$;
return $\pi$;
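As a rough illustration of the alternation between value determination and policy improvement in ALG. 2, the following Python sketch applies policy iteration to the three-state example of Section 2.1.3. The dictionary encoding of Tables 2.1 and 2.2, the function names, and the use of repeated sweeps for the value determination step (instead of solving the linear system directly) are assumptions made for this sketch, not part of the original algorithm description.

```python
# Hypothetical encoding of the example of Section 2.1.3 (Tables 2.1 and 2.2).
# T[s][a] lists (next_state, probability); R[s][a] is the immediate reward.
T = {
    "s1": {"a1": [("s2", 1.0)]},
    "s2": {"a2": [("s1", 0.9), ("s2", 0.1)], "a3": [("s3", 1.0)]},
    "s3": {"a4": [("s1", 0.4), ("s2", 0.6)]},
}
R = {"s1": {"a1": 10.0}, "s2": {"a2": 5.0, "a3": 7.0}, "s3": {"a4": 4.0}}


def q_value(s, a, V, gamma):
    """Immediate reward plus discounted expected value of the successor states."""
    return R[s][a] + gamma * sum(p * V[s2] for s2, p in T[s][a])


def evaluate(policy, gamma, sweeps=2000):
    """Value determination by repeated sweeps (an assumption of this sketch)."""
    V = {s: 0.0 for s in T}
    for _ in range(sweeps):
        V = {s: q_value(s, policy[s], V, gamma) for s in T}
    return V


def policy_iteration(gamma):
    """Alternate value determination and policy improvement until the policy is stable."""
    policy = {s: next(iter(T[s])) for s in T}  # arbitrary initial policy
    while True:
        V = evaluate(policy, gamma)
        improved = {s: max(T[s], key=lambda a: q_value(s, a, V, gamma)) for s in T}
        if improved == policy:
            return policy, V
        policy = improved
```

Calling `policy_iteration(0.9)` and `policy_iteration(0.3)` returns the stable policy together with its value function for each discount factor, which can be compared against the value iteration results of the same example.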

FIGURE 2.1 Example of an MDP with 3 states (states s1, s2, s3; actions a1 to a4).

TABLE 2.1 Transition probabilities, given as (next state, probability)

      a1         a2                      a3         a4
s1    (s2, 1)    -                       -          -
s2    -          (s1, 0.9); (s2, 0.1)    (s3, 1)    -
s3    -          -                       -          (s1, 0.4); (s2, 0.6)
2.1.3 An MDP example
A 3-state MDP is presented in FIG. 2.1. Tables 2.1 and 2.2 present the transition probabilities and rewards, respectively, for this example. We solve this MDP example using Value Iteration (ALG. 1) and, to demonstrate the method, we present the first iterations for $\gamma = 0.9$:

$$V_0(s_1) = 10, \qquad V_0(s_2) = 7, \qquad V_0(s_3) = 4 \tag{2.10}$$

TABLE 2.2 Rewards

      a1    a2    a3    a4
s1    10    -     -     -
s2    -     5     7     -
s3    -     -     -     4

$$\begin{aligned}
Q_1(s_1, a_1) &= R(s_1, a_1) + 0.9\,[T(s_2|s_1, a_1) V_0(s_2)] = 16.3 \\
V_1(s_1) &= Q_1(s_1, a_1) = 16.3 \\
Q_1(s_2, a_2) &= R(s_2, a_2) + 0.9\,[T(s_1|s_2, a_2) V_0(s_1) + T(s_2|s_2, a_2) V_0(s_2)] = 13.7 \\
Q_1(s_2, a_3) &= R(s_2, a_3) + 0.9\,[T(s_3|s_2, a_3) V_0(s_3)] = 10.6 \\
V_1(s_2) &= \max[Q_1(s_2, a_2), Q_1(s_2, a_3)] = 13.7 \\
Q_1(s_3, a_4) &= R(s_3, a_4) + 0.9\,[T(s_1|s_3, a_4) V_0(s_1) + T(s_2|s_3, a_4) V_0(s_2)] = 11.4 \\
V_1(s_3) &= Q_1(s_3, a_4) = 11.4
\end{aligned} \tag{2.11}$$

$$\begin{aligned}
Q_2(s_1, a_1) &= R(s_1, a_1) + 0.9\,[T(s_2|s_1, a_1) V_1(s_2)] = 22.3 \\
V_2(s_1) &= Q_2(s_1, a_1) = 22.3 \\
Q_2(s_2, a_2) &= R(s_2, a_2) + 0.9\,[T(s_1|s_2, a_2) V_1(s_1) + T(s_2|s_2, a_2) V_1(s_2)] = 19.4 \\
Q_2(s_2, a_3) &= R(s_2, a_3) + 0.9\,[T(s_3|s_2, a_3) V_1(s_3)] = 17.3 \\
V_2(s_2) &= \max[Q_2(s_2, a_2), Q_2(s_2, a_3)] = 19.4 \\
Q_2(s_3, a_4) &= R(s_3, a_4) + 0.9\,[T(s_1|s_3, a_4) V_1(s_1) + T(s_2|s_3, a_4) V_1(s_2)] = 17.3 \\
V_2(s_3) &= Q_2(s_3, a_4) = 17.3
\end{aligned} \tag{2.12}$$
The convergence of the iterations ($\epsilon = 0.001$) is shown in FIGS. 2.2a and 2.2b for $\gamma = 0.9$ and $\gamma = 0.3$, respectively. The final results for $\epsilon = 0.001$ are presented in Tables 2.3 and 2.4 for $\gamma = 0.9$, and in Tables 2.5 and 2.6 for $\gamma = 0.3$. Note the difference between the policies for state 2 in the two solutions. In the case of $\gamma = 0.3$, the agent gives less importance to future states and tends to execute the action that returns the highest immediate reward. In the other example ($\gamma = 0.9$), the agent considers the future rewards given the transition probabilities.

FIGURE 2.2 Value Iteration for (a) $\gamma = 0.9$ and (b) $\gamma = 0.3$ (plots of $V(s)$ versus iteration for states s1, s2 and s3).

TABLE 2.3 MDP Solution ($\gamma = 0.9$)

State    V(s)    Policy (action)
1        75.1    a1
2        72.4    a2
3        70.1    a4

TABLE 2.4 Q(s, a) ($\gamma = 0.9$)

      a1      a2      a3      a4
s1    75.1    -       -       -
s2    -       72.4    70.1    -
s3    -       -       -       70.1

TABLE 2.5 MDP Solution ($\gamma = 0.3$)

State    V(s)    Policy (action)
1        12.7    a1
2        9.2     a3
3        7.2     a4

TABLE 2.6 Q(s, a) ($\gamma = 0.3$)

      a1      a2     a3     a4
s1    12.7    -      -      -
s2    -       8.7    9.2    -
s3    -       -      -      7.2
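The values in Tables 2.3 to 2.6 can be checked with a short Python sketch of Value Iteration (ALG. 1) using the stopping criterion of Eq. 2.9. The dictionary encoding of Tables 2.1 and 2.2 and the function name `value_iteration` are illustrative assumptions, not part of the original work.

```python
# Hypothetical encoding of Tables 2.1 and 2.2.
# T[s][a] lists (next_state, probability); R[s][a] is the immediate reward.
T = {
    "s1": {"a1": [("s2", 1.0)]},
    "s2": {"a2": [("s1", 0.9), ("s2", 0.1)], "a3": [("s3", 1.0)]},
    "s3": {"a4": [("s1", 0.4), ("s2", 0.6)]},
}
R = {"s1": {"a1": 10.0}, "s2": {"a2": 5.0, "a3": 7.0}, "s3": {"a4": 4.0}}


def value_iteration(T, R, gamma, epsilon=0.001):
    """Value Iteration stopped by the Bellman-error threshold epsilon(1-gamma)/(2 gamma)."""
    V = {s: max(R[s].values()) for s in T}  # V_0(s) = max_a R(s, a)
    threshold = epsilon * (1.0 - gamma) / (2.0 * gamma)
    while True:
        # Q_i(s, a) computed from the previous value function V_{i-1}
        Q = {
            s: {a: R[s][a] + gamma * sum(p * V[s2] for s2, p in T[s][a])
                for a in T[s]}
            for s in T
        }
        V_new = {s: max(Q[s].values()) for s in T}
        if max(abs(V_new[s] - V[s]) for s in T) <= threshold:
            policy = {s: max(Q[s], key=Q[s].get) for s in T}
            return V_new, Q, policy
        V = V_new


V09, Q09, policy09 = value_iteration(T, R, gamma=0.9)
V03, Q03, policy03 = value_iteration(T, R, gamma=0.3)
```

With this encoding, the only state in which the two discount factors can lead to different choices is s2, since it is the only state with more than one available action.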
2.2 Time-dependent Markov Decision Processes
Time-dependent MDPs (TiMDPs) were first proposed by Boyan and Littman (2000) to model and solve sequential decision problems with the following attributes:

Stochastic state transitions; and
Stochastic time-dependent action durations.
Formally, a TiMDP consists of the following components:
$S$: discrete state space
$A$: discrete action space
$M$: discrete set of outcomes, each of the form $\mu = \langle s'_\mu, T_\mu, P_\mu \rangle$:
    $s'_\mu \in S$: the resulting state
    $T_\mu \in \{\mathrm{ABS}, \mathrm{REL}\}$: specifies the type of the resulting time distribution (absolute or relative)
    $P_\mu(t')$ (if $T_\mu = \mathrm{ABS}$): pdf over absolute arrival times of $\mu$
    $P_\mu(\delta)$ (if $T_\mu = \mathrm{REL}$): pdf over durations of $\mu$
$L$: $L(\mu|s, t, a)$ is the likelihood of outcome $\mu$ given state $s$, time $t$ and action $a$
$R$: $R(\mu, t, \delta)$ is the reward for the outcome $\mu$ at time $t$ with duration $\delta$
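The TiMDP model is represented by the following Bellman equations:²

$$\begin{aligned}
V(s, t) &= \max_{a \in A} Q(s, t, a), \\
Q(s, t, a) &= \sum_{\mu \in M} L(\mu|s, a, t)\, U(\mu, t), \\
U(\mu, t) &= \begin{cases}
\int P_\mu(t')\,[R(\mu, t, t' - t) + V(s'_\mu, t')]\,dt' & (\text{if } T_\mu = \mathrm{ABS}) \\
\int P_\mu(t' - t)\,[R(\mu, t, t' - t) + V(s'_\mu, t')]\,dt' & (\text{if } T_\mu = \mathrm{REL})
\end{cases}
\end{aligned} \tag{2.13}$$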
where $U(\mu, t)$ is the utility of outcome $\mu$ at time $t$, $V(s, t)$ is the time-value function for the immediate action, and $Q(s, t, a)$ is the expected Q time-value over outcomes. Note that the calculations of $U(\mu, t)$ are convolutions of the result-time pdf $P_\mu$ with the lookahead value $R + V$. The likelihood function $L$ represents the probability of an outcome occurring for action $a$ at time $t$, and can be used to model problems with time windows (BRESINA et al., 2002).

This model is used to solve time-dependent problems with a finite time horizon and represents an undiscounted continuous-time MDP.

² Equation 2.13 differs from the original one defined in Boyan and Littman (2000) in not having dawdling; that is, the agent does not receive a reward for waiting in a state. Several works, such as Li and Littman (2005) and Marecki, Topol and Tambe (2006), use the same formulation proposed herein.
2.2.1 Discrete solution for relative time distributions by backwards convolution
In the general TiMDP model (BOYAN; LITTMAN, 2000), the time-value functions for each state can be arbitrarily complex and therefore impossible to represent exactly. The TiMDP problem is solved by representing $R$ and $V$ as piecewise linear (PWL) functions, $L$ as a piecewise constant (PWC) function, and $P$ as a discretized function. This representation ensures closure under the convolutions and avoids an increased number of iterations. This solution is fast and exact (for the approximated functions), but it has the following drawbacks: loss of information caused by the initial approximations, insertion of new breakpoints in the piecewise functions over the iterations, and the need for an analytic solution of the convolution integral.

Li and Littman (2005) explored the practical solution of value iteration considering that $P$ is now a PWC function. In this case, the degree of the convolved functions would grow during the iterations, making a solution in reasonable time impossible. To prevent this behavior, Li and Littman (2005) introduced the Lazy Approximation Algorithm, in which the PWL function resulting from the convolution is approximated by a PWC function at each iteration. Hence, the imprecision and state space augmentation introduced by the discretization of $P$ are avoided in this solution method.

In a recent work by Rachelson, Fabiani and Garcia (2009a), the related functions of the TiMDP model are represented by piecewise polynomial (PWP) functions. In order to limit the growth in degree of the iteration results, the introduced algorithm executes, when needed, a degree-reduction step, reducing the degree of the results in the current iteration by PWP interpolation.

In order to simplify the solution algorithm and focus on the proposed dispatching problem, we propose the discretization of all functions involved in the model and the solution of the convolutions by a discrete numerical method. This approximation does not provide a solution as fast as the original one, but it is an easier and more direct way to solve problems with few states. The only problem here is that the convolution present in the TiMDP model is not solved as a conventional convolution integral. A conventional convolution integral can be represented by:
$$h(t) = \int g(t')\, k(t - t')\, dt'. \tag{2.14}$$

The discrete formulation of a convolution is

$$h(j) = k(j) \ast g(j) = \sum_i g(i)\, k(j - i). \tag{2.15}$$

This convolution involves a delay represented by the $k$ function over the $g$ function. However, in the TiMDP there is a negative delay, and the convolution integral is now

$$h(t) = \int g(t')\, k(t' - t)\, dt'. \tag{2.16}$$

We characterize it as a backwards convolution, and its discrete solution is

$$h(j) = k(j) \star g(j) = \sum_i g(i)\, k(j + i). \tag{2.17}$$
So, using our solution method, the time-value function $V$ for relative $P$ is

$$V(s, t) = \max_{a \in A} \sum_{\mu \in M} L(\mu|s, a, t)\; P_\mu(t) \star [R(\mu, t) + V(s'_\mu, t)]. \tag{2.18}$$
For discretized problems with absolute time distributions, the integral of Eq. 2.13 can be solved by numerical methods such as the Newton-Cotes Rule (THISTED, 1988).
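To make the discretized relative-time case of Eq. 2.13 more concrete, the following Python sketch computes the outcome utility as a sum over possible durations, $U(\mu, t) = \sum_d P_\mu(d)\,[R(\mu, t) + V(s'_\mu, t + d)]$. The function name `outcome_utility`, the list-based representation of the discretized functions, and the choice that arrivals beyond the last time step contribute nothing are assumptions of this sketch rather than details given in the text.

```python
def outcome_utility(duration_pdf, reward, next_value):
    """Discrete utility of one outcome under a relative duration pdf.

    duration_pdf[d] is the probability that the action lasts d time steps,
    reward[t] is the reward for starting the action at step t, and
    next_value[t] is the time-value function of the resulting state at step t.
    Arrivals after the last time step are ignored (assumption of this sketch).
    """
    horizon = len(next_value)
    utility = []
    for t in range(horizon):
        total = 0.0
        for d, p in enumerate(duration_pdf):
            if t + d < horizon:  # arrival time t + d must lie inside the horizon
                total += p * (reward[t] + next_value[t + d])
        utility.append(total)
    return utility


# Deterministic duration of exactly one time step and a constant reward of 1.
example = outcome_utility([0.0, 1.0], [1.0, 1.0, 1.0, 1.0], [3.0, 2.0, 1.0, 0.0])
# example == [3.0, 2.0, 1.0, 0.0]
```

The value of a state at each time step would then be obtained by weighting such utilities with the likelihood $L$ and maximizing over the available actions, as in Eq. 2.18.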
2.2.2 A TiMDP example
The example presented in FIG. 2.3 is a good starting point to understand value iteration in TiMDPs. The problem is composed of two states, one action per state, constant rewards ($R$) over time $t$, and a unit probability function ($L$) over the whole time horizon. In this case, at State 1 the agent will receive reward $R_1 = 1$ after one time period (the action is durative and takes exactly one time period), going to State 2. In State 2, the agent will receive a reward $R_2 = 2$ after two time periods. The rewards can be accumulated until the end of the time horizon.

The system starts with the time-value function $V$ equal to zero for both states. Then, the problem is solved by value iteration using the Bellman equations (Eq. 2.18) with our approximation presented in Section 2.2.1. The value iteration process converges at the sixth iteration, and the solution of $V$ gives important information for agent decision making. For example, when the agent is at State 2 at time 2, it knows that it can receive an accumulated reward of 6 units by following the policy. In this case, it can wait until time 3 and receive the same accumulated reward. So, for TiMDPs, policies depend on both the state and the current time.

FIGURE 2.3 TiMDP example solved step-by-step by value iteration (panels: time-value functions $V_1$ and $V_2$ over $t = 0, \dots, 10$ for Iterations 1 to 6).
2.3 Time-dependent utilities
An agent needs a measurement value to select the best option (or to make a decision) among several. This measurement is the value of the utility function (LI; SOH, 2004). In decision-theoretic planning, this value is also called the value function (accumulated rewards in sequential decision making) (BOUTILIER; DEAN; HANKS, 1999). The expected utility (EU) can be calculated for problems with nondeterministic actions (RUSSELL; NORVIG, 2009):

$$EU(A|E) = \sum_i P(\mathrm{Result}_i \mid E, \mathrm{Do}(A))\; U(\mathrm{Result}_i(A)), \tag{2.19}$$

where $\mathrm{Result}_i(A)$ are the possible outcome states for a nondeterministic action $A$, $E$ summarizes the agent's available evidence about the world, and $\mathrm{Do}(A)$ is a proposition stating that action $A$ is executed in the current state. This common utility representation may not be suitable for complex real problems, in which the actions to be executed are durative and have priorities. Often, it is necessary to solve more urgent tasks first and to leave others waiting (BASTOS; RIBEIRO; SOUZA, 2008). To address this issue, time-dependent utility theory (HORVITZ; RUTLEDGE, 1991) can be used. In this theory, the utility is a function of time, greater than zero, and can be increasing or decreasing.
2.3.1 Decreasing time-dependent utility function
Decreasing functions can be used to represent a task lifespan and give some idea of priorities to the decision maker. For example, suppose there are two injured people who must receive medical care from the only doctor present in a scenario. They have different injury levels and will die if they do not receive medical care as soon as possible. So, the doctor needs to make the right decision in the attempt to save both lives, choosing which person to attend first. For this simple example, the decision could be made easily if the doctor had a time-dependent utility function representing the importance of a person's life (that is, the death risk) at the current time. This function must map important information such as age, rate of decline, injury level and so forth to a utility value (this mapping is not the focus of this work, and it is assumed to be known by the decision maker). Therefore, the right decision for the doctor is the one that executes the right attendance sequence, considering durative actions, without the utility function reaching a zero value (death). The decreasing time-dependent utility function can be represented by any decreasing function, but for functionality and simplicity we use exponential or linear functions for its representation:

$$U(A, t) = U(A, t_o)\, e^{-k_1 t}, \qquad U(A, t) = U(A, t_o) - k_2 t, \qquad U(A, t) \ge 0, \tag{2.20}$$

where $U(A, t)$ is the utility for choosing action $A$ at time $t$, $t_o$ is the initial time, and $k_1$ and $k_2$ are parameters for adjusting the exponential and linear functions, respectively, to the problem requirements.
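The two decreasing forms of Eq. 2.20 can be sketched in Python as follows. The function names and the numeric values are purely illustrative assumptions; the clipping at zero in the linear case reflects the constraint $U(A, t) \ge 0$.

```python
import math


def exponential_decreasing_utility(u0, k1, t):
    """U(A, t) = U(A, t_o) * exp(-k1 * t); stays positive for all t."""
    return u0 * math.exp(-k1 * t)


def linear_decreasing_utility(u0, k2, t):
    """U(A, t) = U(A, t_o) - k2 * t, clipped so that U(A, t) >= 0."""
    return max(0.0, u0 - k2 * t)


# Hypothetical patient whose utility starts at 10 and drops by 2 per time unit.
after_three = linear_decreasing_utility(10.0, 2.0, 3.0)   # 4.0
after_eight = linear_decreasing_utility(10.0, 2.0, 8.0)   # 0.0 (lifespan exhausted)
```

In the doctor scenario, one such function per patient would let the decision maker compare how much utility remains for each attendance sequence once the durations of the preceding actions are taken into account.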
2.3.2 Increasing time-dependent utility function
Increasing functions can be used, for instance, to represent profits over time. For example, sometimes it is interesting to choose the task execution sequence based on greater rewards, as is the case for the vehicle refueling problem, in which the utility of the refueling state increases over time. Thus, as the fuel level decreases, the utility of refueling increases, and after a certain time, and depending on the current position of the vehicle (distance from the fueling station), the refueling decision will be taken. Unlike the decreasing utility function, which has a minimum value (zero in most cases), in this case it is reasonable to assume a maximum value. For vehicle refueling in particular, it is important to agree upon a maximum utility value that will refer to an empty tank. The utility model is

$$U(A, t) = U_{max}\,(1 - e^{-k_3 t}), \qquad U(A, t) = U(A, t_o) + k_4 t, \qquad U(A, t) \ge 0, \tag{2.21}$$

where $U(A, t)$ is the utility for choosing action $A$ at time $t$, $t_o$ is the initial time, $U_{max}$ is the maximum utility, and $k_3$ and $k_4$ are constant parameters for adjusting the exponential and linear functions, respectively, to the problem requirements.
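A minimal Python sketch of the increasing forms of Eq. 2.21 is given below. The function names and numbers are hypothetical. Capping the linear form at $U_{max}$ is an assumption of this sketch, motivated by the remark that a maximum utility value is reasonable; it is not stated explicitly in the equation.

```python
import math


def exponential_increasing_utility(u_max, k3, t):
    """U(A, t) = U_max * (1 - exp(-k3 * t)); approaches U_max as t grows."""
    return u_max * (1.0 - math.exp(-k3 * t))


def linear_increasing_utility(u0, k4, t, u_max):
    """U(A, t) = U(A, t_o) + k4 * t, capped at u_max (cap is an assumption)."""
    return min(u_max, u0 + k4 * t)


# Hypothetical refueling utility: starts at 1 and grows by 0.5 per time unit.
early = linear_increasing_utility(1.0, 0.5, 4.0, u_max=10.0)    # 3.0
late = linear_increasing_utility(1.0, 0.5, 40.0, u_max=10.0)    # 10.0 (capped)
```

For the refueling example, `u_max` would play the role of the utility associated with an empty tank.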
2.3.3 A time-dependent utility example
In this section, we present a more complex sequential decision making example using time-dependent utilities (or rewards varying over time), modeled and solved as a TiMDP. The example is presented in FIG. 2.4. It has three states, two selectable actions per state, a finite horizon with a limit of 100 time periods, a unit likelihood function over the whole time horizon, and deterministic action durations. The problem was solved by value iteration using the Bellman equations with our approximations (Eq. 2.18). The results for the time-value functions $V$ and $Q$ are presented in FIGS. 2.5, 2.6 and 2.7. In the plots, we have the time-value function $V(State, t)$, which is the maximum of the $Q(State, Action, t)$ time functions. This solution follows the same idea presented in Section 2.2.2, with the difference that the agent cannot wait in the state for the best decision making time. The solution is hard to analyze (even for just three states and six actions), and it shows the need for and importance of TiMDP models for solving large time-dependent problems.

FIGURE 2.4 Sequential decision making problem using time-dependent utility.

FIGURE 2.5 Value function - V (1, t).

FIG. 2.8 shows the policies as a function of time. Such policies define the actions that the agent must choose based on the maximum $Q$ value for a state at time $t$. For example, if the agent is at State 3 and the current time is 77, it must choose Action 6, therefore moving to State 2.

FIGURE 2.8 Policies over time.

In the TiMDP model, actions can be durative and uncertain (represented by pdfs). We used a Normal distribution to represent $P_1$ in our example, with mean 10 and variance 3. Normal distributions are very convenient for this kind of problem, in which the action is durative and its duration differs between executions. For real situations with a reliable database of past action durations, a Normal distribution is a good approximation for the action duration pdf, because the durations tend to cluster around a single mean value with the proper variance. The solution for State 1 is shown in FIG. 2.9. Comparing this result with the original problem (FIG. 2.4), it is clear that the function $Q(1, 1, t)$ becomes smoother. There is also a change in shape for the function $Q(1, 2, t)$. In fact, these changes in the functions may change the overall policies, due to the uncertainty in action durations that is related to the inherent variances. The policies are shown in FIG. 2.10. Comparing with FIG. 2.8, we note a difference between the policies, which is caused by the uncertainty added to the duration of Action 1. For example, the policy in State 1 at time 25 is now Action 2, against Action 1 in the original problem. The uncertainty added to the action duration that belongs to State 1 has also caused a change in the policy for State 3. Therefore, it is very important to model the pdfs correctly in order to avoid wrong decisions.

FIGURE 2.9 Value function V (1, t), P1 = N (10, 3).

FIGURE 2.10 Policies over time - P1 = N (10, 3).
3 Truck Dispatching in Open Pit Mines

Truck dispatching in open-pit mining consists of material (mineral matter) transportation during a shift by haul trucks from pickup stations (shovels) to delivery stations or dump points (crushers, waste dumps or stock piles). The mineral matter is composed of (KOLONJA; KALASKY; MUTMANSKY, 1993) ore (the most valuable mineral product), leach (of marginal, but positive value), and waste (of no value). A mine often operates different models of trucks and shovels (a heterogeneous fleet), which work at specific and different truck speeds, truck capacities and shovel digging rates. At the truck drivers' request, the dispatcher (or fleet manager) must decide in real time which shovel the truck must travel to (truck assignment), based on the current mine state and on a decision support system or on the dispatcher's own experience. These decisions are of crucial importance in the mining operation, given that material transportation is one of the most important aspects of open-pit mine operations, representing up to 60% of operating costs (ALARIE; GAMACHE, 2002). Due to its significance, several decision systems for this problem have been developed in the last few years, improving productivity and reducing operational costs.
In the following sections, similarities between truck dispatching and other vehicle dispatching systems are presented, and truck dispatching in open-pit mining is addressed in detail.
3.1 Vehicle dispatching problems
The truck dispatching problem does not occur only in mining; it can be found in any area that includes management of a vehicle fleet. Some examples of vehicle dispatching problems are:

Dynamic vehicle assignment problem (POWELL, 1988)
This is a common problem in the shipping industry. Given a request, the fleet manager must decide which truck will be sent to the ship for loading and subsequent delivery. After the delivery, if there are no more loads, the truck must be repositioned according to future loading demands.

Dial-a-ride (GENDREAU; POTVIN, 1998)
This is a generalization of the dynamic vehicle assignment problem. During a day, a vehicle must pick up and deliver material (or people) at different locations. This problem can have some capacity restrictions and soft time-window constraints. The objective is to perform all transportation at minimum cost.

Automated Guided Vehicles (AGVs) in the manufacturing industry (CO; TANCHOCO, 1990)
AGVs, or mobile robots, transport material on the shop floor (raw material or finished product) of an automated plant. The transportation occurs between nearby locations, and there are predefined robot waiting places to avoid queues in the processes.

Alarie and Gamache (2002) report that truck dispatching in open-pit mining seems to be a simplification of the other vehicle dispatching problems; however, it presents some characteristics that are not commonly reported in the literature:

Mines are closed systems, that is, the pickup and delivery points remain the same and stay at the same position during a long period of time (generally, a shift of 8 to 12 hours);
The traveling distances are short compared to the length of the shift (10 to 25 min);
The frequency of demands at each pickup point is high (every 3 to 5 minutes); and
If the size of the fleet is too large, truck queues may appear.

Additionally, we cite the highly combinatorial nature of the problem, due to the several trucks typically working in a mine (the dispatching system must consider the position of all trucks when assigning them to the shovels, which is exemplified by values in the next chapter considering our example mine model). In the simulated mines presented by Jaoua, Gamache and Riopel (2009), there are 15 trucks in a medium-scale mine (3 shovels and 2 dump points), and 60 trucks in a large-scale mine (10 shovels and 3 dump points). In Computer Science terms, the truck is an agent and this problem is modeled as a multi-agent system.

The number of trucks (fleet size) working in a mine is defined in a previous decision epoch by a specific optimization technique, which is not the focus of this work. Situations with more trucks than the optimal quantity (over-trucked) will increase the length of queues at shovels, while fewer trucks (under-trucked) cause shovel underutilization. So, the results of our algorithm are strongly influenced by the quantity of trucks operating in a shift, which must be close enough to the optimal quantity.
3.2 Truck dispatching problem
Solving a truck dispatching problem in open-pit mining can mean maximizing tonnage production (productivity policy), minimizing equipment inactivity (truck waiting time and shovel idle time), or meeting the Run of Mine (ROM) requirement (quality policy). In a mine, the ROM is the quality level of the ore, which can be a combination (balanced mean) of many mining fronts. Pinto (2007) developed a Fuzzy Algorithm to find a balanced result using both production and quality policies simultaneously. To obtain the best results, the problem is divided into two main stages (KRAUSE; MUSINGWINI, 2007): (1) truck resource allocation or fleet size estimation, and (2) real-time truck dispatching. The fleet size estimation, which is not the focus of this thesis, is a very important issue in the truck dispatching problem; over-trucked situations will increase the length of queues at shovels, whereas under-trucking causes shovel underutilization (ALARIE; GAMACHE, 2002). The costs in an over-trucked mine are increased because higher truck utilization causes more maintenance stops and higher fuel consumption, whereas the production objectives will not be attained in an under-trucked mine. Due to its importance, this issue is tackled by many recent works in the mining literature.
Brahma (2007) used Queueing Theory (GROSS, 2008) and Petri Nets (MURATA, 2002) to find the optimal number of trucks in the context of a shovel-dumper (haul truck) combination system; Krause and Musingwini (2007) used a modified Machine Repair Model for estimating the truck fleet size; Ta et al. (2005) used a chance-constrained stochastic optimization approach for heterogeneous truck fleet resource allocation, accommodating uncertain parameters such as truck load and cycle time; Huang et al. (2010) used a Genetic Algorithm to optimize the number of trucks in an open-pit mine, minimizing the cost of truck transportation and maintenance; and Souza et al. (2010) developed a hybrid metaheuristic algorithm (Greedy Randomized Adaptive Search Procedure and General Variable Neighborhood Search) to minimize the number of mining trucks used to meet production goals and quality requirements. The real-time truck dispatching stage can be modeled by three strategies (ALARIE; GAMACHE, 2002): (1) 1-truck-for-n-shovels, (2) m-trucks-for-1-shovel, and (3) m-trucks-for-n-shovels.
3.2.1 The 1-truck-for-n-shovels strategy
This is the most used strategy in the mining industry. Trucks are assigned one by one to shovels (FIG. 3.1). The eet manager assigns the truck to the shovel that is most suitable to the current dispatching criterion, following a heuristic method (ALARIE; GAMACHE, 2002), or rule (TA et al., 2005). Heuristics are procedures which are not mathematically proven but which are based upon practical or logical operating procedures (RUSSELL; NORVIG, 2009). The most used heuristic methods used in truck dispatching are (KOLONJA;
56
FIGURE 3.1 1-truck-for-n-shovels strategy. KALASKY; MUTMANSKY, 1993; CETIN, 2004):
Minimizing Shovel Waiting Time (MSWT): an empty truck at the dispatching point is assigned to the shovel that has been idle the longest, or to the shovel that is expected to become idle first. The objective of this criterion is to maximize the utilization of both trucks and shovels.

Minimizing Truck Cycle Time (MTCT): the goal of this strategy is to assign an empty truck to the shovel that allows the shortest truck cycle time, maximizing the total tonnage productivity. The objective of this criterion is to maximize the number of truck cycles during the shift.

Minimizing Truck Waiting Time (MTWT): in this criterion, an empty truck at the dispatching point is assigned to the shovel at which the loading operation starts first. The objective of this criterion is to maximize the utilization of a shovel by minimizing its waiting time.

Minimizing Shovel Saturation or Coverage (MSC): empty trucks are assigned to the shovel at equal time intervals, so that the shovel keeps operating without waiting for trucks.
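The criteria above differ only in the quantity that is minimized when a truck asks for dispatching. The following Python fragment is a minimal sketch of three of them, not an implementation taken from the cited works; the record type `ShovelEstimate`, its fields, and all numeric values are hypothetical and assume that the dispatcher already holds these estimates for the requesting truck.

```python
from dataclasses import dataclass

@dataclass
class ShovelEstimate:
    # All fields are hypothetical estimates available to the dispatcher.
    name: str
    idle_at: float      # minute at which the shovel is expected to become idle
    load_start: float   # minute at which loading of the requesting truck could start
    cycle_time: float   # expected full cycle time of the requesting truck (minutes)

def mswt(shovels):
    """MSWT: send the truck to the shovel expected to be idle first."""
    return min(shovels, key=lambda s: s.idle_at).name

def mtct(shovels):
    """MTCT: send the truck to the shovel giving the shortest cycle time."""
    return min(shovels, key=lambda s: s.cycle_time).name

def mtwt(shovels):
    """MTWT: send the truck to the shovel where its loading starts first."""
    return min(shovels, key=lambda s: s.load_start).name

# Illustrative values only.
shovels = [
    ShovelEstimate("S1", idle_at=12.0, load_start=14.0, cycle_time=25.0),
    ShovelEstimate("S2", idle_at=8.0, load_start=15.0, cycle_time=30.0),
    ShovelEstimate("S3", idle_at=10.0, load_start=11.0, cycle_time=28.0),
]
```

With these illustrative numbers the three rules choose three different shovels, which shows that the choice of criterion alone can change the assignment.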
This strategy is myopic (or greedy) because the system is not completely observed when a truck is being dispatched. For example, in a mine with two shovels and two trucks, the first truck positioned at the dispatching point is assigned to shovel number one because of its higher production, and the second one must then be assigned to shovel number two (this example system does not allow queues in the mine). In this situation, the total production resulting from this production policy will not be the maximum one. Thus, the global result (the sum of individual truck productions) is affected by the greedy behavior of this strategy. Nevertheless, Lizotte and Bonates (1987) and Tu and Hucka (1985) used this strategy in their works.
3.2.2 The m-trucks-for-1-shovel strategy
In this strategy (FIG. 3.2), the shovels are first sorted following a priority scheme (e.g., by how far they are behind schedule on their production), and then each one selects, from a list of m trucks, the one that best serves it (e.g., the nearest truck with the highest load capacity). Alarie and Gamache (2002) report that there is only one implemented system that uses this strategy, namely the DISPATCH(TM) commercial package for truck dispatching, which is developed by Modular Mining Systems. As DISPATCH(TM) is a commercial package, no substantial information about its algorithms and heuristic methods is found in the scientific literature.
FIGURE 3.2 m-trucks-for-1-shovel strategy.
3.2.3 The m-trucks-for-n-shovels strategy
This strategy (FIG. 3.3) considers simultaneously the m trucks available for dispatching and the n shovels present in the mine. This is a combinatorial problem that can be modeled as an assignment problem or as a transport problem. Elbrond and Soumis (1987) solve truck dispatching as an assignment problem. Here, the assignment optimization considers the truck that asks for dispatching and the next 10 to 15 trucks that will ask for dispatching in the near future (e.g., trucks on the paths, finishing dumping, or finishing loading). Only the assignment of the truck currently asking is applied; the other assignments are discarded. The system repeats the same steps at the next dispatching request. The solution covers only the trucks to be dispatched in the near future because of the combinatorial explosion of this problem, which is NP-hard (PAPADIMITRIOU; STEIGLITZ, 1998). In fact, a solution considering the whole shift would be extremely time consuming, and impracticable for a real-time system.
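To make the assignment formulation concrete, the following Python fragment is a minimal sketch that enumerates every one-to-one assignment of trucks to shovels and keeps the cheapest one. It is not the algorithm of Elbrond and Soumis (1987); the cost matrix is hypothetical, and the exhaustive enumeration is used only because it is short and easy to check (its cost grows factorially, so a polynomial-time method such as the Hungarian algorithm would be the usual choice in practice).

```python
from itertools import permutations

def best_assignment(cost):
    """Return (assignment, total_cost) for a square cost matrix.

    cost[i][j] is the (hypothetical) cost of sending truck i to shovel j;
    assignment[i] is the shovel index chosen for truck i.
    """
    n = len(cost)
    best_perm, best_cost = None, float("inf")
    for perm in permutations(range(n)):
        total = sum(cost[i][perm[i]] for i in range(n))
        if total < best_cost:
            best_perm, best_cost = perm, total
    return best_perm, best_cost

# Row 0 is the truck asking for dispatching now; rows 1 and 2 are trucks
# expected to ask in the near future. Values are illustrative only.
cost = [
    [4.0, 2.0, 8.0],
    [4.0, 3.0, 7.0],
    [3.0, 1.0, 5.0],
]
assignment, total = best_assignment(cost)
# Only the decision for the requesting truck is applied; the rest is discarded.
requesting_truck_shovel = assignment[0]
```

The last line mirrors the rolling scheme described above: the whole group is optimized, but only the answer for the requesting truck is used.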
FIGURE 3.3 m-trucks-for-n-shovels strategy.

The system proposed by Temeng, Otuonye and Frendewey (1997) is modeled and solved as a transport problem. In this problem, each supply center is associated with a truck that will be dispatched in the near future, and each receiver center is a shovel present in the mine. The receiver center demand is expressed as the number of trucks needed to reach the production goals. The cost of sending a truck to a shovel is given by the truck waiting time (truck queues at the shovels).

Another current trend in solving this kind of problem is the Evolutionary Algorithm (EA), which uses mechanisms inspired by biological evolution: reproduction, mutation, recombination, and selection. This is a near-optimal algorithm, that is, the global optimal solution is not guaranteed to be found and the algorithm often converges to local optimal solutions (EAs have specific search mechanisms to avoid premature convergence to the first local optimal solutions). A near-optimal solution is generally found much faster by EAs than by exact search methods (e.g., breadth-first search), and can be considered acceptable given the convergence criteria of the algorithm. Some related techniques are: Genetic Algorithm (GA) (MITCHELL, 1998), Particle Swarm Optimization (PSO) (SHI; EBERHART, 2002), and Simulated Annealing (SA) (KIRKPATRICK, 1984). Jaoua, Gamache and Riopel (2009) used SA as an optimization algorithm applied to truck dispatching in a simulation-based real-time control.
4 Truck Dispatching Modeling

Truck dispatching in open-pit mines is a problem in which decisions on truck assignments and destinations are taken in real time. As in many other real-world applications, the assessment and correct modeling of uncertainty is a crucial requirement, as the unpredictability originating from equipment faults, weather conditions and human mistakes can often result in truck queues or idle shovels. There are also uncertainties in the travel and loading times related to the problem; the travel time of a truck between the same specific loading and dumping points will not be the same over the whole shift, and can be represented by a probability density function (pdf). Therefore, this problem can be classified as a stochastic problem, in which the uncertainties must be part of the problem model and be considered in the problem solving process. However, uncertainty is not considered in most current dispatching systems, which may therefore provide solutions worse than the average optimal one.

Consider the following example: two identical trucks are parked in the same area, waiting to be assigned to two identical shovels. Considering that queues are not allowed, which shovel must each truck travel to? The answer is quite obvious because of truck homogeneity: each truck must travel to a different shovel (there will be no difference in total production). This simple example shows how easy the solution is in simple environments; even if the shovels were different, the solution would remain the same.
CHAPTER 4. TRUCK DISPATCHING MODELING
FIGURE 4.1 Abstract graph of a medium-scale mine.

However, in most real situations the mines operate with heterogeneous trucks and shovels, queues are allowed, and the dispatching requests do not occur simultaneously. Given the stochasticity, mining objectives, heterogeneous fleet and queueing characteristics present in a mine, what would be the best technique to solve this real-time dispatching problem? It is known that this hard problem is not fully addressed and solved by current systems. We present in the following sections a realistic example of a medium-scale mine, which is the testbed for some models that deal with the characteristics of the real problem. These models will be the basis for the simulations and analysis presented in the next chapter.
4.1 A model for a medium-scale mine example

4.1.1 Mine environment

In order to have a testbed for the simulations of the proposed truck dispatching algorithms, we present a modified medium-scale mine example (FIG. 4.1), which was first introduced by Jaoua, Gamache and Riopel (2009).
TABLE 4.1 Truck specifications.

Truck Type   Quantity   Empty Avg. Speed   Loaded Avg. Speed   Payload Capacity
1            10         50 km/h            40 km/h             200 t
2            3          48 km/h            37 km/h             300 t
3            2          40 km/h            35 km/h             400 t
The mine has three pickup stations (shovels S1, S2, S3), and one delivery/departure station (crusher C). It differs from the original one considered in Jaoua, Gamache and Riopel (2009) by the absence of one waste dump (another delivery station) and one departure station (truck parking area and starting point of truck dispatching). For the sake of simplicity and because of our main objective (i.e., to introduce a novel real-time stochastic truck dispatching system), we reduced the number of elements in the original mine. We expect that our proposed truck dispatching systems (section 4.2) can be used with few modifications in larger and more constrained mines.
4.1.2 Specifying trucks and shovels
In the same manner as in the original mine, we use 15 trucks for the material transportation from shovels to crusher. Unfortunately, nothing is reported in Jaoua, Gamache and Riopel (2009) about truck and shovel specifications. In order to overcome this deficiency of the previously introduced model, we propose heterogeneous types of trucks (Table 4.1) and shovels (Table 4.2) operating in the mine environment. With these specifications fixed, performance comparisons can be made against dispatching algorithms already developed or proposed in the future.
TABLE 4.2 Shovel specifications.

Shovel   Average Loading Rate
1        40 t/min
2        20 t/min
3        100 t/min
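For later reference, the fleet and shovel data of Tables 4.1 and 4.2 can be held in a small data structure. The Python fragment below is only a sketch under the assumption that the loading time of a truck equals its payload capacity divided by the average loading rate of the shovel; the dictionary names and the helper `load_time_min` are illustrative and do not come from the original simulator.

```python
# Values transcribed from Table 4.1 (speeds in km/h, payload in tonnes).
TRUCKS = {
    1: {"quantity": 10, "empty_kmh": 50.0, "loaded_kmh": 40.0, "payload_t": 200.0},
    2: {"quantity": 3, "empty_kmh": 48.0, "loaded_kmh": 37.0, "payload_t": 300.0},
    3: {"quantity": 2, "empty_kmh": 40.0, "loaded_kmh": 35.0, "payload_t": 400.0},
}

# Values transcribed from Table 4.2 (average loading rate in t/min).
SHOVEL_RATE_T_PER_MIN = {1: 40.0, 2: 20.0, 3: 100.0}

def load_time_min(truck_type, shovel):
    """Loading time in minutes, assuming payload / average loading rate."""
    return TRUCKS[truck_type]["payload_t"] / SHOVEL_RATE_T_PER_MIN[shovel]

# Fleet size and the tonnage moved if every truck completes one cycle.
fleet_size = sum(t["quantity"] for t in TRUCKS.values())
fleet_payload_t = sum(t["quantity"] * t["payload_t"] for t in TRUCKS.values())
```

Under this assumption a type 1 truck is loaded by shovel 1 in 5 minutes, whereas a type 3 truck needs 20 minutes at shovel 2.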
4.1.3 The truck cycle
Truck dispatching is executed following a Truck Cycle: an action sequence with its related timespans. Basically, the sequence is: (1) the truck receives a dispatching order at the departure station (the crusher in our model), (2) it then travels along a path to the assigned shovel, (3) loads the material, (4) returns to the crusher along a path (which can be different from the first one), (5) unloads the material, and (6) waits for another dispatching order. This sequence is repeated until the end of the shift. The truck cycle must be adapted to a state-based representation, which is the basis for the methods presented in section 4.2. In order to complete the representation of actions, timespans, and queues at the shovels, we represent shovels and crushers by sub-states (FIG. 4.2). The truck cycle in a state-based representation follows the sequence:
1. The truck starts its cycle at the Crusher (state C), is assigned to a Shovel (state S), and then executes the action move_shovel, which takes the timespan t_shovel (which depends on the distance from crusher to shovel and on the empty truck average speed);
2. At state S, the truck moves (action move_queue) to the FIFO (first in, first out) queue state, which takes the timespan t_queue (which depends on the size of the queue);
3. When the truck is the first one in the queue, it is loaded (action load_truck) by the Shovel (state S) in timespan t_load (which depends on the shovel loading rate and truck capacity);
4. Then, the truck must move to the Crusher (state C) (action move_crusher) in timespan t_crusher (which depends on the distance from shovel to crusher and on the loaded truck average speed);
5. Finishing the cycle, the truck unloads (action unload_truck) the material at the Crusher (state C) in timespan t_unload (based on truck capacity).

For the sake of simplicity, we consider that there is no queue at the crusher; the trucks unload the material collected from the shovels concurrently. Moreover, in our model the queues at the shovels are limited to 9 trucks (the dispatching system considers this size limitation in its assignments, and we consider that the truck driver strictly follows the shovel assignment). In order to make the presented system suitable for TiMDP modeling, times are related to the actions, not to the states; e.g., the time that the truck waits in the queue (t_queue), which is related to the action move_queue, depends on the current size of the queue. The estimated truck cycle time can also be delayed because trucks are not allowed to overtake each other. Thus, a truck behind a slower truck may be delayed, which changes the estimated travel time. This drawback is one of many issues that occur in a real-world mine, and indeed causes a decrease in the quality of dispatching heuristics.
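The five steps above can be summarized as a sum of timespans. The Python fragment below is a minimal deterministic sketch: the function name, the 5 km distances and the 1 minute unloading time are assumptions made for illustration only (the mine distances and the unloading times are not restated here), and overtaking delays are ignored.

```python
def truck_cycle_minutes(dist_out_km, dist_back_km, empty_kmh, loaded_kmh,
                        payload_t, load_rate_t_per_min,
                        t_queue_min=0.0, t_unload_min=0.0):
    """Deterministic cycle time: t_shovel + t_queue + t_load + t_crusher + t_unload."""
    t_shovel = dist_out_km / empty_kmh * 60.0     # empty travel, crusher to shovel
    t_load = payload_t / load_rate_t_per_min      # assumed payload / loading rate
    t_crusher = dist_back_km / loaded_kmh * 60.0  # loaded travel, shovel to crusher
    return t_shovel + t_queue_min + t_load + t_crusher + t_unload_min

# Type 1 truck (50/40 km/h, 200 t) at shovel 1 (40 t/min), empty queue.
# The distances and the unloading time are hypothetical.
example = truck_cycle_minutes(5.0, 5.0, 50.0, 40.0, 200.0, 40.0,
                              t_unload_min=1.0)
```

In this illustrative case the cycle is made of 6 minutes of empty travel, 5 minutes of loading, 7.5 minutes of loaded travel and 1 minute of unloading.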
4.1.4 Mine uncertainties
We introduce two kinds of uncertainty to the mine model, bringing its behavior closer to that of a real-world mine: (1) stochastic path selection, and (2) Gaussian-based truck traveling times.

FIGURE 4.2 Truck cycle time. (Shovel sub-states S, Q, S'; crusher sub-states C, C'; actions move_shovel, move_queue, load_truck, move_crusher, unload_truck, with timespans t_shovel, t_queue, t_load, t_crusher, t_unload.)
4.1.4.1 Stochastic path selection
Path selection is related to the outcomes (μ) of an action (shovel assignment). First, at the departure station (dispatching point), the truck driver receives from the dispatcher the information about the shovel to travel to. As we do not consider the routing problem, the truck driver must select the best path to the shovel based on personal experience and/or on the actual traffic and weather conditions. The same stochastic characteristic occurs in the return travel (from shovel to delivery/departure station). In order to be applied in the TiMDP model (section 4.2.3), the outcomes are classified depending on the shovel assignment. The truck driver can select among 3 paths for each travel; by default, we use μ1 for the shortest path, μ2 for the medium path, and μ3 for the longest path. FIG. 4.3 shows the outcome classification for each travel between crusher and shovels (forward and return travels). In order to represent the behavior of a real-world mine operation, we define that the selected path for the forward travel (empty truck) is not necessarily the same as for the return travel (full truck). Given the truck assignment, the probability of an outcome occurrence (which path the truck will follow) is based on a likelihood function over the whole shift (FIG. 4.4).
FIGURE 4.3 Path selection outcomes. (a) Crusher-Shovel 1-Crusher; (b) Crusher-Shovel 2-Crusher; (c) Crusher-Shovel 3-Crusher
Hence, path selection occurs based on a probability value that can vary over time, while the sum of the outcome probabilities always equals one. The likelihood may be obtained from historical data; herein, for the sake of simplicity, we used arbitrary values and likelihood functions that are valid for all truck types. As an illustrative example, in FIG. 4.4a, when the truck is assigned to Shovel 1, the probability of the driver taking path μ1 is 85%, μ2 is 10%, and μ3 is 5%, from time 0 to 300 minutes and also from time 360 minutes to the end of the shift. These probabilities only change between times 300 and 360 minutes, in which μ1 is zero, μ2 is 60%, and μ3 is 40%. This abrupt change in probability values occurs because of a programmed maintenance and the resulting blocking of the path between C and S1 in the aforementioned period. Therefore, we introduce a novel constraint in the modeling of truck dispatching in open-pit mining problems, namely time windows, used before in vehicle dispatching problems (SOLOMON, 1987). The introduction of this constraint brings the model closer to real-world mining, in which path blockages often occur.
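The piecewise likelihood described for Shovel 1 can be sketched in a few lines of Python. The probabilities below are the ones quoted in the text; the function names are illustrative, and treating the blocking period as the half-open interval [300, 360) minutes is an assumption, since the treatment of the interval end points is not stated.

```python
import random

def path_likelihood_s1(t_min):
    """Likelihoods of paths 1, 2 and 3 for a truck assigned to Shovel 1."""
    if 300 <= t_min < 360:           # assumed half-open blocking window
        return [0.0, 0.60, 0.40]     # shortest path blocked by maintenance
    return [0.85, 0.10, 0.05]

def sample_path(t_min, rng):
    """Draw the path (1, 2 or 3) taken by the driver at time t_min."""
    weights = path_likelihood_s1(t_min)
    return rng.choices([1, 2, 3], weights=weights, k=1)[0]

rng = random.Random(0)               # fixed seed only for reproducibility
inside_window = [sample_path(330.0, rng) for _ in range(100)]
outside_window = [sample_path(100.0, rng) for _ in range(100)]
```

Inside the window the shortest path has zero weight and is therefore never drawn, while outside the window it is drawn most of the time.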
4.1.4.2 Gaussian-based truck traveling times
Truck assignment in the mine is a cyclic operation, in which trucks constantly transport material between shovels and crusher until the end of the shift. Each truck movement or operation represented in FIG. 4.2 takes a timespan that depends on distance, truck speed, truck and shovel capacities, and queue sizes. These times can be assigned based on the historical mine database. They are the basis for the presented real-time dispatching methods (Section 4.2), hence their importance in our model.
FIGURE 4.4 Outcome likelihood functions.
The truck displacement timespan between the same two points will not be the same over successive travels. Minor variations can be explained by different drivers who conduct trucks with similar, but not equal, speeds and throttle use, and by small differences in shovel and crusher positions. Major variations come from large reductions of truck speed because of weather conditions, and from changes in mine configuration. In this thesis, our solution methods consider only the minor timespan variations, which are represented by a probability density function (pdf).

We use a Normal (or Gaussian) distribution for timespan representation, which is a convenient model to represent processing times. However, since time takes only positive values, this distribution may not be a good choice in some cases because of its theoretical range (-∞ to +∞). In this case, we can use the Gamma Distribution, which ranges from zero to +∞ and is often used to represent the time required to complete some task. The graphical representation of the Gamma Distribution is similar to the Gaussian in situations in which the values tend to zero on the negative time axis. Gibson and Bruck (2000) and Ludwig (1996) also modeled the times involved in their problems as Gamma Distributions. The formulations of both distributions are found in Appendix B. For the sake of simplicity, and because the positive range represents the times considered in the presented example sufficiently well, we use the Gaussian Distribution for travel time representation.

We present in Table 4.3, for the example mine (FIG. 4.1), the required times for the truck travels. The mean travel times are given by the distances and by the full and empty truck speeds; the standard deviations are arbitrarily defined. Since our objective concerns the comparison among truck assignment methods, only on a few occasions did we consider standard deviations different from zero.
We consider that these data are necessary and sufficient to compare the dispatching methods. Our work only considers pdfs for truck traveling times; the other timespans present in the mine operation (t_queue, t_load, and t_unload) are considered deterministic.

TABLE 4.3 Mining data.
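The choice between the two distributions can be illustrated with the Python standard library. The fragment below is a sketch, not the simulator used in this thesis: it draws a travel time either from a Gaussian or from a Gamma distribution whose shape and scale are matched to the same mean and standard deviation (moment matching is an assumption made here for comparison purposes). A zero standard deviation returns the mean, as in the deterministic cases mentioned above.

```python
import random

def gamma_params(mean, std):
    """Shape and scale of a Gamma distribution with the given mean and std."""
    shape = (mean / std) ** 2
    scale = std ** 2 / mean
    return shape, scale

def sample_travel_time(mean, std, rng, dist="gauss"):
    """Draw one travel time in minutes; std == 0 gives the deterministic case."""
    if std == 0:
        return mean
    if dist == "gauss":
        return rng.gauss(mean, std)          # may be negative in theory
    shape, scale = gamma_params(mean, std)
    return rng.gammavariate(shape, scale)    # always positive

rng = random.Random(42)                      # fixed seed only for reproducibility
# Hypothetical travel with mean 6 min and standard deviation 0.5 min.
gauss_sample = sample_travel_time(6.0, 0.5, rng, dist="gauss")
gamma_sample = sample_travel_time(6.0, 0.5, rng, dist="gamma")
```

When the standard deviation is small relative to the mean, as in this hypothetical example, the Gaussian places a negligible probability on negative times, which is consistent with the choice made in the text.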
4.2 Truck dispatching methods
Using the introduced mine environment (FIG. 4.1), we propose five methods to solve the truck dispatching problem: Greedy Heuristic, MTCT Heuristic, TiMDP, Genetic Algorithm (GA), and Genetic TiMDP (G-TiMDP). The Greedy Heuristic, MTCT (Minimizing Truck Cycle Time) Heuristic and TiMDP methods follow the 1-truck-for-n-shovels strategy, whereas GA and Genetic TiMDP follow the m-trucks-for-n-shovels strategy. All presented dispatching methods are implemented in order to maximize tonnage production.
4.2.1 Greedy heuristic
We have shown in section 3.2 that the 1-truck-for-n-shovels strategy is greedy. Indeed, this strategy can be considered as such because each truck assignment is made observing only the truck's own state; it is a selfish behavior that leads to suboptimal global results. However, most of the methods applied according to this strategy are fast and have some knowledge about the mine environment, leading to acceptable results considering the real-time and uncertain aspects of the problem. Since these heuristic methods (such as the MTCT heuristic) already give results of acceptable quality, we propose an extremely greedy heuristic that is expected to return poor results and that serves as a baseline for comparison with the other methods.

In this method, the dispatcher does not have much information about the mine environment. Crucial information for good dispatching, such as distances and truck/shovel capacities, is completely unknown and not considered by the dispatching algorithm. The only observation that is allowed is the size of the queues at the shovels. Thus, in this method the truck must be assigned to the shovel that presents the smallest queue. Because of this balanced dispatching characteristic, the sizes of the queues tend to be nearly equal during the whole shift; the problem of shovel underutilization is not present in this method. Likewise, the time window in the shift, which indicates the blocking period of the shortest path to Shovel 1, is not considered in this method. Since the only information available to the heuristic is the size of the queues, knowledge about the time window does not affect the performance of the method.

Another important issue in dispatching is the instant of decision; the first decision differs from the others, because in the beginning all trucks are available and waiting for their shovel assignments. Considering that trucks cannot overtake each other on the paths, we organize a decision queue in which the fastest trucks are placed first in order to prevent traffic slowdowns. As a special case for further decisions in which trucks ask for dispatching at the same time, the fastest trucks always have preference. This decision policy for cases with conflicting trucks will also be used for our methods based on the 1-truck-for-n-shovels strategy.
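The two rules of this method, the smallest-queue assignment and the fastest-first decision queue, can be sketched in Python as follows. The function names and the tie-breaking rule (ties go to the shovel that sorts first by name) are assumptions; the text does not state how ties between equal queues are resolved.

```python
def greedy_assign(queue_sizes):
    """Return the shovel with the smallest queue.

    queue_sizes maps shovel name to current queue length. Ties are broken
    by shovel name, which is an assumption of this sketch.
    """
    return min(sorted(queue_sizes), key=lambda s: queue_sizes[s])

def decision_order(trucks):
    """Order simultaneous requests so that the fastest trucks decide first."""
    return sorted(trucks, key=lambda t: t["empty_kmh"], reverse=True)

# Three trucks asking for dispatching at the same instant (Table 4.1 speeds).
requests = [
    {"id": "T3-a", "empty_kmh": 40.0},
    {"id": "T1-a", "empty_kmh": 50.0},
    {"id": "T2-a", "empty_kmh": 48.0},
]
queues = {"S1": 0, "S2": 0, "S3": 0}
plan = []
for truck in decision_order(requests):
    shovel = greedy_assign(queues)
    queues[shovel] += 1          # the assigned truck joins that queue
    plan.append((truck["id"], shovel))
```

With three empty queues the three trucks are spread over the three shovels, which illustrates the balancing effect described above; for simplicity the sketch counts a truck in the queue as soon as it is assigned.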
4.2.2 MTCT heuristic
The main objective in our proposed mine is the transportation of the maximum quantity of material by the trucks during the shift. Thus, a good dispatching heuristic would be the minimization of the truck cycle times, in order to maximize the number of truck travels. We apply this heuristic (MTCT heuristic) to the mine allowing full observation, which means that the dispatcher knows how to calculate the cycle times and has enough information for doing so. However, we assume determinism even though dispatching occurs in a stochastic environment. The dispatcher considers that the trucks always travel to the shovels using the shortest path (outcome μ1), and does not consider the Gaussian aspect of the travel times (the mean time is considered for all dispatches). This deterministic assumption in a stochastic environment may not lead to dispatching with sufficiently near-optimal results.

In order to improve the performance of this method, we use knowledge about the time window to estimate the truck cycle time. During the time window, the heuristic considers that the truck takes the medium path to travel to and return from Shovel 1 (taking a longer time). Since the trucks have different payload capacities, the waiting time in the queues must be estimated. We use the mean loading time, which is the time that the shovel takes to load a 300 t truck.
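A compact Python sketch of this estimate follows. The loading rates are those of Table 4.2, but the distances in `DIST_KM` are hypothetical, the unloading time is left out because it is the same for every shovel, and the time window is tested against the dispatching instant only, which is a simplification of this sketch.

```python
SHOVEL_RATE = {"S1": 40.0, "S2": 20.0, "S3": 100.0}   # t/min (Table 4.2)
# Hypothetical one-way distances in km: (shortest path, medium path).
DIST_KM = {"S1": (2.0, 4.0), "S2": (3.0, 5.0), "S3": (6.0, 8.0)}
MEAN_PAYLOAD_T = 300.0   # queue waiting estimated with a 300 t truck

def estimated_cycle(shovel, t_now, queue_size, payload_t, empty_kmh, loaded_kmh):
    """Deterministic cycle estimate in minutes (unloading time omitted)."""
    blocked = shovel == "S1" and 300 <= t_now < 360    # time window on S1
    dist = DIST_KM[shovel][1 if blocked else 0]
    travel = dist / empty_kmh * 60.0 + dist / loaded_kmh * 60.0
    wait = queue_size * (MEAN_PAYLOAD_T / SHOVEL_RATE[shovel])
    load = payload_t / SHOVEL_RATE[shovel]
    return travel + wait + load

def mtct_assign(t_now, queues, payload_t, empty_kmh, loaded_kmh):
    """Pick the shovel with the smallest estimated cycle time."""
    return min(queues, key=lambda s: estimated_cycle(
        s, t_now, queues[s], payload_t, empty_kmh, loaded_kmh))

# Type 1 truck (200 t, 50/40 km/h) at minute 100.
choice_empty = mtct_assign(100.0, {"S1": 0, "S2": 0, "S3": 0}, 200.0, 50.0, 40.0)
choice_busy = mtct_assign(100.0, {"S1": 2, "S2": 0, "S3": 0}, 200.0, 50.0, 40.0)
```

With these illustrative distances the nearest shovel is preferred while its queue is empty, and the truck is diverted once two trucks are already waiting there.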
4.2.3 TiMDP Model
We propose modeling the truck dispatching problem as a TiMDP, in which characteristics such as uncertainties, related times, and quantity of transported material can be addressed. The solution of this model returns policies that define the best action to be executed by the agent given the current time and its current state, that is, the dispatcher must check the policies given by the TiMDP to decide on the truck assignment. However, dispatching problems with many agents modeled as variations of Markov decision processes (MDPs) can generate an exponential growth of the state space, which causes a correspondingly drastic increase in the time (and memory) needed to find the optimal solution. This issue is known as the curse of dimensionality (PUTERMAN, 1994); it can be very serious in problems like these and critical for TiMDP models, in which policies also depend on the current time. Therefore, some approximations must be made to mitigate this problem, in an attempt to get a feasible solution to the dispatching problem.

We present in this section an approximation of the TiMDP model by single-dependent agents that significantly reduces the size of the problem. Since the dispatching decisions (presented in Chapter 5) are taken in real time based on previously solved TiMDP policies, we also present some results and analysis of our proposed mine TiMDP model.
4.2.3.1 Single-dependent-agent TiMDP modeling
The state representation of the mine involves the places where trucks can be located, that is, crusher, shovels and queues, and paths. We consider that paths are not states, but transitions between states. Thus, the number of states is greatly reduced, because a truck is represented as traveling from one location to another and not as occupying a position on a discretized path. However, for a complete state representation, the states of all trucks must be considered jointly, which results in a huge state space. In a mine with only one location and two trucks (A and B), the location can be associated with 4 states: no trucks, only truck A, only truck B, and both trucks. In our example, not considering time and queue sizes, the complete state space is of order 10^11. To solve a problem of this size we have to use approximate solvers such as APRICODD (ST-AUBIN; HOEY; BOUTILIER, 2001), which is based on SPUDD (HOEY et al., 1999), an exact factored MDP (BOUTILIER; DEARDEN; GOLDSZMIDT, 2000) solver. However, in our formulation there are two other issues that increase the state space: time and queue size. We consider a 10-hour shift and a discretization step of 0.1 minute for the Gaussian representation of timespans; the queues have a maximum size of 9 trucks, and due to the heterogeneous fleet the truck order in the queue must be considered. Such considerations enlarge the state space to order 10^21, making the problem impossible to solve in a reasonable time.

In order to solve a problem with such a huge number of states, we propose approximating the multi-agent problem by a single-dependent-agent problem, in which a solution is generated for each agent (on a small state space) with some states that add dependencies on other agents. In this model, the actions are executed concurrently (MAUSAM; WELD, 2004) by the agents; some actions can be executed at the same time (each agent executes one action at a time); however, because of the dependency model, the execution of other actions depends on the current position of the other agents in the environment. These dependencies are important to the quality of the solution because they insert another dependency for the decision in a specific state. Now, the policy depends on the agent's own current state, the current time, and other states that depend on the other agents present in the environment. Naturally, this approximation does not return optimal results; however, the results provide evidence of good performance for solutions that are returned in a short time (as shown in the next chapter).

In our example mine we have the state transition representation illustrated in FIG. 4.5 for each truck present in the environment, in which FIFO queues (Q1, Q2, and Q3) are represented by slot buffers for the shovels (S1, S2, and S3, respectively). The dependencies of this representation are addressed by the queue; the current size of a queue is set based on observations exchanged among the agents. Considering that a specific truck is at S1 ready to execute the action move_queue, its position in the queue will be governed by the current positions of the other trucks present in the same queue. Therefore, considering that the only decision state is C, the observation of the current position of the other trucks (mainly in the queues) is essential for good truck dispatching results.

The single-dependent-agent representation seems to be appropriate for TiMDP modeling; an agent decision can be made observing its own current state, the current time, and the position of the other agents in the environment. However, the actual queue representation (as a slot buffer) is not appropriate for the TiMDP model, and must be mapped to a state representation. As the queues are limited to 9 trucks, we propose their representation by a set of states (for each queue), in which each state represents the quantity of trucks in the queue (varying from zero to 9 trucks). Our example mine is thus approximated by a single-dependent-agent model (FIG. 4.6), which is an expansion of the presented truck cycle (FIG. 4.2) considering queue sizes and outcomes over path selection by the truck driver.

The only action that gives a reward is unload_truck. Its value is the tonnage transported by the truck. The policies of the TiMDP model aim at maximizing the expected tonnage that can be transported by the truck, considering the whole shift. The action that makes the transition from state S to its queue Q is move_queue. In the TiMDP model, the queue is represented by a set of states in which the transitions are produced by independent actions (e.g., move_queue_Q1_2). However, at the dispatching instant, the truck moves to the queue independently of its size. To solve this problem of action representation, we propose a two-phased TiMDP model applied to real-time queueing problems that are represented by single-dependent agents:
1. Solve the complete TiMDP model (off-line phase).
2. Find the optimal dispatching policy (dispatching phase), which is subdivided into:
(a) Execute a Value Iteration algorithm step (on-line sub-phase).
(b) Assign the truck to a shovel (assignment sub-phase).

FIGURE 4.5 Truck dispatching state transitions.

FIGURE 4.6 TiMDP truck dispatching states.
The off-line phase is solved before the mine shift, whereas the dispatching phase occurs in real time during the shift. In the off-line phase, the system is modeled following the representation presented in FIG. 4.6, that is, the action move_queue is represented for each queue state. For each new action that produces the transition from a state S to a state Q (representing the quantity of trucks in the queue), there is a specific duration t_queue. The duration of the action move_queue depends on the size of the queue (|Q|) and on the mean time (t̄) that a truck waits in the queue:

\[ t_{\mathrm{queue}} = \bar{t}\,|Q| \tag{4.1} \]

An initial approximation for the mean time t̄ is

\[ \bar{t} = \frac{t_{\mathrm{load}\,T1} + t_{\mathrm{load}\,T2} + t_{\mathrm{load}\,T3}}{3} \tag{4.2} \]

where T1, T2, and T3 are the truck types, as presented in Table 4.1.
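As a numerical illustration of (4.1) and (4.2), the Python fragment below computes the mean waiting time and the queue duration for a given shovel. It assumes that the loading time of each truck type is its payload capacity (Table 4.1) divided by the loading rate of the shovel (Table 4.2); the function names are illustrative.

```python
PAYLOAD_T = {"T1": 200.0, "T2": 300.0, "T3": 400.0}   # tonnes (Table 4.1)

def mean_wait(load_rate_t_per_min):
    """Equation (4.2): average of the loading times of the three truck types."""
    load_times = [p / load_rate_t_per_min for p in PAYLOAD_T.values()]
    return sum(load_times) / len(load_times)

def t_queue(load_rate_t_per_min, queue_size):
    """Equation (4.1): mean waiting time multiplied by the queue size |Q|."""
    return mean_wait(load_rate_t_per_min) * queue_size

# Shovel 1 loads 40 t/min, so the loading times are 5, 7.5 and 10 minutes.
wait_s1 = mean_wait(40.0)
queue_of_three = t_queue(40.0, 3)
```

For shovel 1 the mean is 7.5 minutes, which coincides with the time needed to load a 300 t truck, and an empty queue gives a null duration.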
The importance of a state representing zero trucks in the queue lies in the interaction between the phases of our proposed TiMDP model. In the off-line phase, the TiMDP is solved using the Value Iteration algorithm, which takes around 5 minutes (on a Pentium Quad Core Q9400 @ 2.66 GHz with 4 GB RAM) for each truck type, considering a shift of 10 hours and a discretization step of 0.1 min. The value function of state S is the maximum, over the queue states, of the convolution of the action duration with the value function of the queue state; since the action move_queue_Q0 has a null duration, the value function of S will be equal to the value function of Q0. Thus, the policy representing the dispatching decision (which is always executed at state C) is found without considering the current size of the queues at the decision epoch. This incorrect behavior is then corrected in the on-line sub-phase.

The dispatching phase is subdivided into two sub-phases: on-line and assignment. The dispatching decisions start in the on-line sub-phase considering the estimates of the future sizes of the queues, which are based on the expected truck traveling time, the number of trucks traveling to the queue, the current size of the queue, and the mean time (t̄) that a truck waits in the queue. In an example situation, the number of trucks in the queues and the other cited aspects are observed, and the dispatching decision is taken in the assignment sub-phase considering the maximum value given by the TiMDP for the expected queue sizes of all shovels. However, the decision is always taken at the dispatching state C, and the values (expected tonnage) used for the decision are valid for states Q. Therefore, in the on-line sub-phase, these values must be referred to the dispatching state by executing one step of the Value Iteration algorithm. In this sub-phase, the value functions of the Q states that differ from the expected size of the queue are not considered. Thus, the expected tonnage considered at state C now refers to the expected size of the queue.

In order to save time on dispatching decisions, we execute the on-line sub-phase for all queue sizes right after the convergence of the TiMDP model in the off-line phase. Thus, the on-line sub-phase can also be solved before the shift, but it is an essential step that must be executed to get a correct truck dispatching.
Considering that a truck asks for dispatching at C at the current time $t_d$ and that the current size of Q1 is zero, the Q value of Q1 referenced to C is calculated in the on-line sub-phase:

\[
Q(C, t, \text{move\_shovel\_1}) = L(\mu_1 \mid C, \text{move\_shovel\_1})\, U(\mu_1, t) + L(\mu_2 \mid C, \text{move\_shovel\_1})\, U(\mu_2, t) + L(\mu_3 \mid C, \text{move\_shovel\_1})\, U(\mu_3, t). \tag{4.3}
\]

The utility U is

\[
\begin{aligned}
U(\mu_1, t) &= t_{\text{shovel 1}}^{\mu_1} * Q(S_1, t, \text{move\_queue}), \\
U(\mu_2, t) &= t_{\text{shovel 1}}^{\mu_2} * Q(S_1, t, \text{move\_queue}), \\
U(\mu_3, t) &= t_{\text{shovel 1}}^{\mu_3} * Q(S_1, t, \text{move\_queue}),
\end{aligned} \tag{4.4}
\]

where $*$ denotes the convolution of the action duration with the Q function. During the shift, in the assignment sub-phase, the shovel that the truck must travel to is given by the action defined by the policy $\pi$, which compares the Q values of state C (move_shovel actions):

\[
\pi(C, t_d) = \arg\max_{\text{move\_shovel}} \big( Q(C, t_d, \text{move\_shovel\_1}),\; Q(C, t_d, \text{move\_shovel\_2}),\; Q(C, t_d, \text{move\_shovel\_3}) \big). \tag{4.5}
\]
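As a minimal sketch of how EQS. 4.3 and 4.5 combine, the following Python fragment computes the Q value at state C as a likelihood-weighted sum of outcome utilities and then selects the action with the largest Q value. The function names and the dictionary layout are illustrative assumptions, not part of the original implementation; the utilities of EQ. 4.4 are assumed to be already available.

```python
from typing import Dict


def q_at_crusher(likelihood: Dict[str, float], utility: Dict[str, float]) -> float:
    """Likelihood-weighted sum of outcome utilities, in the spirit of EQ. 4.3.

    likelihood[mu] plays the role of L(mu | C, move_shovel_k) and
    utility[mu] the role of U(mu, t) at the decision time.
    """
    return sum(likelihood[mu] * utility[mu] for mu in likelihood)


def dispatch_policy(q_values: Dict[str, float]) -> str:
    """Action with the largest Q value at state C, in the spirit of EQ. 4.5."""
    return max(q_values, key=q_values.get)
```

In use, `q_at_crusher` would be called once per move_shovel action, and the resulting values passed to `dispatch_policy` when a truck asks for dispatching.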
4.2.3.2 TiMDP results and analysis
The simulations presented in this section concern the off-line and on-line phases (executed before the shift) of the TiMDP model, which uses all mine data presented so far. The assignment sub-phase, which returns the final results of this method (that is, the total tonnage production), is presented in the next chapter, because it requires a simulation that considers all trucks and is executed over the whole shift. All simulation results are displayed graphically as expected tonnage production (tons) versus time (minutes). In order to understand the main characteristics of the method, we present a variety of simulations combining different shovels, queue sizes, and phases (off-line and on-line).

The differences between the off-line (Normal TiMDP) and on-line (Dislocated TiMDP) phases are shown in FIG. 4.7 for the dispatching decision of T1 with queue size at shovel 1 equal to zero (footnote 2). FIG. 4.7b shows the detail for the time-window (blocking of the path between C and S1), in which we can observe more carefully the differences between tonnage productions at the same instant. As commented in the previous section, the value indicated in the off-line phase represents the expected tonnage for the current size of the queue, represented by states Q; however, this value must be referred to state C, which produces the differences between the phases. The time-window is delimited by the first and last discontinuities in the function, in the interval 295-355 minutes for the Normal TiMDP. The difference from the original time-window, which represents the path blockage between 300 and 360 minutes, can be explained by TiMDP theory, in which decisions depend on subsequent action durations. Therefore, if T1 moves from S1 to Q1 at instant 355, the truck driver will be able (considering that there is only one truck in the mine) to take the shortest path in the return travel, because the size of the queue is equal to zero and the loading takes 5 minutes. However, as the decisions are taken in state C, we must consider the Dislocated TiMDP function to analyze the time-window behavior. Now, we can observe a difference in the time mark of the first function discontinuity when comparing Normal and Dislocated TiMDPs. This difference is explained by the dislocation of the function caused by the expected time that the truck might take to travel from the crusher to the shovel. The last discontinuity moves exactly to instant 360 minutes, which is the unblocking instant of the shortest path between the crusher and shovel 1. The other discontinuities present in the Dislocated TiMDP can be explained by the outcome likelihood functions (FIG. 4.4) applied to EQS. 4.3 and 4.4, which are used to refer the decision to state C.

Footnote 2: We define the term Dislocated TiMDP based on the dislocation of the function that the on-line sub-phase causes on the original TiMDP calculated in the off-line phase, defined here as Normal TiMDP.

We can observe another effect of the on-line sub-phase in FIG. 4.7c, which is the dislocation in time of the last value of tonnage production that differs from zero. The zero value in the function indicates that the truck should go to a parking lot because of the length of the shift: the crusher ends its work exactly at time 600 minutes, and a truck that travels to the shovel may (in expected values) encounter the crusher out of work. Thus, in the simulations presented in the next chapter, the trucks are always sent to a parking lot if the expected values of tonnage production for all shovels are equal to zero. In this example, we can clearly observe the dislocation of the TiMDP function caused by the on-line sub-phase.

The next figures presented in this section refer to the on-line sub-phase. FIG. 4.8 presents the differences between the values of expected tonnage production considering all shovels and queues with size zero. We can observe in FIGs. 4.8b and 4.8c the differences that the values present along the shift. For example, at around 293 minutes, the policy (defined in the assignment sub-phase and found based on the highest tonnage production value) changes from Shovel 1 to Shovel 3. This change in the policy can be explained by the time-window. Shovel 1 becomes the best dispatching decision again at 360 minutes. At around 572 minutes we observe a change in the policies, which are
FIGURE 4.7 Expected tonnage production at crusher C (Truck 1 - Shovel 1 - Queue 0).
dependent on the approaching end of the shift and on the timespans in the system, such as t shovel and t load. These decisions, based on expected tonnage production, current time, and queue size, are all executed in the assignment sub-phase.

The differences between the expected tonnage production for the same shovel and truck with different queue sizes are presented in FIG. 4.9 for T2 and Shovel 3. We can observe that the differences remain almost the same during most of the shift (FIG. 4.9b), except at the end (FIG. 4.9c), where the differences are all highly dependent on the current time. Clearly, the truck should go to the parking lot earlier if the size of the queue is larger.

FIG. 4.10 compares results under a more realistic behavior of the mine environment, in which the queue sizes differ from one another during the shift. We observe in the zoomed figures (FIGs. 4.10b and 4.10c) that the policies change depending on the current time; before time 300 the action move_shovel_1 is better than the action move_shovel_2, but it is the worst action during the period 300-333. We can note that move_shovel_3, even though it leads to the longest queue, is a good action to select during most of the time. This occurs because of the average loading rate of Shovel 3, which is 2.5 times higher than that of Shovel 1 and 5 times higher than that of Shovel 2. We must also note that the sizes of the queues change throughout the shift, which indeed modifies the policies; however, in this section we show comparisons among fixed queue sizes just for a better understanding of the TiMDP model.

Up to this point, we have shown results of TiMDP models considering standard time representations (exact durations of the actions), whereas time in a real-world problem tends to be inexact. Let us then consider, as presented in Table 4.3, the action durations represented by Gaussian distributions. In order to show the differences between standard
FIGURE 4.8 Expected tonnage production at crusher C (Truck 1 - Queue 0).
FIGURE 4.9 Expected tonnage production at crusher C (Truck 2 - Shovel 3).
FIGURE 4.10 Expected tonnage production at crusher C (Truck 2).
and Gaussian time representations, we present in FIG. 4.11 two plots for the same condition: the expected tonnage production at C for T3 with queue sizes equal to zero at Shovels 1, 2, and 3. We can observe a smooth function for the Gaussian representation (FIG. 4.11b) compared to the standard representation (FIG. 4.11a), which can be explained by the convolution operations of the Q functions with the discretized Gaussian representations that are used in the TiMDP solution. In order to guarantee a good solution (convergence of the TiMDP solution to its near-optimal values) and to limit the use of memory in the simulations, we used a discretization step of 0.2 minutes (footnote 3).

The effects of the Gaussian representations can be better observed in FIG. 4.12. Comparing FIGs. 4.12b2 and 4.12b1, we observe the increase of the expected tonnage production introduced by the Gaussian representations. Policies can also change due to the behavior of the Gaussian distribution; originally the selected action between times 353 and 354 was move_shovel_2 (FIG. 4.12c1), and it changes to action move_shovel_1 in the Gaussian representation (FIG. 4.12c2). The smoothness of the Gaussian representation can also be observed by comparing FIGs. 4.12d1 and 4.12d2, and 4.12e1 and 4.12e2. We note that all those modifications result from the combination of all Gaussian distributions present in the model, as shown in Table 4.3, and that it can be a difficult task to predict the behavior of this type of representation due to the high number of combinations of values that are evaluated in a TiMDP solution. Certainly, these modifications are more significant in a system with a complete Gaussian representation of all involved times.

Footnote 3: Discretization steps smaller than 0.2 minutes caused memory overflow because of the use of a 32-bit operating system. Steps larger than 1 minute returned results very different from those of the standard TiMDP. We reduced the discretization step progressively down to 0.2 minutes, observing the convergence tendency of the results and avoiding memory overflow.
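The smoothing effect described for the Gaussian representation can be illustrated with a small sketch. It assumes a value function sampled on a regular time grid and a discretized Gaussian action duration truncated at four standard deviations; the function names, the truncation, and the choice of treating values beyond the end of the grid as zero are assumptions made for the example, not details taken from the TiMDP solver used in this work.

```python
import math
from typing import List, Sequence, Tuple


def gaussian_durations(mean: float, std: float, step: float) -> Tuple[List[int], List[float]]:
    """Discretized Gaussian duration: grid offsets (in steps) and probabilities.

    Assumption: the distribution is truncated at mean +/- 4 std and renormalized.
    """
    lo = max(0, int(round((mean - 4 * std) / step)))
    hi = int(round((mean + 4 * std) / step))
    offsets = list(range(lo, hi + 1))
    weights = [math.exp(-0.5 * ((k * step - mean) / std) ** 2) for k in offsets]
    total = sum(weights)
    return offsets, [w / total for w in weights]


def refer_value(value: Sequence[float], offsets: Sequence[int], probs: Sequence[float]) -> List[float]:
    """Expected value at each grid time after an action with a random duration.

    Assumption: instants beyond the end of the grid (end of shift) are worth zero.
    """
    n = len(value)
    return [
        sum(p * value[i + k] for k, p in zip(offsets, probs) if i + k < n)
        for i in range(n)
    ]
```

With an exact duration (a single offset with probability one), a discontinuity in `value` is simply shifted in time; with the Gaussian duration, the same discontinuity is spread over several grid points, which matches the smoother curves reported for the Gaussian representation.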
[Figure 4.11: two panels, (a) "Expected Tonnage Production at Crusher C (Truck 3 - Queue 0)" and (b) "Expected Tonnage Production at Crusher C (Truck 3 - Queue 0 - Gauss)"; curves for Shovel 1, Shovel 2, and Shovel 3; x-axis Time (min), 0 to 600; y-axis Tonnage Production (t), 0 to 11000.]
FIGURE 4.11 Comparative of expected tonnage production at crusher C (Truck 3 - Queue 0) for standard and Gauss representations.
[Figure 4.12: paired panels comparing "Expected Tonnage Production at Crusher C (Truck 1 - Queue 0)" (standard) with "Expected Tonnage Production at Crusher C (Truck 1 - Queue 0 - Gauss)"; curves for Shovel 1, Shovel 2, and Shovel 3; full shift (0 to 600 min) and zoomed views over 0 to 5 min, 353 to 354 min, 290 to 370 min, and 520 to 600 min; x-axis Time (min); y-axis Tonnage Production (t).]
FIGURE 4.12 Comparative of expected tonnage production at crusher C (Truck 1 - Queue 0) for standard and Gauss representations.
4.2.4 Genetic Algorithm (GA)
All methods proposed up to this point are based on the 1-truck-for-n-shovels strategy, which is egotist because it does not take other incoming trucks into account in the dispatching decision. In order to reduce this problem and try to improve the results, we propose truck dispatching using a Genetic Algorithm (GA), which is based on the m-trucks-for-n-shovels strategy. As GA theory is well known in the combinatorial optimization community, we present its general theory in Appendix A.

The GA technique is applied to the problem at two distinct decision instants: the first dispatching decision at the start of the shift, in which all trucks are available, and the subsequent decisions, in which commonly one truck asks for dispatching. For the first decision instant, we organize a queue of trucks, in which the decisions are taken from the first to the last truck in the queue. The goal is then to minimize the summed truck cycle time. This method uses the MTCT heuristic in the selection phase as the fitness function, applied to a set of trucks in order to minimize their total cycle time. The fitness function also considers the delays caused by traffic (faster trucks can be stuck behind slower trucks) and by queues at the shovels. The GA chromosome is composed of a double array (FIG. 4.13), in which the first array represents the shovel to which each truck is assigned, and the second array represents the position of the truck in the decision queue. We assume that, after the GA execution, the decision is instantly executed, resulting in a null truck waiting time in the decision queue.

Given the chromosome configuration, we initialize a population composed of 2500 individuals. This number does not seem to be a large quantity of individuals given the high

                 T#1    T#2    T#3    ...   T#14    T#15
Shovel           ST#1   ST#2   ST#3   ...   ST#14   ST#15
Queue Position   PT#1   PT#2   PT#3   ...   PT#14   PT#15
FIGURE 4.13 Truck dispatching GA chromosome.

combinatorial characteristic of the chromosome (on the order of 10^19 in our example); however, the time necessary to converge to a good solution is directly related to the population size. In our mine problem, dispatches occur in real time, so a long decision time is not acceptable (because of the delays it inserts in production). In our tests, using the proposed population size, we obtained results that are close enough to those of the same GA formulation with larger initial populations. Using our GA formulation, the final results can be obtained in around one minute, which is an acceptable time in a real-time truck dispatching environment.

The initial selection is executed over pairs of individuals (binary tournament selection), selecting the best one of each pair based on the fitness function. After this phase, the population is reduced to half its original size, that is, 1250 individuals. In the next GA step, the reproduction phase, we proceed with pairwise crossover of individuals (FIG. 4.14), only for the first array of the chromosome, with a defined probability of 0.9; in order to recover the original population size, each crossover generates 4 sons. In the crossover example, the sons are first generated as exact copies of their fathers; we adopt Sons #1 and #3 as copies of Father #1, and Sons #2 and #4 as copies of Father #2. If the crossover is accepted (according to the defined probability), the start and end genes are randomly selected and the genes in that range are replaced by the genes in the same range of the other father; e.g., Son #1 is initially an exact copy of Father #1, and the crossover indicates that its genes from T#4 to T#9 must be changed

Each gene is shown as assigned shovel / decision-queue position.

           T#1    T#2    T#3    T#4    T#5    T#6    T#7    T#8    T#9    T#10   T#11   T#12   T#13   T#14   T#15
Father #1  S1/1   S2/3   S1/5   S1/4   S2/12  S3/11  S3/15  S3/6   S1/7   S2/9   S2/8   S3/10  S1/13  S1/2   S1/14
Father #2  S2/7   S2/13  S1/15  S3/1   S2/2   S3/5   S1/8   S3/9   S2/4   S1/3   S2/11  S3/14  S1/12  S2/6   S3/10

Crossover 1: x1 = T4; x2 = T9
Son #1     S1/1   S2/3   S1/5   S3/4   S2/12  S3/11  S1/15  S3/6   S2/7   S2/9   S2/8   S3/10  S1/13  S1/2   S1/14
Son #2     S2/7   S2/13  S1/15  S1/1   S2/2   S3/5   S3/8   S3/9   S1/4   S1/3   S2/11  S3/14  S1/12  S2/6   S3/10

Crossover 2: x1 = T13; x2 = T15
Son #3     S1/1   S2/3   S1/5   S1/4   S2/12  S3/11  S3/15  S3/6   S1/7   S2/9   S2/8   S3/10  S1/13  S2/2   S3/14
Son #4     S2/7   S2/13  S1/15  S3/1   S2/2   S3/5   S1/8   S3/9   S2/4   S1/3   S2/11  S3/14  S1/12  S1/6   S1/10
FIGURE 4.14 Truck dispatching GA crossover.

Each gene is shown as assigned shovel / decision-queue position.

           T#1    T#2    T#3    T#4    T#5    T#6    T#7    T#8    T#9    T#10   T#11   T#12   T#13   T#14   T#15
Son #1     S1/1   S2/3   S1/13  S3/4   S2/12  S3/11  S1/15  S3/6   S2/7   S2/9   S2/8   S3/10  S1/5   S1/2   S1/14   (mutation T3 x T13)
Son #2     S2/7   S2/13  S1/15  S1/1   S2/2   S3/5   S3/8   S3/9   S1/4   S1/3   S2/11  S3/14  S1/12  S2/6   S3/10
Son #3     S1/1   S2/3   S1/5   S1/4   S2/12  S3/11  S3/15  S3/6   S1/7   S2/9   S2/8   S3/10  S1/13  S2/2   S3/14
Son #4     S2/7   S2/13  S1/15  S3/1   S2/2   S3/5   S1/8   S3/9   S2/4   S1/14  S2/11  S3/3   S1/12  S1/6   S1/10   (mutation T10 x T12)
FIGURE 4.15 Truck dispatching GA mutation.

for the same gene range of Father #2. The genes in the second array of the chromosome represent the truck position in the initial decision queue; that is, their values cannot repeat along the array, which prevents the use of the crossover operation on this array. The next step of our GA dispatching model is the mutation of the second array of the chromosome (queue position), which occurs with a probability of 0.01 and consists of a random swap between gene values. In our example (FIG. 4.15), the mutation operation is executed in Son #1, swapping its genes T#3 and T#13, and in Son #4, swapping its genes T#10 and T#12. We assume that the reproduction policy is elitist, that is, if the best father is better

Each gene is shown as assigned shovel / decision-queue position. Only one cycle-time label survives in the figure: Total Truck Cycle = 116 [min].

           T#1    T#2    T#3    T#4    T#5    T#6    T#7    T#8    T#9    T#10   T#11   T#12   T#13   T#14   T#15
Father #1  S1/1   S2/3   S1/5   S1/4   S2/12  S3/11  S3/15  S3/6   S1/7   S2/9   S2/8   S3/10  S1/13  S1/2   S1/14
Father #2  S2/7   S2/13  S1/15  S3/1   S2/2   S3/5   S1/8   S3/9   S2/4   S1/3   S2/11  S3/14  S1/12  S2/6   S3/10
Son #1     S1/1   S2/3   S1/13  S3/4   S2/12  S3/11  S1/15  S3/6   S2/7   S2/9   S2/8   S3/10  S1/5   S1/2   S1/14
Son #2     S2/7   S2/13  S1/15  S1/1   S2/2   S3/5   S3/8   S3/9   S1/4   S1/3   S2/11  S3/14  S1/12  S2/6   S3/10
Son #3     S1/1   S2/3   S1/5   S1/4   S2/12  S3/11  S3/15  S3/6   S1/7   S2/9   S2/8   S3/10  S1/13  S2/2   S3/14
Son #4     S2/7   S2/13  S1/15  S3/1   S2/2   S3/5   S1/8   S3/9   S2/4   S1/14  S2/11  S3/3   S1/12  S1/6   S1/10
FIGURE 4.16 Truck dispatching GA elitist behavior.

than the worst son, it must take the place of that son (if and only if its fitness value is better than the son's). In our example (FIG. 4.16), the best father is #2 (smallest total truck cycle among the fathers), which replaces Son #4 because its total truck cycle is smaller than that of Son #4. This elitist policy ensures that the best individual of each generation is kept, therefore allowing the convergence of the algorithm. Finally, the individuals resulting from the reproduction operations are shown in FIG. 4.17.

These reproduction steps are applied to pairs of individuals, following a sequential order, over all individuals in the population. Hence, the population doubles its size after the reproduction of all individuals, returning to its original size. After that, the selection phase is executed again, in order to select the best individuals and reduce the population size for

Each gene is shown as assigned shovel / decision-queue position.

           T#1    T#2    T#3    T#4    T#5    T#6    T#7    T#8    T#9    T#10   T#11   T#12   T#13   T#14   T#15
Son #1     S1/1   S2/3   S1/13  S3/4   S2/12  S3/11  S1/15  S3/6   S2/7   S2/9   S2/8   S3/10  S1/5   S1/2   S1/14
Son #3     S1/1   S2/3   S1/5   S1/4   S2/12  S3/11  S3/15  S3/6   S1/7   S2/9   S2/8   S3/10  S1/13  S2/2   S3/14
Son #2     S2/7   S2/13  S1/15  S1/1   S2/2   S3/5   S3/8   S3/9   S1/4   S1/3   S2/11  S3/14  S1/12  S2/6   S3/10
Father #2  S2/7   S2/13  S1/15  S3/1   S2/2   S3/5   S1/8   S3/9   S2/4   S1/3   S2/11  S3/14  S1/12  S2/6   S3/10
FIGURE 4.17 Truck dispatching GA reproduction result.

another reproduction phase, and start a new generation. Convergence was obtained in around 50 generations and took less than one minute. The solution, which is the first truck assignment, is the best individual after convergence.

For the next decision instants we consider that only one truck asks for dispatching at a time. Now, the GA dispatching method considers, for shovel assignment, the asking truck and the next m trucks estimated to arrive at state C within the next tGA time period. The shovel assignments for the future trucks expected to arrive at state C during the considered tGA are placed in a so-called dispatching list. Another dispatching list is generated only when the first truck arrives at state C after the considered tGA. As this method performs the truck dispatch considering more than one truck, it is considered an m-trucks-for-n-shovels strategy. Now, the chromosome (represented as the first array of the previous chromosome presented in FIG. 4.13) is defined considering observations of trucks being loaded at the shovels and unloaded at the crusher, waiting in the queues, and traveling through the paths. Some genes may be zero, indicating that the truck was not observed (its arrival time at state C cannot be estimated) and that it will not be considered in the GA for that dispatching decision. In order to estimate the truck cycle, we insert an auxiliary array to the chromosome
                         T#1     T#2     T#3     ...   T#14     T#15
Estimated arrival time   taT#1   taT#2   taT#3   ...   taT#14   taT#15
FIGURE 4.18 Auxiliary chromosome array.

(FIG. 4.18), which only indicates the estimated arrival time of the truck (ta) at state C and is not used in crossovers or mutations. Therefore, the GA is executed considering the current dispatching truck and the next trucks that are expected to ask for dispatching in the next tGA minutes. The estimated arrival times of trucks at state C depend on observations of their current states. However, we face some specific characteristics of the dispatching simulator that make the estimation of arrival times difficult, such as the impossibility of observing a truck while it travels through the paths and of observing the position of a specific truck in any queue (footnote 4). These issues and the stochastic behavior of the problem (uncertainty in the selection of traveling paths) add some imprecision to the truck arrival times, which may lead to results that differ from those predicted by the GA. In this case, when a truck arrives at state C before tGA and is not in the dispatching list, a dispatching heuristic (such as MTCT) must be executed in order to perform the shovel assignment. Certainly, these situations degrade the quality of the general GA method. In order to minimize this problem, we limit the maximum time considered for chromosome construction (tGA) based on current truck observations and on the estimation of arrival times at state C. This time limitation is presented in ALG. 3, in which tGA is found based on the estimated truck arrival times at state C. In the algorithm, TS1, TS2, and TS3 are the sets of estimated arrival times of the observed trucks being loaded and at the first and second positions in the queues on
Footnote 4: We added the possibility of observing the first and the second trucks in the queue in order to improve our results.
shovel 1, shovel 2, and shovel 3, respectively; tcurrent is the current shift time at the moment of the GA dispatching decision. In the t_max_S calculation, a constant is added, namely the loading time of the smallest truck on each shovel. The tGA is basically the smallest maximum arrival time at the crusher of the considered trucks. Therefore, the trucks composing the chromosome must have their estimated arrival time at state C between tcurrent and tGA. This approximation added more GA dispatching executions; however, the quality of the results was considerably improved due to the drastic reduction of heuristic dispatches.

Algorithm 3: Calculation of maximum truck arrival time in state C
Input: CALC_TGA(TS1, TS2, TS3, tcurrent)
Output: tGA
t_max_S1 ← max(TS1) + 5;
t_max_S2 ← max(TS2) + 10;
t_max_S3 ← max(TS3) + 2;
t_max_a ← min(t_max_S1, t_max_S2, t_max_S3);
t_max ← tcurrent;
foreach t_s1 ∈ TS1 do
    if t_s1 ≤ t_max_a AND t_s1 > t_max then
        t_max ← t_s1;
    end
end
foreach t_s2 ∈ TS2 do
    if t_s2 ≤ t_max_a AND t_s2 > t_max then
        t_max ← t_s2;
    end
end
foreach t_s3 ∈ TS3 do
    if t_s3 ≤ t_max_a AND t_s3 > t_max then
        t_max ← t_s3;
    end
end
tGA ← t_max;
return tGA;
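A possible Python rendering of ALG. 3 is sketched below. The added constants (5, 10, and 2 minutes) come from the algorithm itself; the comparison operators (an arrival time is accepted when it is greater than the running maximum and not greater than the bound), the function name `calc_tga`, and the handling of shovels with no observed trucks are assumptions made for this sketch.

```python
from typing import Dict, Iterable, List

# Constants added per shovel in ALG. 3 (loading time of the smallest truck, in minutes).
LOAD_TIME_MIN: Dict[int, float] = {1: 5.0, 2: 10.0, 3: 2.0}


def calc_tga(ts1: Iterable[float], ts2: Iterable[float], ts3: Iterable[float],
             t_current: float) -> float:
    """Largest estimated arrival time at state C that does not exceed the
    smallest per-shovel bound (max arrival time plus loading constant)."""
    arrivals: Dict[int, List[float]] = {1: list(ts1), 2: list(ts2), 3: list(ts3)}
    # Assumption: a shovel with no observed trucks does not constrain the bound.
    observed = {s: ts for s, ts in arrivals.items() if ts}
    if not observed:
        return t_current
    t_max_a = min(max(ts) + LOAD_TIME_MIN[s] for s, ts in observed.items())
    t_max = t_current
    for ts in observed.values():
        for t in ts:
            # Assumed comparisons: keep the largest arrival time within the bound.
            if t_max < t <= t_max_a:
                t_max = t
    return t_max
```

Under these assumptions, the returned value is never smaller than `t_current`, so the chromosome always covers at least the truck that is currently asking for dispatching.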
The GA is always started when the first truck arrives at state C after the previously calculated tGA, following the steps shown previously (first dispatching decision) with small modifications due to the differences in the current chromosome construction. As the chromosome is formed by only one array indicating the trucks positions, the mutation phase, which was executed on the decision queue position, is now executed on the trucks positions, following the previously defined procedures. The fitness function follows the previous one, which is used for minimizing the total cycle time; however, it now considers the estimated truck arrival times (ta) to find the cycle time for each truck represented in the chromosome. As the number of shovels attended (indicated by the chromosome construction) depends on the observed trucks, the size of the population used in the GA depends on the chromosome configuration. We adjusted the population size in order to converge to good results in a short time (less than one minute), in line with real-time dispatching best practices.
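The selection, crossover, mutation, and elitist steps described in this section for the first dispatching decision can be sketched as follows. This is a minimal illustration rather than the implementation used in this work: the fitness function is passed in as a callable (lower is better, standing in for the MTCT-based total cycle time), the mutation probability is assumed to apply per individual, and all function names are hypothetical.

```python
import random
from typing import Callable, List, Tuple

# (shovel assigned to each truck, decision-queue position of each truck)
Chromosome = Tuple[List[int], List[int]]
Fitness = Callable[[Chromosome], float]  # total truck cycle time, lower is better


def tournament(pop: List[Chromosome], fitness: Fitness) -> List[Chromosome]:
    """Binary tournament over consecutive pairs; halves the population."""
    return [min(pop[i], pop[i + 1], key=fitness) for i in range(0, len(pop) - 1, 2)]


def segment_crossover(a: Chromosome, b: Chromosome, rng: random.Random,
                      p_cross: float = 0.9) -> Tuple[Chromosome, Chromosome]:
    """Two sons start as copies of the fathers; with probability p_cross a random
    gene range of the shovel array is exchanged. Queue positions are untouched."""
    sa, sb = list(a[0]), list(b[0])
    if rng.random() < p_cross:
        x1, x2 = sorted(rng.sample(range(len(sa)), 2))
        sa[x1:x2 + 1], sb[x1:x2 + 1] = sb[x1:x2 + 1], sa[x1:x2 + 1]
    return (sa, list(a[1])), (sb, list(b[1]))


def swap_mutation(ind: Chromosome, rng: random.Random, p_mut: float = 0.01) -> Chromosome:
    """With probability p_mut (assumed per individual), swap two queue positions."""
    shovels, pos = list(ind[0]), list(ind[1])
    if rng.random() < p_mut:
        i, j = rng.sample(range(len(pos)), 2)
        pos[i], pos[j] = pos[j], pos[i]
    return shovels, pos


def reproduce(a: Chromosome, b: Chromosome, fitness: Fitness,
              rng: random.Random) -> List[Chromosome]:
    """Four sons from two fathers (two crossovers), then elitist replacement."""
    sons = [swap_mutation(s, rng) for _ in range(2) for s in segment_crossover(a, b, rng)]
    best_father = min(a, b, key=fitness)
    worst = max(range(len(sons)), key=lambda k: fitness(sons[k]))
    if fitness(best_father) < fitness(sons[worst]):
        sons[worst] = (list(best_father[0]), list(best_father[1]))
    return sons


def next_generation(pop: List[Chromosome], fitness: Fitness,
                    rng: random.Random) -> List[Chromosome]:
    """Selection halves the population; reproduction restores its size."""
    selected = tournament(pop, fitness)
    new_pop: List[Chromosome] = []
    for i in range(0, len(selected) - 1, 2):
        new_pop.extend(reproduce(selected[i], selected[i + 1], fitness, rng))
    return new_pop
```

Because the crossover only touches the shovel array and the mutation only swaps values inside the position array, the queue positions remain a valid permutation in every generation, which is the property the text relies on when it excludes crossover from the second array.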
4.2.5 G-TiMDP
The introduced TiMDP model for truck dispatching seems to be a good representation for this problem because of its specific characteristics, such as: stochastic behavior (real-world problems are often uncertain), sequential decision making (the accumulated reward, or value function, considers the expected results of all sequential actions during the whole shift, hence the reward can be defined for just one action, in our problem the action unload_truck), and time-dependent decisions (time-windows and variations of outcomes over time can be easily considered). However, due to the single-dependent-agent approximation, the model follows the 1-truck-for-n-shovels strategy, that is, the dispatching decisions are egotist, leading to results that are not so good. In order to improve the results, we introduce the Genetic TiMDP (G-TiMDP), a hybrid algorithm that combines the sequential decisions in uncertain environments of the TiMDP with the combinatorial characteristic of the GA, leading to a new m-trucks-for-n-shovels method.

The G-TiMDP differs from the GA model (presented in the last subsection) only in the selection phase, in which the fitness function is evaluated based on the maximization of the Expected Tonnage Production given by the proposed TiMDP model. Following the TiMDP phases, in this hybrid dispatching method the off-line and on-line phases are calculated as before, providing their results to a new assignment phase, which is now performed by the GA dispatching method. As the TiMDP results of the cited phases are computed before the mine shift and the shovel assignments result from the GA method (as in the previous one), the dispatching time of G-TiMDP remains the same as that of the pure GA method, that is, less than one minute.
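To make the change of fitness function concrete, the sketch below sums, for each truck in a chromosome, the expected tonnage read from a table computed before the shift. The table layout (indexed by truck type, shovel, and queue size, with one value per time step), the rule that each assigned truck increases the queue seen by later trucks, and every name in the code are assumptions for illustration only; the text does not specify these details.

```python
from typing import Dict, Sequence, Tuple

# Hypothetical layout of the pre-shift TiMDP results:
# table[(truck_type, shovel, queue_size)][time_index] -> expected tonnage at state C.
TonnageTable = Dict[Tuple[str, int, int], Sequence[float]]


def expected_tonnage_fitness(shovels: Sequence[int], truck_types: Sequence[str],
                             arrival_times: Sequence[float], queue_sizes: Dict[int, int],
                             table: TonnageTable, step: float, max_queue: int) -> float:
    """Sum of expected tonnage over the trucks of a chromosome (higher is better)."""
    queues = dict(queue_sizes)
    total = 0.0
    # Trucks are evaluated in order of estimated arrival at state C.
    for k in sorted(range(len(shovels)), key=lambda i: arrival_times[i]):
        shovel = shovels[k]
        if shovel == 0:  # gene equal to zero: truck not observed, not considered
            continue
        curve = table[(truck_types[k], shovel, min(queues[shovel], max_queue))]
        idx = min(int(round(arrival_times[k] / step)), len(curve) - 1)
        total += curve[idx]
        # Assumption: each dispatched truck lengthens the queue seen by later trucks.
        queues[shovel] += 1
    return total
```

Since the table is only read at dispatching time, a fitness evaluation of this kind costs a few lookups per truck, which is consistent with the statement that the G-TiMDP dispatching time stays close to that of the pure GA method.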
5 Simulations and Analysis

We defined in the last chapter some truck dispatching methods that are applied to our example mine: (1) Greedy heuristic, (2) MTCT heuristic, (3) TiMDP model, (4) GA model, and (5) G-TiMDP model. In order to test the performance of these methods, we developed a simulation framework based on the example mine data, such as shovel characteristics and positions, truck characteristics, present uncertainties, shift length, and queue size limitations. The dispatching methods were evaluated by Monte Carlo simulation, and their results were compared using Student's t-test, providing enough data for further quality analysis.
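As an illustration of the comparison step, the sketch below computes a two-sample Student's t statistic from two sets of Monte Carlo results (for example, total tonnage per simulated shift for two dispatching methods). The pooled-variance form and the function name are assumptions; the exact variant of the test is not detailed here.

```python
import math
import statistics
from typing import Sequence, Tuple


def student_t(sample_a: Sequence[float], sample_b: Sequence[float]) -> Tuple[float, int]:
    """Two-sample Student's t statistic with pooled variance.

    Returns the t value and the degrees of freedom; the p-value would then be
    read from a t distribution table or a statistics package.
    """
    na, nb = len(sample_a), len(sample_b)
    mean_a, mean_b = statistics.fmean(sample_a), statistics.fmean(sample_b)
    var_a, var_b = statistics.variance(sample_a), statistics.variance(sample_b)
    dof = na + nb - 2
    pooled = ((na - 1) * var_a + (nb - 1) * var_b) / dof
    t_value = (mean_a - mean_b) / math.sqrt(pooled * (1.0 / na + 1.0 / nb))
    return t_value, dof
```

A large absolute t value for the given degrees of freedom indicates that the difference in mean tonnage between two methods is unlikely to be due to the randomness of the simulations alone.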
5.1 Simulation Framework
The proposed dispatching methods were developed and simulated using the software SimEvents(TM) (a MATLAB(TM) package). All simulations follow the characteristics of the proposed mine environment example and are executed over a 10-hour shift. The objective of the simulations is to compute the total tonnage production at the end of the shift, considering a fleet composed of 15 heterogeneous trucks, as already proposed. A general mine simulation environment is presented in FIG. 5.1. The dispatching methods use the same simulation framework, except for specific functionalities, which are shown in the next subsections.
Referring to FIG. 5.1, the trucks are treated as entities, which are generated in the Truck Generator block with their specific characteristics (based on truck type); these characteristics define the traveling times along the paths and the quantity of transported material. After that, at time zero (started by the Start Timer block), the trucks are positioned in the Priority Queue block following a priority scheme (faster trucks are positioned first; the GA and G-TiMDP methods follow the priority given by the first dispatching decision defined in Sections 4.2.4 and 4.2.5). The TiMDP block is specific to the TiMDP and G-TiMDP methods and is responsible for getting the results of the real-time phase from the workspace at the current time t. Some entity attributes that carry important information for dispatching decisions, such as the size of the queues at the shovels, are set in the Set Attribute block with current data from the environment. The dispatching decision of all methods, considering their specific characteristics, is taken in the Shovel Decision Function block. After the shovel assignment, the truck travels to the Shovel and then (after material loading) goes to the Crusher. The time during which the truck stays unloading at the crusher is calculated in the Crusher Time Function block. The quantity of material transported in a cycle is added to the total tonnage production by the Simout - Tonnage Transported block.
FIGURE 5.1 General mine simulation environment.
FIGURE 5.2 Shovel 1 block simulation environment detail.

The Shovel block is actually a group of blocks (FIG. 5.2). First, the truck must read the current time t (Read Timer block), which is important for the time-dependent outcome (path selection). The path (Paths 1 block) is selected randomly in the Likelihood Function 1 block. After traveling through the path, the truck arrives at the shovel, going first to the Queue 1 block. Then, when the truck is at the first position of the queue and the shovel is idle, the truck loads material at the Shovel 1 block during the timespan given by the Shovel 1 Time Function block. The truck must return to the crusher through the Path 2 block, in which the outcome path is defined by the Likelihood Function 2 block considering the current time given by the Read Timer 2 block. In order to get important data (first and second trucks in the queue) for the calculation of the maximum truck arrival time in state C (ALG. 3), as required by both the GA and G-TiMDP methods, the Queue needs to be segmented (FIG. 5.3). The first and second positions in the queue are represented by the Single Server 1 and Single Server 2 blocks, respectively. As the trucks have the possibility of randomly choosing the paths (outcomes) from
FIGURE 5.3 Queue 1 block simulation environment detail.

the crusher to the shovels (or vice versa), the Path 1 block is represented by all possible paths (FIG. 5.4). Blocks Path 1, Path 2, and Path 3 represent the μ1, μ2, and μ3 outcomes, respectively. Our example mine environment has an important safety constraint, which is the prohibition of truck overtaking. This constraint is added to the simulation framework as shown in FIG. 5.5, in which the first truck in the path only releases the truck behind it to continue along the simulation after arriving at the shovel.
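The effect of the no-overtaking constraint on arrival times can be sketched in a few lines. The function below assumes that trucks are listed in the order in which they enter a path and that each has its own free-flow travel time; the function name and this data layout are illustrative and do not describe the SimEvents blocks themselves.

```python
from typing import List, Sequence


def arrivals_without_overtaking(departures: Sequence[float],
                                travel_times: Sequence[float]) -> List[float]:
    """Arrival times on a single path when a truck cannot pass the one ahead.

    A faster truck that catches up with a slower one arrives no earlier than
    the truck in front of it.
    """
    arrivals: List[float] = []
    previous = float("-inf")
    for depart, travel in zip(departures, travel_times):
        arrival = max(depart + travel, previous)
        arrivals.append(arrival)
        previous = arrival
    return arrivals
```

For instance, a truck that needs 5 minutes and leaves 1 minute after a truck that needs 10 minutes is held behind it and reaches the shovel at the same instant, which is the kind of traffic delay that the GA fitness function is said to take into account.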
5.2 Dispatching Methods Behavior
The proposed dispatching methods were simulated in the developed framework and presented different behaviors related to tonnage production and queue formation. In order to show an initial result, we simulated all methods using the same seed for the random selection of the traveling paths; that is, the sequence of path outcomes is the same for all simulations, independently of methods and trucks. The first simulated method is the Greedy Heuristic, for which we show the quantity of trucks at the shovels along the shift (FIG. 5.6). FIGs. 5.6a, 5.6b, and 5.6c show
FIGURE 5.4 Paths 1 block simulation environment detail.

the quantity of trucks in the queues and traveling to Shovel 1, Shovel 2, and Shovel 3, respectively. Because overtaking is not allowed, all presented dispatching methods consider that the queues at the shovels are formed by stopped trucks waiting for material loading and by traveling trucks. We note that the mean quantity of trucks in all queues tends to be the same over time. Indeed, because of the greedy behavior, trucks travel to the shovel with the smallest queue, which produces this balancing. In this simulation the system does not know about the existence of a time-window, and the trucks do not go to the parking lot at the end of the shift. This specific simulation returned a total tonnage production of 77 000 tons. The quantity of trucks in the paths going to Shovel 1 is shown in FIG. 5.7. The outcomes μ1, μ2, and μ3 are shown in FIGs. 5.7a, 5.7b, and 5.7c, respectively. The presented plots indicate only the quantity of trucks traveling in the paths going to Shovel 1; the quantity of trucks returning to the Crusher, and using the same path to go to another
FIGURE 5.5 Path 1 block simulation environment detail.

FIGURE 5.6 Quantity of trucks in shovels for the Greedy Heuristic simulation. Panels: a) Shovel 1, b) Shovel 2, c) Shovel 3. Axes: Quantity of Trucks versus Time (min).
shovel is not considered here (we consider that the paths to the shovels, despite being the same in the real problem, are different and are treated independently, with their own outcomes). The time-window (the blockage of outcome 1 between times 300 and 360 minutes) is clearly shown in FIG. 5.7a, in which no truck is allowed to travel through path 1 during that interval. The likelihood function is also visible in the graphics through the decreasing usage of the outcomes 1, 2, and 3, in that order.

In the MTCT heuristic the trucks are dispatched according to the minimum truck cycle time, which is directly related to the shovels' loading rates and consequently to the size of their queues. This behavior makes the mean sizes of the queues differ from each other, which can be explained by the shovels' loading rates. Indeed, faster shovels can serve more trucks in the same timespan; that is, more trucks can be sent to those shovels, consequently leading to a larger queue. FIG. 5.8 shows this behavior, in which the highest mean queue size is for Shovel 3 (FIG. 5.8c), followed by Shovel 1 (FIG. 5.8a), and then Shovel 2 (FIG. 5.8b). This dispatching heuristic knows about the path blockage (time-window), but does not know about the likelihood function; it considers that the trucks always travel through the shortest available path, that is, outcome 1 during the whole shift, except during the time-window, in which the outcome is 2. The total tonnage production for this simulation was 88 900 tons.

In the MTCT heuristic simulation, trucks must go to the parking lot (FIG. 5.9) when it is not possible to complete a cycle before the end of the shift. In fact, due to uncertainties present in the problem, such as time in the queues and path outcomes, some trucks are dispatched and do not return to the decision point (Crusher) before the end of the shift. This problem can be bypassed by adding a constant to the calculated cycle times, but this could worsen the results.
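The difference between the two heuristic rules described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the cycle-time estimate (travel, queue wait, loading, and return), the class and function names, and all numeric values are hypothetical and are not taken from the simulation framework.

```python
from dataclasses import dataclass


@dataclass
class Shovel:
    name: str
    queue: int          # trucks waiting at the shovel plus trucks traveling to it
    load_time: float    # minutes to load one truck (assumed constant)
    travel_time: float  # minutes from the crusher along the assumed path
    return_time: float  # minutes from the shovel back to the crusher


def greedy_choice(shovels):
    """Greedy rule: send the truck to the shovel with the smallest queue."""
    return min(shovels, key=lambda s: s.queue)


def cycle_time(s):
    """Rough cycle estimate: travel, wait for the queue, load, and return."""
    return s.travel_time + s.queue * s.load_time + s.load_time + s.return_time


def mtct_choice(shovels, now, shift_end):
    """MTCT rule: minimum estimated cycle time; None means the parking lot."""
    best = min(shovels, key=cycle_time)
    if now + cycle_time(best) > shift_end:
        return None  # the cycle cannot be completed before the shift ends
    return best


# Illustrative values only.
shovels = [
    Shovel("Shovel 1", queue=2, load_time=6.0, travel_time=10.0, return_time=10.0),
    Shovel("Shovel 2", queue=0, load_time=13.0, travel_time=12.0, return_time=12.0),
    Shovel("Shovel 3", queue=3, load_time=2.5, travel_time=11.0, return_time=11.0),
]
```

With these illustrative values the greedy rule picks Shovel 2 (empty queue), whereas the cycle-time rule picks Shovel 3, whose fast loading outweighs its longer queue; near the end of the shift the cycle-time rule returns None, which stands for the parking lot.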

FIGURE 5.7 Quantity of trucks in paths going to Shovel 1 for the Greedy Heuristic simulation. Panels: a) outcome 1, b) outcome 2, c) outcome 3. Axes: Quantity of Trucks versus Time (min).

FIGURE 5.8 Quantity of trucks in shovels for the MTCT Heuristic simulation. Panels: a) Shovel 1, b) Shovel 2, c) Shovel 3. Axes: Quantity of Trucks versus Time (min).

FIGURE 5.9 Trucks on parking lot for the MTCT heuristic. Axes: Truck Number versus Time (min).

The quantity of trucks in shovels for the TiMDP model is shown in FIG. 5.10. It is hard to notice a behavior different from that of the previous dispatching method. However, due to knowledge of the likelihood function and consideration of sequential decisions, the tonnage production is slightly better: 90 200 tons. The parking lot occupation for the TiMDP model is shown in FIG. 5.11. Due to uncertainties in the problem (e.g., time in the queues and outcomes of the move_shovel action), not all trucks reach the parking lot by the end of the shift.

The GA model for truck dispatching is based on the MTCT heuristic, and likewise it does not assume knowledge about the likelihood function for path selection. Moreover, its assumptions regarding the time-window are the same. However, dispatching is made considering the sequence of trucks arriving at the Crusher in the next time t_GA, which leads to better results. The quantity of trucks in shovels for this model is shown in FIG. 5.12. An interesting result is that the mean quantity of trucks in Shovel 1 (FIG. 5.12a) does not decrease during the time-window. Independently of this behavior, the results of this

FIGURE 5.10 Quantity of trucks in shovels for TiMDP model simulation. Panels: a) Shovel 1, b) Shovel 2, c) Shovel 3. Axes: Quantity of Trucks versus Time (min).

FIGURE 5.11 Trucks on parking lot for TiMDP model. Axes: Truck Number versus Time (min).

method are better than those for MTCT; its total tonnage production is 90 300 tons¹. The parking lot occupation for the GA model is shown in FIG. 5.13.

The quantity of trucks in the shovels for the G-TiMDP model is presented in FIG. 5.14. Again, it is hard to identify substantial changes in comparison with the other presented methods. An interesting aspect is that Shovel 2 is used more often during the shift, with a peak of 4 trucks at it. The results of this m-trucks-for-n-shovels method were the best among all we tested: a production of 90 600 tons. The parking lot occupation for the G-TiMDP model is shown in FIG. 5.15.

¹ We note that this result is even better than the one presented for the TiMDP model; however, it is a specific result based on the considered path outcomes. Based only on this result, we cannot claim that this method is superior to TiMDP. A complete and statistically sound comparison is presented in the next section.

FIGURE 5.12 Quantity of trucks in shovels for the GA model simulation. Panels: a) Shovel 1, b) Shovel 2, c) Shovel 3. Axes: Quantity of Trucks versus Time (min).

FIGURE 5.13 Trucks on parking lot for the GA model. Axes: Truck Number versus Time (min).
5.3 Comparative Results and Analysis
Given the simulation framework and the stochastic behavior of the system (path outcomes), we compare the presented truck dispatching methods using Monte Carlo simulation (MOONEY, 1997). In the simulations, we used two different values of t_queue (the multiplying factor due to the queue size) for the TiMDP and G-TiMDP models. In the first simulation, we used as t_queue the time necessary to load (t_load) the truck type with the mean capacity (truck T2). However, even though this time value is a good initial approximation, t_queue will certainly not be the mean value of t_load (because of the unbalanced number of truck types and the parallel queues with different servers, or shovels). As a better approach, we ran some preliminary simulations and found t_queue as the Average queue length/Average wait, whose values are given by the statistical information from the Queue block. An example is found in FIG. 5.16, in which only the mean times in the queues of Shovel 1 and Shovel 3 are given, in FIGs. 5.16a and 5.16b, respectively. The mean time of the Shovel 2 queue is not represented because this shovel is not constantly used, which makes the simulator unable to report the variables Average queue length and Average wait (these values are always shown as zero by the simulator). The value of t_queue must be taken after its convergence. Based on many observations, we adopted the values 6.2, 13 (estimated), and 2.5 minutes for t_queue of Shovels 1, 2, and 3, respectively.

FIGURE 5.14 Quantity of trucks in shovels for the G-TiMDP simulation. Panels: a) Shovel 1, b) Shovel 2, c) Shovel 3. Axes: Quantity of Trucks versus Time (min).

FIGURE 5.15 Trucks on parking lot for the G-TiMDP model. Axes: Truck Number versus Time (min).

The simulation results (Table 5.1) are presented for all methods, considering around 4500 simulations and a standard representation of the involved times, that is, the times are always exact. The TiMDP (1) and G-TiMDP (1) methods used the original t_queue, whereas the TiMDP (2) and G-TiMDP (2) methods used the estimated t_queue. Considering only the averages, we can observe that the methods rank, from best to worst, as follows: G-TiMDP, TiMDP, GA, MTCT heuristic, and Greedy heuristic, as anticipated in the previous sections. The TiMDP (2) results contradicted our predictions by being worse than the TiMDP (1) results, which can be explained by the single-dependent-agent approximations. However, the G-TiMDP (2) results, which are for an m-trucks-for-n-shovels strategy based on solution combinations, are better than those for G-TiMDP (1). Thus, we conclude that the t_queue adjustment is fundamental for achieving good results.

FIGURE 5.16 Mean time in the queues for TiMDP model. Panels: a) Shovel 1, b) Shovel 3. Axes: Mean time (min) versus Time (min).

TABLE 5.1 Monte Carlo simulations of truck dispatching methods using standard representation (standard deviation equals zero for all considered times).

Method         Sims   Min (tons)   Max (tons)   Mean (tons)   Std Dev (tons)
Greedy         4722   74 300       80 600       77 427        870.3
MTCT           4570   86 900       91 400       89 292        629.2
GA             4545   88 700       92 000       90 467        482.5
TiMDP (1)      4719   89 100       92 600       90 936        534.9
TiMDP (2)      4561   88 600       92 600       90 923        534.6
G-TiMDP (1)    4603   89 100       93 100       91 201        539.4
G-TiMDP (2)    4548   89 100       92 800       91 236        528.3

The simulation results considering the involved times as originally proposed (Table 4.3), that is, by using Gaussian representations, are presented in Table 5.2. As the use of the estimated times t_queue produced good results and is a rational choice, we used it in both the TiMDP and G-TiMDP models. Just for comparative purposes and to demonstrate the applicability of pdfs in the TiMDP model, we ran the simulations using the previous TiMDP and G-TiMDP models, and using these models with Gaussian pdfs. The overall results are worse than the results presented in the last simulation round (Table 5.1), which can be explained by the unawareness of the exact times needed to execute an action (decisions are taken based on expected values). Therefore, knowledge by the decision model of the inherent imprecisions in the system is of paramount importance, as shown by the better results of TiMDP Gauss when compared to the TiMDP method. However, results for G-TiMDP Gauss were slightly worse than for G-TiMDP, which can be explained by the estimated t_queue value and, mainly, by the imprecisions added by the single-dependent-agent modeling. The analyses presented up to this point are valid; however, they do not consider the
TABLE 5.2 Monte Carlo simulations of truck dispatching methods using Gaussian representation.

Method           Sims   Min (tons)   Max (tons)   Mean (tons)   Std Dev (tons)
Greedy           2410   74 300       80 000       77 335        855.9
MTCT             2410   87 100       91 400       89 241        654.8
GA               2191   88 400       92 100       90 435        494.8
TiMDP            2410   87 900       92 400       90 836        535.0
TiMDP Gauss      2409   88 800       92 400       90 888        511.9
G-TiMDP          2316   89 300       92 800       91 121        552.4
G-TiMDP Gauss    2318   92 600       92 800       91 092        537.5
TABLE 5.3 Comparisons between truck dispatching methods using the t-test.

Compared methods             t value   Confidence level
MTCT - Greedy                754.9     >99.9%
GA - MTCT                    100.1     >99.9%
TiMDP(1) - GA                44.34     >99.9%
TiMDP(2) - TiMDP(1)          -1.17     between 70 and 80%
G-TiMDP(1) - TiMDP(1)        23.8      >99.9%
G-TiMDP(2) - TiMDP(2)        28.1      >99.9%
G-TiMDP(2) - G-TiMDP(1)      3.14      >99.8%
quantity of simulations and the standard deviation of the results, which can strongly affect the quality of the results. In order to evaluate the simulation results considering the number of simulations, means, and standard deviations, we use Student's t-test (CRAMER, 1999) as a statistical comparator of results from two different groups. For the comparisons between the developed dispatching methods, we consider that a method is better than another if the difference is significant at the 0.05 level, that is, if the confidence level is greater than 95%. Table 5.3 shows the significance comparisons among the methods considering a standard representation of the involved times (standard deviation always equal to zero). We can observe that only the difference between TiMDP (1) and TiMDP (2) is not significant at a confidence level of 99.9%, which is precisely the result that was unexpected given the adjustment of the t_queue time parameter. These results confirm the superiority of the G-TiMDP method over all other methods.
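As an illustration of how such t values follow from the number of simulations, the means, and the standard deviations, the sketch below computes the t statistic from the summary values of Table 5.1. The text does not state which variant of the t-test was used; the unequal-variance (Welch) form is an assumption here, chosen because it approximately reproduces the first rows of Table 5.3. The function and variable names are hypothetical.

```python
import math


def welch_t(mean_a, std_a, n_a, mean_b, std_b, n_b):
    """t statistic for the difference of two sample means (unequal variances)."""
    standard_error = math.sqrt(std_a ** 2 / n_a + std_b ** 2 / n_b)
    return (mean_a - mean_b) / standard_error


# Summary statistics copied from Table 5.1: (mean, std dev, simulations).
greedy = (77427, 870.3, 4722)
mtct = (89292, 629.2, 4570)
timdp_1 = (90936, 534.9, 4719)
timdp_2 = (90923, 534.6, 4561)

t_mtct_greedy = welch_t(*mtct, *greedy)   # compare with 754.9 in Table 5.3
t_timdp = welch_t(*timdp_2, *timdp_1)     # compare with -1.17 in Table 5.3
```

The small magnitude of the second value reflects a difference of only 13 tons between the two TiMDP means, against a standard error of about 11 tons.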
TABLE 5.4 Comparisons between truck dispatching methods with Gaussian representations using the t-test.

Compared methods                t value   Confidence level
MTCT - Greedy                   542.4     >99.9%
GA - MTCT                       70.16     >99.9%
TiMDP Gauss - GA                30.51     >99.9%
TiMDP Gauss - TiMDP             3.45      >99.9%
G-TiMDP Gauss - TiMDP Gauss     13.42     >99.9%
G-TiMDP Gauss - G-TiMDP         -1.82     between 90 and 95%
The comparisons of the proposed truck dispatching methods for the system with Gaussian representations of the involved times are shown in Table 5.4. The proposed G-TiMDP Gauss method is superior to all other methods except G-TiMDP (with no consideration of Gaussian distributions), to which it is inferior. However, we note that the confidence level of this difference is smaller than 95%; therefore, we cannot categorically affirm that G-TiMDP is better than G-TiMDP with the Gauss model. In fact, in our view G-TiMDP Gauss should be selected for use in truck dispatching environments because of its more precise representation of uncertain times. Certainly, better results can be attained in environments with a complete Gaussian representation of the involved times, as commonly found in real-world applications.
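The confidence brackets reported for the t values can be checked with a short sketch. It assumes a two-sided test and uses the normal approximation to the t distribution, which is reasonable here only because each group contains thousands of simulations; the function name is hypothetical and the exact procedure used for the tables is not stated in the text.

```python
import math


def confidence_level(t_value):
    """Two-sided confidence level (1 - p) for a t statistic, using the
    normal approximation for large samples."""
    return math.erf(abs(t_value) / math.sqrt(2.0))


# t values taken from Table 5.4.
for label, t in [("G-TiMDP Gauss - G-TiMDP", -1.82),
                 ("TiMDP Gauss - TiMDP", 3.45)]:
    print(f"{label}: {100 * confidence_level(t):.2f}%")
```

Under these assumptions the first comparison falls between 90% and 95%, below the 95% threshold, while the second exceeds 99.9%.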
6 Final Remarks
We present in this chapter the final conclusions of this thesis, based on the contributions made and the results achieved throughout the work. We also suggest future work that can improve the representation of the real-world truck dispatching problem and the proposed dispatching methods, leading to the consideration of contingencies by the model and probably to higher tonnage production.
6.1 Conclusions
We presented the development of several truck dispatching methods to optimize the tonnage production in an example stochastic time-window mine. The developed methods were: (1) Greedy heuristic, (2) MTCT heuristic, (3) TiMDP model, (4) GA model, and (5) G-TiMDP model. Methods (1) and (2) are classical in the open-pit mining industry and are classified as 1-truck-to-n-shovels strategies. They suffer from many problems, such as egotistic behavior and determinism, and their results were used as baselines for the other developed methods. Method (3) is also classified as a 1-truck-to-n-shovels strategy, and methods (4) and (5) are classified as m-trucks-to-n-shovels strategies, whose combinatorial behavior may lead to better results. Our contributions are methods (3), (4), and (5), whereas methods (1) and (2) are classical ones used in truck dispatching for open-pit mining, which were used in the thesis only to ground the problem and to compare results over a simulated example mine environment. The example mine environment comprised a time-window and uncertain variables, such as path choices by the truck driver and involved times modeled as Gaussian distributions. The time-window was used to indicate the path blockage in a period of the shift, and was assumed to be information available to all methods except (1).

We developed a novel application of TiMDP models to the real-time truck dispatching problem, which is a real-world problem with inherent uncertainties. The TiMDP model was solved by introducing backwards convolution, which is a solution method for discretized states. In order to mitigate the curse of dimensionality (a result of the combination of agents and of state discretization), we modeled the problem using the introduced single-dependent-agent representation, in which agents are modeled in a concurrent single-agent environment, with their action choices dependent on the current general state of the environment (which is changed by the actions of all agents). In our development, the dependence was modeled based on the size of the queues at the shovels. Hence, the dispatching decisions were dependent on the characteristics of the truck itself and on the current state of the mine environment.

Since all previously developed methods belonged to the 1-truck-to-n-shovels strategy class (egotistic behavior), we introduced GA truck dispatching, which used the MTCT heuristic as its fitness function. This m-trucks-to-n-shovels strategy considers the following trucks in the dispatching decision; however, the inherent uncertainties of the environment are not considered, leading to worse results than the TiMDP model. Finally, we developed our main contribution, a novel hybrid method called G-TiMDP, which is basically the GA model using the results of the TiMDP model as its fitness function. In essence, this approach adapted the TiMDP model to an m-trucks-to-n-shovels strategy.
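To give an intuition for a backward pass over discretized time, the following Python sketch computes time-dependent values by convolving a discrete action-duration distribution with the values already computed for later time steps. It is a generic illustration and not the TiMDP formulation of this thesis: the single implicit state, the rule that an action only pays off if it finishes within the horizon, and all names and numbers are assumptions.

```python
def backward_values(horizon, actions):
    """Backward pass over discretized time steps 0..horizon.

    actions maps a name to (reward, duration_pmf), where duration_pmf maps a
    positive integer duration to its probability (hypothetical structure).
    An action contributes only if it finishes by the horizon.
    """
    values = [0.0] * (horizon + 1)    # values[horizon] = 0: the shift is over
    policy = [None] * (horizon + 1)   # None stands for "take no action"
    for t in range(horizon - 1, -1, -1):
        best_value, best_action = 0.0, None
        for name, (reward, duration_pmf) in actions.items():
            # Discrete convolution of the duration distribution with the
            # values of the later time steps.
            q = sum(p * (reward + values[t + d])
                    for d, p in duration_pmf.items() if t + d <= horizon)
            if q > best_value:
                best_value, best_action = q, name
        values[t], policy[t] = best_value, best_action
    return values, policy


# One action worth 1 unit that always takes 2 time steps, horizon of 5 steps.
values, policy = backward_values(5, {"haul": (1.0, {2: 1.0})})
```

In this toy case two hauls fit when starting at step 0 or 1, one haul fits when starting at step 2 or 3, and none fits at step 4, where the policy entry stays None.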
All presented methods proved to be good choices for the considered real-time problem, taking into account that dispatching decisions must be made quickly and that the methods returned their decisions in less than one minute. Monte Carlo simulations for the example mine were performed for all methods using the SimEvents™ environment. The results were compared using Student's t-test, in which the G-TiMDP model was ranked as the best one. The presented methods can also be used for other mine configurations, simply by adjusting the models to the new conditions. Certainly, considering a tonnage production goal, G-TiMDP will be, for any mine configuration, the best method among the presented ones.
6.2 Future work
We outline some future work intended to improve the methods and to deal with common contingencies present in mine environments.

Factored TiMDP representation

MDPs suffer from the curse of dimensionality, in which state space explosion can lead to extremely time-consuming solutions. TiMDPs are more affected because of their discretized time representation, which is in fact a segmentation of time into states (the number of states grows as the discretization resolution increases). Factored MDPs (BOUTILIER; DEARDEN; GOLDSZMIDT, 2000) deal very well with large state spaces by representing states in a Dynamic Bayesian Network. We propose a factored TiMDP, which can be a good approach for time-dependent stochastic decision problems with large state spaces. Thus, our presented TiMDP model for truck dispatching could be represented as a multi-agent problem (as an m-trucks-to-n-shovels strategy), with a likely improvement of results.

Time-dependent Reinforcement Learning

Reinforcement Learning (RL) (SUTTON; BARTO, 1998) is a method for learning in uncertain environments that can be represented according to an MDP formalism. In RL, the agent learns characteristics of the environment based on its actions, which can return positive or negative reinforcements (rewards or punishments in the MDP jargon). We propose the study of Time-Dependent Reinforcement Learning (TiRL), in which the reinforcements are also related to the current time and to action durations. The associated theory may be applied to all time-dependent problems that can already be represented by TiMDPs. Therefore, by following an RL representation, TiRL can be based on TiMDP theory. In our truck dispatching problem, TiRL could be applied to all involved time adjustments (such as t_queue), leading to better results along the shifts.

TiMDP sensitivity analysis

Real-world problems are subject to unforeseen changes during the decision period. In our problem, paths can be blocked and shovels may break down or become unavailable during the shift. To consider such issues, the TiMDP can be remodeled and its off-line and on-line phases executed again. However, all this rework might require a long time, which is unacceptable for real-time dispatching problems. Another solution could be a policy selection that considers some states to be unavailable; e.g., in a given state, an agent can select among three different actions and the policy indicates the action that leads to an unavailable state; in this case, the agent must select the second action in the policy list. In some cases, depending on the weight of the unavailable state in the model, the action selected by the agent can be the best one; however, the best action could be the third one, changing the quality of the final result. Therefore, we propose an analysis of TiMDP sensitivity, with which we could know in advance the maximum error in the quality of the solution introduced by the modification made to the original TiMDP model.

GA method improvement

The introduced GA method used to solve the truck dispatching problem can be improved in order to provide better solutions in a shorter time. Revisions of the problem representation and of the reproduction phase will certainly improve the final results. Based on this improvement and on the previously cited future work, a revised version of G-TiMDP will probably provide better results than those presented in this thesis.

Consideration of production and blending goals

The application of a TiMDP model to a real-time dispatching problem allowed us to solve the truck dispatching problem considering the simplified tonnage production goal. Generally, in real-world mines, the goals are based on plans that consider daily production and blending needs. Hence, we propose the introduction of these goals in future models using all the developed stochastic methods for truck dispatching, in order to obtain a better representation of the problem, which might improve the quality of the results.

Truck dispatching based on time-dependent utilities

The truck dispatching problem involves many other parameters that were not considered in our models and that can also be considered time-dependent. Parameters such as fuel consumption and tire wear depend on truck displacement; however, they can be reasonably approximated as dependent on time. In this way, we can use the introduced time-dependent utilities, applied as rewards in TiMDP models, to consider other important parameters in the truck dispatching problem. For example, when a truck asks for dispatch, the system must decide whether it is better to send it to a shovel or to the fuel station. When these decisions are taken incorrectly, they may send trucks to refuel prematurely, thus reducing the total tonnage production, or, in the worst case, cause a truck to halt in the mine environment because of an empty fuel tank.
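The policy selection with unavailable states mentioned earlier in this section (choosing the next action in the policy list when the preferred one leads to an unavailable state) can be sketched as follows. This is only an illustration of the fallback rule, not a proposed implementation; the ranked action list, the mapping from actions to resulting states, and all names are hypothetical.

```python
def select_action(ranked_actions, next_state, unavailable):
    """Return the best-ranked action whose resulting state is available.

    ranked_actions: actions ordered from best to worst by the policy.
    next_state: maps each action to the state it leads to (assumed known).
    unavailable: set of states that cannot currently be reached.
    """
    for action in ranked_actions:
        if next_state[action] not in unavailable:
            return action
    return None  # every ranked action leads to an unavailable state


ranked = ["to_shovel_3", "to_shovel_1", "to_shovel_2"]
leads_to = {"to_shovel_3": "at_shovel_3",
            "to_shovel_1": "at_shovel_1",
            "to_shovel_2": "at_shovel_2"}

# Shovel 3 has broken down, so the second action in the list is selected.
fallback = select_action(ranked, leads_to, unavailable={"at_shovel_3"})
```

Such a rule is fast enough for real-time use, but, as noted in the discussion of that proposal, the second-ranked action is not necessarily the best one once a state is removed, which is what the proposed sensitivity analysis would quantify.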
Bibliography
ALARIE, S.; GAMACHE, M. Overview of solution strategies used in truck dispatching systems for open pit mines. International Journal of Surface Mining, Reclamation and Environment, Taylor and Francis Ltd, v. 16, n. 1, p. 59-76, 2002.
BASTOS, G. S.; RIBEIRO, C. H. C.; SOUZA, L. E. de. Variable utility in multi-robot task allocation systems. Robotic Symposium, IEEE Latin American, IEEE Computer Society, p. 179-183, 2008.
BELLMAN, R. Dynamic programming. Science, American Association for the Advancement of Science, v. 153, n. 3731, p. 34-37, 1966.
BERTSEKAS, D. Dynamic programming: deterministic and stochastic models. [S.l.]: Prentice-Hall, Inc. Upper Saddle River, NJ, USA, 1987. ISBN 0132215810.
BOUTILIER, C.; DEAN, T.; HANKS, S. Decision-theoretic planning: Structural assumptions and computational leverage. Journal of Artificial Intelligence Research, Citeseer, v. 11, n. 1, p. 94, 1999.
BOUTILIER, C.; DEARDEN, R.; GOLDSZMIDT, M. Stochastic dynamic programming with factored representations. Artificial Intelligence, Elsevier, v. 121, n. 1-2, p. 49-107, 2000.
BOYAN, J.; LITTMAN, M. Exact solutions to time-dependent MDPs. Advances in Neural Information Processing Systems, v. 13, p. 17, 2000.
BRAHMA, K. C. A Study on Application of Strategic Planning And Operations Research Techniques in Open Cast Mining. 2007. Thesis (Doctorate) - Department of Mining Engineering, National Institute of Technology, 2007.
BRESINA, J.; DEARDEN, R.; MEULEAU, N.; RAMAKRISHNAN, S.; SMITH, D.; WASHINGTON, R. Planning under continuous time and resource uncertainty: A challenge for AI. In: CITESEER. AIPS Workshop on Planning for Temporal Domains. [S.l.], 2002. p. 91-97.
CETIN, N. Open-pit truck/shovel haulage system simulation. Thesis (Doctorate) - The Graduate School of Natural and Applied Sciences, Middle East Technical University, 2004.
CO, C.; TANCHOCO, J. A Review of Research and AGVS Vehicle Management. [S.l.]: School of Industrial Engineering, Purdue University, 1990.
CRAMER, H. Mathematical methods of statistics. [S.l.]: Princeton Univ Pr, 1999.
ELBROND, J.; SOUMIS, F. Towards integrated production planning and truck dispatching in open pit mines. International Journal of Mining, Reclamation and Environment, Taylor & Francis, v. 1, n. 1, p. 16, 1987.
GENDREAU, M.; POTVIN, J. Dynamic Vehicle Routing and Dispatching. Fleet management and logistics, Kluwer Academic Publishers, p. 115-126, 1998.
GIBSON, M.; BRUCK, J. Efficient exact stochastic simulation of chemical systems with many species and many channels. J. Phys. Chem. A, ACS Publications, v. 104, n. 9, p. 1876-1889, 2000.
GROSS, D. Fundamentals of queueing theory. [S.l.]: Wiley-India, 2008. ISBN 8126517778.
HOEY, J.; ST-AUBIN, R.; HU, A.; BOUTILIER, C. SPUDD: Stochastic planning using decision diagrams. In: CITESEER. Proceedings of the Fifteenth Conference on Uncertainty in Artificial Intelligence. [S.l.], 1999. p. 279-288.
HORVITZ, E.; RUTLEDGE, G. Time-dependent utility and action under uncertainty. In: CITESEER. Proceedings of Seventh Conference on Uncertainty in Artificial Intelligence, Los Angeles, CA. [S.l.], 1991. p. 151-158.
HOWARD, R. Dynamic programming and Markov process. [S.l.]: MIT press, 1960.
HUANG, B.; WEI, J.; HE, M.; LU, X. The Genetic Algorithm for Truck Dispatching Problems in Surface Mine. Information Technology Journal, v. 9, n. 4, p. 710-714, 2010.
ICHOUA, S.; GENDREAU, M.; POTVIN, J. Vehicle dispatching with time-dependent travel times. European journal of operational research, Elsevier, v. 144, n. 2, p. 379-396, 2003.
JAOUA, A.; GAMACHE, M.; RIOPEL, D. Specification of an Intelligent Simulation-Based Real Time Control Architecture: Application to Truck Control System. Simulation à Événements Discrets pour la Commande Temps Réel de Systèmes Dynamiques Complexes, p. 24, 2009.
JI, X. Models and algorithm for stochastic shortest path problem. Applied Mathematics and Computation, Elsevier, v. 170, n. 1, p. 503-514, 2005.
KIRKPATRICK, S. Optimization by simulated annealing: Quantitative studies. Journal of Statistical Physics, Springer, v. 34, n. 5, p. 975-986, 1984. ISSN 0022-4715.
KOLONJA, B.; KALASKY, D.; MUTMANSKY, J. Optimization of dispatching criteria for open-pit truck haulage system design using multiple comparisons with the best and common random numbers. In: ACM NEW YORK, NY, USA. Proceedings of the 25th conference on Winter simulation. [S.l.], 1993. p. 393-401.
KRAUSE, A.; MUSINGWINI, C. Modelling open pit shovel-truck systems using the Machine Repair Model. Journal of the South African Institute of Mining and Metallurgy, Marshalltown, South Africa, v. 107, n. 8, p. 469-476, 2007.
LENGYEL, M.; DAYAN, P. Hippocampal contributions to control: The third way. Adv. Neural Inf. Process. Syst, Citeseer, v. 20, p. 889-896, 2007.
LI, L.; LITTMAN, M. Lazy approximation for solving continuous finite-horizon MDPs. In: MENLO PARK, CA; CAMBRIDGE, MA; LONDON; AAAI PRESS; MIT PRESS; 1999. Proceedings of the National Conference on Artificial Intelligence. [S.l.], 2005. v. 20, n. 3, p. 1175.
LI, X.; SOH, L. Applications of Decision and Utility Theory in Multi-Agent Systems. CSE Technical reports, p. 56, 2004.
LITTMAN, M.; DEAN, T.; KAELBLING, L. On the complexity of solving Markov decision problems. In: CITESEER. Proceedings of the Eleventh Conference on Uncertainty in Artificial Intelligence. [S.l.], 1995. p. 394-402.
LIZOTTE, Y.; BONATES, E. Truck and shovel dispatching rules assessment using simulation. Mining Science and Technology, v. 5, p. 45-58, 1987.
LUDWIG, D. The distribution of population survival times. American Naturalist, JSTOR, v. 147, n. 4, p. 506-526, 1996.
MARECKI, J.; TOPOL, Z.; TAMBE, M. A fast analytical algorithm for MDPs with continuous state spaces. In: AAMAS-06 Proceedings of 8th Workshop on Game Theoretic and Decision Theoretic Agents. [S.l.: s.n.], 2006.
MAUSAM, M.; WELD, D. Solving concurrent Markov decision processes. In: AAAI PRESS. Proceedings of the 19th national conference on Artificial intelligence. [S.l.], 2004. p. 716-722.
MITCHELL, M. An introduction to genetic algorithms. [S.l.]: The MIT press, 1998.
MOONEY, C. Monte Carlo Simulation. [S.l.]: Sage Publications, Inc, 1997.
MURATA, T. Petri nets: Properties, analysis and applications. Proceedings of the IEEE, IEEE, v. 77, n. 4, p. 541-580, 2002. ISSN 0018-9219.
PAPADIMITRIOU, C.; STEIGLITZ, K. Combinatorial optimization: algorithms and complexity. [S.l.]: Dover Publications, 1998.
PARSONS, S.; WOOLDRIDGE, M. An introduction to game theory and decision theory. Game theory and decision theory in agent-based systems, Kluwer Academic Publishers, p. 128, 2002.
PELLEGRINI, J.; WAINER, J. Processos de Decisão de Markov: um tutorial. Revista de Informática Teórica e Aplicada, v. 14, n. 2, p. 133-179, 2008.
PINTO, E. B. Despacho de caminhões em mineração usando lógica nebulosa, visando ao atendimento simultâneo de políticas excludentes. 2007. 120 p. Dissertation (Master's in Production Engineering) - Engineering School, Federal University of Minas Gerais, 2007.
POWELL, W. A comparative review of alternative algorithms for the dynamic vehicle allocation problem. Vehicle Routing: Methods and Studies, p. 249-291, 1988.
PUTERMAN, M. Markov decision processes: discrete stochastic dynamic programming. [S.l.]: John Wiley & Sons, Inc. New York, NY, USA, 1994.
RACHELSON, E.; FABIANI, P.; GARCIA, F. TiMDPpoly: An improved method for solving time-dependent MDPs. In: Proceedings of the 21st IEEE International Conference on Tools with Artificial Intelligence (ICTAI). [S.l.: s.n.], 2009. p. 796-799.
RACHELSON, E.; FABIANI, P.; GARCIA, F. TiMDPpoly: An Improved Method for Solving Time-Dependent MDPs. In: IEEE. 2009 21st IEEE International Conference on Tools with Artificial Intelligence. [S.l.], 2009. p. 796-799.
RUSSELL, S.; NORVIG, P. Artificial intelligence: a modern approach. [S.l.]: Prentice hall, 2009.
SHI, Y.; EBERHART, R. Empirical study of particle swarm optimization. In: IEEE. Evolutionary Computation, 1999. CEC 99. Proceedings of the 1999 Congress on. [S.l.], 2002. v. 3. ISBN 0780355369.
SOLOMON, M. Algorithms for the vehicle routing and scheduling problems with time window constraints. Operations research, JSTOR, p. 254-265, 1987.
SOUZA, M.; COELHO, I.; RIBAS, S.; SANTOS, H.; MERSCHMANN, L. A hybrid heuristic algorithm for the open-pit-mining operational planning problem. European Journal of Operational Research, Elsevier, v. 207, p. 1041-1051, 2010.
ST-AUBIN, R.; HOEY, J.; BOUTILIER, C. APRICODD: Approximate policy construction using decision diagrams. Advances in Neural Information Processing Systems, Citeseer, p. 1089-1096, 2001.
SUTTON, R.; BARTO, A. Reinforcement learning: An introduction. [S.l.]: The MIT press, 1998.
SUTTON, R.; PRECUP, D.; SINGH, S. Between MDPs and Semi-MDPs: Learning, planning, and representing knowledge at multiple temporal scales. Artificial Intelligence, Citeseer, v. 112, p. 181-211, 2000.
TA, C.; KRESTA, J.; FORBES, J.; MARQUEZ, H. A stochastic optimization approach to mine truck allocation. International Journal of Mining, Reclamation and Environment, Taylor & Francis, v. 19, n. 3, p. 162-175, 2005.
TEMENG, V.; OTUONYE, F.; FRENDEWEY, J. Real-time truck dispatching using a transportation algorithm. International Journal of Mining, Reclamation and Environment, Taylor & Francis, v. 11, n. 4, p. 203-207, 1997.
THISTED, R. Elements of statistical computing: numerical computation. [S.l.]: Chapman & Hall/CRC, 1988.
TU, J.; HUCKA, V. Analysis of open-pit truck haulage system by use of a computer model. CIM Bulletin, v. 78, n. 879, p. 53-59, 1985.
Appendix A - Genetic Algorithm

The Genetic Algorithm (GA) (MITCHELL, 1998) is a search procedure (or heuristic) based on the process of natural evolution. GAs originated in 1975 from studies of cellular automata conducted by John Holland and his students at the University of Michigan. Their applications span different areas, such as scheduling and dispatching problems, neural network training, image feature extraction and recognition, and other optimization and search problems. GA belongs to a larger class, called evolutionary algorithms (EA), which also includes algorithms such as Ant Colony Optimization (ACO), Cultural Algorithm (CA), Particle Swarm Optimization (PSO), Memetic Algorithm (MA), Simulated Annealing (SA), and Tabu Search (TS). These algorithms generate solutions to optimization and search problems using techniques inspired by natural evolution, such as selection, crossover, and mutation.
A.1 Methodology
In order to find the solution of a search or optimization problem, GA simulates the process of natural evolution (Alg. 4). First, the algorithm randomly generates the initial population, whose total size is size_pop, composed of candidate solutions (individuals).
Each individual is encoded as an array (chromosome), in which each element (gene) can be represented by a binary value. After generation, the population is reduced to its best individuals in the selection phase, which evaluates them with the fitness function (fitness_func). The next generations are obtained by alternating reproduction and selection phases until the end condition (end_condition) is met.

Algorithm 4: Genetic Algorithm
Input: GA(size_pop, fitness_func, end_condition)
Output: solution
    t ← 0;
    Generate initial population, G(0), based on size_pop;
    Select G(0) using fitness_func;
    repeat
        t ← t + 1;
        Generate G(t) by reproduction using G(t − 1);
        Select G(t) using fitness_func;
    until end_condition;
    return best_individual(G(t))
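As a rough illustration of the loop in Algorithm 4, the Python sketch below runs it on a toy problem. Everything specific here is an assumption made for the example and not the implementation used in the thesis: the fitness is the number of 1 bits in the chromosome, selection keeps the better half of the population, reproduction uses single-point crossover with bit-flip mutation, and end_condition is simplified to a fixed generation budget. The names genetic_algorithm, keep_ratio, and mutation_rate are hypothetical.

```python
import random

def genetic_algorithm(size_pop, fitness_func, max_generations, n_genes=20,
                      keep_ratio=0.5, mutation_rate=0.01, seed=0):
    """Toy GA with binary chromosomes, following the structure of Algorithm 4."""
    rng = random.Random(seed)
    n_keep = max(2, int(size_pop * keep_ratio))

    def select(population):
        # Selection (assumed truncation): keep the n_keep fittest individuals.
        return sorted(population, key=fitness_func, reverse=True)[:n_keep]

    def reproduce(parents):
        # Reproduction: refill the population with mutated crossover children.
        children = []
        while len(parents) + len(children) < size_pop:
            p1, p2 = rng.sample(parents, 2)
            cut = rng.randrange(1, n_genes)          # single-point crossover
            child = p1[:cut] + p2[cut:]
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            children.append(child)
        return parents + children

    # G(0): random initial population, followed by the first selection.
    population = [[rng.randint(0, 1) for _ in range(n_genes)]
                  for _ in range(size_pop)]
    population = select(population)
    for _ in range(max_generations):        # end_condition: generation budget
        population = reproduce(population)  # G(t) built from G(t - 1)
        population = select(population)
    return max(population, key=fitness_func)

# Usage: maximize the number of 1 bits in a 20-gene chromosome.
best = genetic_algorithm(size_pop=40, fitness_func=sum, max_generations=30)
```

Because the selected parents are carried over into the next generation, the best fitness in this sketch never decreases from one generation to the next.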
A.1.1 Population generation
The population is generated randomly in order to cover the entire range of solutions (search space). Its size depends on the nature of the problem and is a fraction of the total number of possible solutions; it typically contains hundreds or thousands of individuals. The size of the population is directly related to the quality of the solution: small populations can lead to locally optimal solutions, while very large populations slow down convergence.
A.1.2 Selection
The selection phase selects the best individuals of a generation according to a fitness function, that is, the individuals whose solutions best fit the fitness function. Popular selection methods include roulette wheel selection and tournament selection. In roulette wheel selection, each individual has a selection probability based on its fitness value; that is, individuals with larger fitness values (for a maximization fitness function) have a higher selection probability. In tournament selection, the fittest individual of a pair is selected.
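The two selection methods can be sketched in Python as follows. This is a minimal illustration under stated assumptions: fitness values are non-negative and are to be maximized, the tournament pair is drawn uniformly at random, and the function names and the four-individual example population are hypothetical.

```python
import random

def roulette_wheel_select(population, fitnesses, rng):
    """Pick one individual with probability proportional to its fitness."""
    total = sum(fitnesses)
    threshold = rng.uniform(0, total)
    cumulative = 0.0
    for individual, fit in zip(population, fitnesses):
        cumulative += fit
        if cumulative >= threshold:
            return individual
    return population[-1]  # guard against floating-point round-off

def tournament_select(population, fitnesses, rng):
    """Draw two distinct individuals at random and return the fitter one."""
    i, j = rng.sample(range(len(population)), 2)
    return population[i] if fitnesses[i] >= fitnesses[j] else population[j]

rng = random.Random(42)
population = ["A", "B", "C", "D"]
fitnesses = [1.0, 2.0, 3.0, 14.0]   # "D" holds 70% of the total fitness

# Empirical share of roulette picks that land on "D".
picks = [roulette_wheel_select(population, fitnesses, rng) for _ in range(10_000)]
share_d = picks.count("D") / len(picks)
```

In this example share_d should come out close to 0.7, the share of the total fitness held by "D". Tournament selection depends only on the fitness ranking within the drawn pair, so it is insensitive to the scale of the fitness values.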
A.1.3 Reproduction
After the selection of the best individuals, the population size is reduced to a percentage of its original size. In order to restore the original size and, mainly, to improve the quality of the individuals, the reproduction phase is executed. This phase is divided into two steps: crossover and mutation.
A.1.3.1 Crossover
The crossover (or recombination) phase generates new individuals (children) from the random combination of genes of individuals of the previous generation (parents). This phase is repeated until an appropriate population size is reached. In order to maintain the best individuals of the previous generation (that is, the best solutions), the crossover phase can be elitist. In this case, the parents proceed to the next generation if their fitness is better than that of their children. Therefore, newly generated individuals that are worse than their parents are not included in the current generation.
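A possible reading of elitist crossover is sketched below in Python. The choice of uniform crossover (each gene taken from either parent with equal probability), the rule of comparing the child against the better of its two parents, and the names uniform_crossover and elitist_offspring are assumptions made for this example; the thesis does not fix these details in this appendix.

```python
import random

def uniform_crossover(parent_a, parent_b, rng):
    """Build a child by taking each gene from one of the two parents at random."""
    return [a if rng.random() < 0.5 else b for a, b in zip(parent_a, parent_b)]

def elitist_offspring(parent_a, parent_b, fitness_func, rng):
    """Return the child unless it is worse than the better parent (elitism)."""
    child = uniform_crossover(parent_a, parent_b, rng)
    best_parent = max(parent_a, parent_b, key=fitness_func)
    if fitness_func(child) >= fitness_func(best_parent):
        return child
    return best_parent

rng = random.Random(7)
parent_a = [1, 1, 1, 0, 0, 0]
parent_b = [0, 0, 1, 1, 1, 0]
# With fitness = number of 1 bits, both parents score 3.
survivor = elitist_offspring(parent_a, parent_b, sum, rng)
```

Whatever the random draw, the survivor in this sketch scores at least as well as the better parent, which is the property that elitism is meant to guarantee.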
A.1.3.2 Mutation
After the crossover phase, the generated individuals can have some of their genes randomly swapped or changed in value, according to a mutation ratio (generally less than 1%). The mutation phase is used to avoid convergence to locally optimal solutions; some mutated individuals can lead the next generations to better regions of the solution space, increasing the chance of finding the optimal solution.
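For binary chromosomes, the value-change form of mutation can be sketched in Python as an independent bit flip per gene. The function name mutate, the 1% rate, and the all-zero test chromosome are assumptions for illustration; the gene-swapping variant mentioned above is not shown.

```python
import random

def mutate(chromosome, mutation_rate, rng):
    """Flip each binary gene independently with probability mutation_rate."""
    return [1 - gene if rng.random() < mutation_rate else gene
            for gene in chromosome]

rng = random.Random(3)
original = [0] * 10_000
mutated = mutate(original, mutation_rate=0.01, rng=rng)
flipped = sum(mutated)  # number of genes changed from 0 to 1
```

With a 1% rate, roughly 100 of the 10,000 genes are expected to flip, which shows how a small mutation ratio perturbs individuals without destroying the structure inherited from crossover.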
A.1.4 Termination
The GA ends when a termination condition is reached. Common termination conditions are:
A fixed number of generations is reached;
A maximum computation time is reached;
The solution converges within an allowed maximum error;
A solution is found that satisfies minimum criteria; or
A combination of the above.
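The conditions listed above can be combined in a single check, as in the following Python sketch. The function should_terminate, its parameters, and their default values are hypothetical. Convergence is interpreted here, as an assumption, as the best fitness changing by less than a tolerance over the last few generations.

```python
import time

def should_terminate(generation, start_time, best_history, *,
                     max_generations=100, max_seconds=60.0,
                     tolerance=1e-6, patience=10, target_fitness=None):
    """Return True when any of the listed termination conditions holds."""
    if generation >= max_generations:
        return True   # fixed number of generations reached
    if time.monotonic() - start_time >= max_seconds:
        return True   # maximum computation time reached
    if len(best_history) > patience:
        # Convergence: best fitness changed by less than `tolerance`
        # over the last `patience` generations.
        if abs(best_history[-1] - best_history[-1 - patience]) < tolerance:
            return True
    if (target_fitness is not None and best_history
            and best_history[-1] >= target_fitness):
        return True   # solution satisfies the minimum criteria
    return False

start = time.monotonic()
stalled = should_terminate(12, start, [5.0] * 12)      # no improvement for 11 generations
running = should_terminate(3, start, [1.0, 2.0, 3.0])  # still improving
```

Here best_history is assumed to hold the best fitness value of each generation so far, so stalled evaluates to True and running to False.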
Appendix B - Statistical Distributions
B.1 Gamma Distribution
Parameters: Shape parameter ($\alpha$) and scale parameter ($\beta$) specified as positive real values.
Range: $[0, +\infty)$
Mean: $\alpha\beta$
Variance: $\alpha\beta^{2}$
Applications: The gamma distribution is often used to represent the time required to complete some task (e.g., a machining time or machine repair time).
The gamma pdf is
\[
f(x) =
\begin{cases}
\dfrac{x^{\alpha-1}\, e^{-x/\beta}}{\beta^{\alpha}\,\Gamma(\alpha)} & \text{for } x > 0, \\[1ex]
0 & \text{otherwise,}
\end{cases}
\tag{B.1}
\]
where $\Gamma$ is the complete gamma function given by:
\[
\Gamma(\alpha) = \int_{0}^{\infty} t^{\alpha-1}\, e^{-t}\, dt . \tag{B.2}
\]
The gamma distribution is illustrated in Figure B.1.
FIGURE B.1 Gamma distribution.
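As a numerical check of the gamma density, the Python sketch below evaluates the pdf and integrates it with a simple midpoint rule. It assumes the usual shape/scale parameterization, in which the mean is the product of shape and scale; the parameter values (shape 2, scale 3) and the helper integrate are hypothetical choices for illustration.

```python
import math

def gamma_pdf(x, alpha, beta):
    """Gamma density with shape alpha and scale beta; zero for x <= 0."""
    if x <= 0:
        return 0.0
    return (x ** (alpha - 1) * math.exp(-x / beta)
            / (beta ** alpha * math.gamma(alpha)))

def integrate(func, lower, upper, steps=100_000):
    """Midpoint-rule approximation of a definite integral."""
    width = (upper - lower) / steps
    return width * sum(func(lower + (k + 0.5) * width) for k in range(steps))

alpha, beta = 2.0, 3.0   # hypothetical shape and scale, e.g. a task time in minutes
total = integrate(lambda x: gamma_pdf(x, alpha, beta), 0.0, 100.0)
mean = integrate(lambda x: x * gamma_pdf(x, alpha, beta), 0.0, 100.0)
```

The integral of the density should be close to 1 and the computed mean close to 6 (shape times scale), since the probability mass beyond the upper limit of 100 is negligible for these parameters.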
B.2 Gaussian Distribution
Parameters: The mean ($\mu$) is specified as a real number and the standard deviation ($\sigma$) is specified as a positive real number.
Range: $(-\infty, +\infty)$
Mean: $\mu$
Variance: $\sigma^{2}$
Applications: The Gaussian (or Normal) distribution is used empirically for many processes that appear to have a symmetric distribution. Because the theoretical range is from $-\infty$ to $+\infty$, the distribution should be used with care for positive quantities like processing times, since it assigns nonzero probability to negative values.
The normal pdf is
\[
f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-\mu)^{2}/(2\sigma^{2})} , \quad \text{for all real } x. \tag{B.3}
\]
The normal distribution is illustrated in Figure B.2.
FIGURE B.2 Gaussian distribution.
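The caveat about positive quantities can be quantified with a short Python sketch that evaluates the normal pdf and the probability mass falling below zero. The parameter values (mean 10, standard deviation 4, read as a travel time in minutes) and the function names are hypothetical; the cumulative distribution is computed from the error function math.erf.

```python
import math

def normal_pdf(x, mu, sigma):
    """Normal density with mean mu and standard deviation sigma."""
    return (math.exp(-((x - mu) ** 2) / (2 * sigma ** 2))
            / (sigma * math.sqrt(2 * math.pi)))

def normal_cdf(x, mu, sigma):
    """Probability that a normal variable is at most x."""
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

mu, sigma = 10.0, 4.0                     # hypothetical travel time in minutes
peak = normal_pdf(mu, mu, sigma)          # density at the mean
p_negative = normal_cdf(0.0, mu, sigma)   # probability mass below zero
```

With the mean 2.5 standard deviations above zero, about 0.6% of the probability mass corresponds to negative times. A simulation that samples such a distribution would need to truncate or resample those values.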
DOCUMENT REGISTRATION FORM

1. CLASSIFICATION/TYPE: TD
2. DATE: 20 December 2010
3. DOCUMENT No.: DCTA/ITA/TD - 018/2010
4. No. OF PAGES: 140
5. TITLE AND SUBTITLE: Methods for Truck Dispatching in Open-Pit Mining
6. AUTHOR(S): Guilherme Sousa Bastos
7. INSTITUTION(S)/INTERNAL BODY(IES)/DIVISION(S): Instituto Tecnológico de Aeronáutica - ITA
8. KEYWORDS SUGGESTED BY THE AUTHOR: Truck Dispatching; Open-pit Mining; TiMDP; Genetic Algorithm
9. KEYWORDS RESULTING FROM INDEXING: Mathematical programming; Distribution of goods; Genetic algorithms; Applied mathematics; Routes; Trucks; Mining; Mathematics
10. PRESENTATION: (X) National ( ) International
ITA, São José dos Campos. Doctoral course. Graduate Program in Electronic Engineering and Computer Science. Field of Computer Science. Advisor: Carlos Henrique Costa Ribeiro; co-advisor: Luiz Edival de Souza. Defended on 9 December 2010. Published in 2010.
11. ABSTRACT:
Material transportation is one of the most important aspects of open-pit mine operations. The problem usually involves a truck dispatching system in which decisions on truck assignments and destinations are taken in real time. Due to its significance, several decision systems for this problem have been developed in the last few years, improving productivity and reducing operating costs. As in many other real-world applications, the assessment and correct modeling of uncertainty is a crucial requirement, as the unpredictability originating from equipment faults, weather conditions, and human mistakes can often result in truck queues or idle shovels. However, uncertainty is not considered in most commercial dispatching systems. In this thesis, we introduce novel truck dispatching systems as a starting point to modify the current practices with a statistically principled decision making methodology. First, we present a stochastic method using the Time-Dependent Markov Decision Process (TiMDP) applied to the truck dispatching problem. In the TiMDP model, travel times are represented as probability density functions (pdfs), time windows can be inserted for path availability, and time-dependent utility can be used as a priority parameter. In order to mitigate the well-known curse of dimensionality, to which multi-agent problems are subject when discrete state models are considered, the system is modeled based on the introduced single-dependent-agents. Based also on the single-dependent-agents concept, we introduce the Genetic TiMDP (G-TiMDP) method applied to the truck dispatching problem. This method is a hybridization of the TiMDP model and of a Genetic Algorithm (GA), which is also used to solve the truck dispatching problem. Finally, in order to evaluate and compare the results of the introduced methods, we execute Monte Carlo simulations in an example heterogeneous mine composed of 15 trucks, 3 shovels, and 1 crusher. The uncertain aspect of the problem is represented by the path selection through crusher and shovels, which is executed by the truck driver and is independent of the dispatching system. The results are compared to classical dispatching approaches (Greedy Heuristic and Minimization of Truck Cycle Times, MTCT) using Student's t-test, showing the efficiency of the introduced truck dispatching methods.
12. DEGREE OF SECRECY:
(X) UNCLASSIFIED ( ) RESERVED ( ) CONFIDENTIAL ( ) SECRET
