Decision Process
Prepared by
MEE/08/3474
SUBMITTED TO
Prof. B. Kareem
August, 2016
CHAPTER ONE
1.0 INTRODUCTION
The term Markov decision process (MDP) was introduced by Bellman (1957) to describe a
foundational bridge between stochastic optimal control on the one hand and deterministic optimal control
on the other. Collections of results with some emphasis on the theoretical aspects of Markov decision
processes are given in Derman (1982) and Ross (1992). The most widely used optimization criteria in
a Markov decision process are the minimization of the finite-horizon expected cost, the minimization
of the infinite-horizon total expected discounted cost or contraction cost, and the minimization of the
MDPs have been used for optimization in many different areas. For example, Bassey and
Chigbu (2012) used an MDP approach for the optimal control of oil spills in the marine environment from a
system-theoretic point of view, using the state-variable description of a Markovian decision process and
operational research formalism. Gonzalez-Hermandez and Villareal (2009) give mild conditions for
the existence of an optimal solution to a Markov decision problem with average cost and m constraints
of the same kind in Borel action and state spaces. They also show that there is an optimal policy that
is a convex combination of at most m+1 deterministic policies. In Kyriadis (2011), an MDP is
used for the optimal control of a simple symmetrical pest immigration-emigration process through the
introduction of total catastrophes, and it is shown that a particular control-limit policy is average-cost
optimal within the class of all stationary policies by verifying that the relative values of this policy
Optimal control is an aspect of optimization in which the input (control) parameters of a dynamical
system are manipulated so as to achieve some desired result, either by minimizing a cost functional or
maximizing a reward functional associated with the control trajectory of the system. Notable references
on this subject on both deterministic and stochastic dynamical systems are (Kushner, 1972) and
In manpower control, two aspects of control are well known. These aspects of control are attainability
(reachability) and maintainability. Whereas attainability is concerned with the process of moving a
manpower system from an initial or any given structural configuration to some desired structural
configuration, maintainability is concerned with how to remain on the desired structural configuration
once it is reached (Bartholomew et al., 1991). Various techniques have been used in optimal
manpower control. For example, in Udom and Uche (2009), time is used as an optimality performance
criterion, via the Pontryagin minimum principle, to obtain an optimal recruitment control vector for a
manpower system modeled by a stochastic differential equation, and it is shown that this recruitment
vector minimizes the control time globally. Mouza (2010) adapts a comparatively simple dynamic
system (plant) with an analytical presentation of stocks and flows and proceeds to formulate an
optimal manpower control problem that aims to achieve, in the most satisfactory way, some pre-assigned
manpower targets. The work presented a method of solution of the formulated manpower control
problem based on the use of the generalized inverse. Other interesting results can be found in Lee et al.
According to Machline (2008), there is a strong concern to adjust a company's supply system to
achieve a higher service level both internally and for outside customers. This leads to a higher
operational level and can even become a differentiator with respect to competitors (Milan et
al., 2006; Paiva et al., 2004). Materials handling management is among the many factors that contribute to
improving a company's performance. The Materials Handling Industry of America [MHIA] defines
materials handling management as follows: material handling is the movement, storage, control and
protection of material, goods, and products throughout the process of manufacturing, distribution,
The focus is on the methods, mechanical equipment, systems and related controls used to achieve these
functions (mhia.org/learning/glossary).
It follows that handling is broader than simple materials movement, although the two terms are
sometimes used as synonyms. The relevance of materials handling stems from its intrinsic relationship
with production flow. When the flow is unbalanced, either extra stock builds up or supply is
interrupted. When the flow is not fast enough, transit time is long and the system
cannot serve customers when they need it. It is well understood that material handling
improvement may have positive effects on production. However, what matters is not only production but
also the way employees perceive the new situation. When the perception is favourable, the benefits are
attainable; if not, behavioural issues can emerge. Evaluations are therefore important when interventions in the work
environment are implemented. The present work is specifically related to materials handling
performance may improve (Chopra & Meindl, 2001; Rosenbloom, 2003) aiming to satisfy the
customers or meet their expectations in terms of their needs, desires and demands (Oliver, 2010;
Stock & Lambert, 2001). A materials handling study requires that several elements be considered. The
first is the handling system project, which covers sequencing, velocity, layout and routing
(Groover, 2001). To complete the analysis, Groover (2001) recommends analysing the
material itself (or object) to be transported. He therefore suggests the classification of Muther and
Hagan (Groover, 2001), which considers: (i) physical state (solid, liquid, gas); (ii) size (volume,
length, width, height); (iii) weight; (iv) condition (hot, cold, dry, dirty, sticky, adhesive); (v) risk of
damage (weak or strong); and (vi) safety hazards (explosive, flammable, toxic, corrosive, etc.).
Additionally, the issue of equipment and devices must be examined. Dias (1993) uses the term
"moving" for what this article, following the terminology of Groover (2001), calls management
(handling). When dealing with equipment, Dias (1993) presents a broad classification that covers
five categories: (i) transporters (belts, chains, rollers, etc.); (ii) cranes, hoists and lifts; (iii) industrial
vehicles (carts, tractors, pallet transporters, forklifts); (iv) positioning, weighing and control equipment
(ramps, transfer equipment); and (v) stands and support structures (pallets, holders, reels).
According to Chan et al., (1999), a key factor in material handling system design process is the
This is directly related to the present study. According to Gurgel (1996), equipment should be selected
based on some preliminary considerations: take into account the utilization of the factory floor and its
load capacity; examine the dimensions of doors and corridors; pay close attention to ceiling height;
identify the environmental conditions and their nature; avoid combustion-engine traction
equipment where food products are stored; meet all safety standards to protect people and to eliminate the
possibility of incurring criminal and civil liabilities arising from accidents; and examine all kinds of
The right choice of equipment and location of work-in-process is fundamental to the optimization of a
company's manufacturing capacity. Bowersox and Closs (1996) state that a critical factor in
positioning stocks in process is a balance between convenience and consolidation to create efficiencies
The importance of layout, which defines the placement of equipment and consequently restricts
possible routes and sequencing, is evident from the prominence the subject receives in the
production management literature. The relationship between layout studies and material
handling, however, does not receive much attention in the same literature. This lack of attention can be
seen in works such as Gaither and Frazier (2002), Chase et al. (2006) and Slack et al. (1997).
Finally, systems and information technology are essential factors in materials handling
management. Stair and Reynolds (2006), Laudon and Laudon (2006) and O'Brien and Marakas
(2007) support the study of the fundamentals and general principles of information systems.
CHAPTER TWO
General assembly (GA) is one of the most important steps, and almost the last step, in production (Li
and Meerkov, 2007; Gershwin, 1994; Buxey and Sadjadi, 1976). A schematic layout of GA is
shown in Fig. 1. During GA, parts are assembled through a series of assembly stations. At each
station, operators assemble certain types of parts onto the semi-product, which is then delivered to the
next station. This procedure continues until all the parts are assembled and the final product is
produced. Since each station consumes certain types of parts during assembly, these parts are
replenished by a material handling (MH) system (Chang et al., 1986). The material
handling system delivers the parts from the inventory at the central docking area to the lineside buffers at
the assembly stations. For smooth production, parts must be delivered in time to prevent
starvation of the assembly line. How the material handling system is dispatched is also
extremely important to production efficiency, since 20-50% of manufacturing costs may be related
Fig. 1: Sketch of a general assembly line with material handling.
The dynamic transitions of this material handling system are triggered by events such as operation
completion and part delivery, so it can be considered a discrete event dynamic system (DEDS). The
material handling system follows a dispatching policy to determine when to send out the driver, which
buffers to serve in each trip, and the serving sequence. The dispatching cost consists of three parts:
the starving penalty of the lineside buffers, the transportation cost and the inventory cost. We want to
Uncertainty: the major uncertainty in this material handling system is the part consumption
rates of the lineside buffers. The part requirements differ between products with different
options. This is typical of mixed assembly lines, where different types of products
are assembled on one line, which is common practice in the automotive industry (Li
and Meerkov, 2007; Gershwin, 1994; Buxey and Sadjadi, 1976). Due to the
requirements for high accuracy and flexibility, most operations in this general assembly line are
Policy space: a dispatching policy determines when to serve which buffers and in what order. The
policy space grows exponentially with the problem size, which is determined by the number of buffers,
the number of parts in the buffers, etc., and can be extremely large for problems of
practical size. In practice, a driver may supply dozens of buffers with sizes ranging from tens to
hundreds. With little structural information about the policy space, it is in general computationally
Policy evaluation: due to the uncertainties and complicated dynamics in material handling
systems, it is very difficult to obtain a closed-form expression for evaluating the performance of
a dispatching policy (Li and Meerkov, 2007; Buzacott and Shantikumar, 1993). Usually,
time-consuming Monte Carlo simulation is the only way. These difficulties, together with the
economic impact, have attracted a lot of research on dispatching policies for material handling in the past
decades. Most work focuses on moving semi-products along the serial line, assuming that the
initial inventories of parts at the lineside buffers are infinite (Li and Meerkov, 2007; Chang et al.,
1986; Ardavan and Gilbert, 2005). Various dispatching rules (Montoya-Torres, 2006;
Hoff and Sarker, 1998) and heuristic algorithms (Haq et al., 2003; Das
and Spasovic, 2003) are well studied, based on simulation models that are easy to implement (Chan et
al., 2002). However, systems with finite inventory at the lineside buffers have not received enough
attention, and stochastic and dynamic dispatching problems of real-world material handling
Markov decision processes (MDPs) are used to characterize sequential decision problems with
Markovian properties (Puterman, 1994; Cao, 2000). However, two well-known difficulties,
the large state and action spaces, prevent the use of policy and value iteration as in traditional approaches.
There have been many efforts to overcome these difficulties, such as neuro-dynamic programming
(Bertsekas and Tsitsiklis, 1996), also known as adaptive or approximate dynamic programming
(Powell, 2007), state aggregation (Ren and Krogh, 2002), time aggregation (Cao et al., 2002), and
action aggregation (Xia et al., 2008). Nevertheless, there are still no systematic formulations and
In this paper, the material handling system of a general assembly line in a practical car
manufacturing system is considered. There are three salient features of this study:
Focus is placed on supplying parts from the central docking area to the lineside buffers with finite
inventory sizes.
A systematic Markovian decision process formulation for the stochastic and dynamic dispatching
An event-based optimisation (EBO) approach (Cao, 2007; Cao, 2005) to address the
The EBO approach is based on the structure of the material handling system. By exploiting the
problem structure, we focus on policies that respond to certain events instead of all state transitions, so
that the number of aggregated potential functions (value functions) scales with the square of the system
size despite the exponential growth of the state space (Cao, 1998; Cao and Chen, 1997). This
effectively reduces the computational requirements to a level that is acceptable in practice. A
sample-path-based algorithm is developed to estimate the potentials and implement a gradient-based policy
optimization procedure. Numerical results demonstrate that the policies obtained by the event-based
This section presents a discrete Markovian decision process formulation for a 2-dolly material handling
system of a general assembly line. The following assumptions are made throughout this study:
There is one driver with a 2-dolly train in the material handling system; that is, the driver can supply
The number of parts consumed at each buffer in each unit of time has a Bernoulli distribution, and
the average consumption rate is constant but may differ between buffers.
To simplify the expressions, it is assumed that the inventory level of buffers will be increased within
$T(i, j)$: travel time from point $i$ to point $j$, where $i, j = 1, 2, \ldots, N$ denote the lineside buffers; $i, j = 0$ denotes
$Q_i$: supply quantity of parts for buffer $i$ in one dolly. That is, if the driver uses one dolly of the train
to supply buffer $i$, the amount of parts held in the dolly is predetermined by the content of one
$U_i$: average usage rate of parts at buffer $i$, i.e., the average amount of parts consumed at buffer $i$ in each
time unit.
$l$: decision epoch of the MDP formulation, $l = 0, 1, 2, \ldots$ The conveyor moves step by step in every
unit of time, transferring semi-products from one station to the next. We choose the decision epoch at
each time unit, i.e., the time when the conveyor moves.
$M_{l,i}$: a random variable indicating the amount of parts consumed at lineside buffer $i$ between time $l$
and $l+1$.
$y_l$: status of the driver at time $l$, i.e., how many time units are left before the driver comes back to the central
docking area. For example, $y_l = 0$ denotes that the driver is idle at time $l$; $y_l = 3$ shows that the driver is
on a trip at time $l$ and will be back at the central docking area at time $l+3$.
$s_l$: state vector of the Markov system at decision epoch $l$,
where $a_l^1$ and $a_l^2$ denote the index of the buffer to supply with the first and second dolly, respectively, at
epoch $l$, with $a_l^1, a_l^2 = 0, 1, \ldots, N$. $a_l^2 = 0$ means that the second dolly is not used at epoch $l$. Obviously, $a_l^2 =$
$A(s_l)$: set of all possible actions at decision epoch $l$ with system state $s_l$. $a_l^1 = a_l^2 = 0$ if $y_l > 0$, since the
driver can only be sent out when he/she is available at the central docking area; $a_l^1, a_l^2 = 0, 1, \ldots, N$ if $y_l = 0$.
$\mathcal{L}$: dispatching policy of the MH system, which is a mapping from state space $S$ to action space $A$. We
Based on the system state and the dispatching actions at decision epoch l, the state transition is:
where $I(*)$ is an indicator function defined as $I(*) = 1$ (or 0) if the logical expression $(*)$ is true (or
false), and $\lceil * \rceil$ is the ceiling function, which rounds a number upwards. With assumption A2, we have the
From the above description of the system dynamics, we can obtain the transition probabilities.
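The dynamics described above can be illustrated with a small simulation sketch. The transition equation itself is not reproduced in this text, so everything below is an assumption made for illustration only: the state is taken to be the buffer inventories plus the driver status, a dispatched dolly is credited to its buffer immediately, the second dolly is only used together with the first, and the trip time is the ceiling of the summed travel times. The function names `sample_consumption`, `feasible_actions` and `step` are hypothetical.

```python
import math
import random
from itertools import product


def sample_consumption(usage_rates, rng):
    """Draw M_{l,i} for each buffer: one part with probability U_i, else none."""
    return [1 if rng.random() < u else 0 for u in usage_rates]


def feasible_actions(driver_status, num_buffers):
    """A(s_l): only (0, 0) while the driver is away (y_l > 0); otherwise each
    dolly index ranges over 0..N. Assumption: dolly 2 is unused if dolly 1 is."""
    if driver_status > 0:
        return [(0, 0)]
    return [(a1, a2) for a1, a2 in product(range(num_buffers + 1), repeat=2)
            if a1 != 0 or a2 == 0]


def step(inventory, driver_status, action, consumption, travel_time, dolly_load):
    """One assumed transition: returns (next_inventory, next_driver_status)."""
    a1, a2 = action
    if action not in feasible_actions(driver_status, len(inventory)):
        raise ValueError("action is not feasible in this state")
    next_inventory = []
    for i, (n, m) in enumerate(zip(inventory, consumption), start=1):
        # Q_i parts arrive for each dolly sent to buffer i (assumed immediate).
        delivered = dolly_load[i - 1] * ((a1 == i) + (a2 == i))
        next_inventory.append(max(n - m, 0) + delivered)
    if a1 == 0:
        # No dispatch: an ongoing trip gets one time unit shorter.
        next_status = max(driver_status - 1, 0)
    else:
        # Route 0 -> a1 -> a2 -> 0, skipping a2 if unused or equal to a1.
        stops = [0, a1] + ([a2] if a2 not in (0, a1) else []) + [0]
        trip = sum(travel_time[i][j] for i, j in zip(stops, stops[1:]))
        next_status = max(math.ceil(trip) - 1, 0)
    return next_inventory, next_status


# Hypothetical two-buffer instance; index 0 is the central docking area.
travel_time = [[0.0, 1.5, 2.0], [1.5, 0.0, 1.0], [2.0, 1.0, 0.0]]
dolly_load = [10, 20]
rng = random.Random(0)
consumption = sample_consumption([0.2, 0.5], rng)
print(step([3, 0], 0, (1, 2), consumption, travel_time, dolly_load))
```

Repeating `step` along a sample path with a fixed dispatching rule gives the kind of Monte Carlo evaluation mentioned earlier; the transition probabilities follow from the Bernoulli draws in `sample_consumption`.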
The cost function at epoch $l$ contains three components: the starving penalty of the lineside buffers, the
transportation cost, and the inventory cost at the buffers, which is calculated as:
where $w_1 \gg w_2 \gg w_3$.
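The role of the three weights can be sketched as follows. The exact cost expression is not reproduced in this text, so the measures used here are assumptions: starvation is counted as the number of buffers whose demand exceeds their stock, transportation is charged per unit of trip time, and inventory is charged per part held. The function name `stage_cost` and the default weight values are hypothetical; only the ordering $w_1 \gg w_2 \gg w_3$ comes from the text.

```python
def stage_cost(inventory, consumption, trip_time, weights=(1000.0, 10.0, 0.1)):
    """Assumed per-epoch cost: w1 * starved buffers + w2 * trip time + w3 * stock."""
    w1, w2, w3 = weights
    # A buffer starves when the parts requested exceed the parts on hand.
    starved = sum(1 for n, m in zip(inventory, consumption) if m > n)
    return w1 * starved + w2 * trip_time + w3 * sum(inventory)


# One starved buffer dominates a 4.5-unit trip, which dominates holding 7 parts.
print(stage_cost([0, 5], [1, 1], 0.0))
print(stage_cost([2, 5], [1, 1], 4.5))
```

With weights ordered this way, a policy is pushed first to avoid starvation, then to save trips, and only then to keep lineside stock low.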
To save the average dispatching cost of the MH system over a long period, the problem is formulated
From the above formulation, the challenge of large problem size is obvious. The size of the state space
is:
where $|*|$ indicates the cardinality of the set argument. The size of the action
space is $|A| = N^2$. With typical data from a practical system, the size of the state space is larger than
$10^{20}$, and the size of the action space is larger than $10^2$. Thus the size of the traditional state-based
policy space, $|A|^{|S|}$, is extremely large, which makes it difficult to optimize the dispatching policies
through traditional approaches.
2.3 EVENT-BASED OPTIMIZATION
Based on the problem structure, the two most urgent buffers are the most likely to be supplied in each trip,
because the driver has only two dollies in the MH system. For buffer $i$ at decision epoch $l$, we
define the estimated remaining life $x_{l,i}$ as the expected length of time the buffer can last without being supplied:
We sort the estimated remaining lives in ascending order as $x_{l,(1)}, x_{l,(2)}, \ldots, x_{l,(N)}$, and focus on the first two,
i.e., $x_{l,(1)}$ and $x_{l,(2)}$, when making decisions. An event $e_0(x_{(1)}, x_{(2)})$ is defined as the set of state
transitions in which the driver becomes idle and the first and second shortest estimated remaining lives become
$x_{(1)}$ and $x_{(2)}$, respectively, where the subscripts (1) and (2) denote the indices of the two most urgent
buffers. After an event $e_0(x_{(1)}, x_{(2)})$ is observed at time $l$, there are three types of actions in event-based
policies: action $(a_l^1 = (1), a_l^2 = (2))$ dispatches the driver to supply both buffers (1) and (2);
action $(a_l^1 = (1), a_l^2 = 0)$ supplies only buffer (1); action $(a_l^1 = a_l^2 = 0)$ supplies neither of them. These
three types of actions are denoted $a^{+2}$, $a^{+1}$ and $a^{+0}$ hereafter.
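A minimal sketch of how an event and its three candidate actions could be computed is given below. The formula for the estimated remaining life is not reproduced in this text, so the sketch assumes the natural choice of current inventory divided by the average usage rate $U_i$; the function names and the inventory figures are hypothetical.

```python
def estimated_remaining_life(inventory, usage_rates):
    """Assumed x_{l,i}: time units buffer i can last, inventory / usage rate."""
    return [n / u for n, u in zip(inventory, usage_rates)]


def most_urgent(inventory, usage_rates):
    """Return ((index, life), (index, life)) for the two most urgent buffers.

    Buffer indices are 1-based, matching the text (0 is the docking area).
    """
    lives = estimated_remaining_life(inventory, usage_rates)
    order = sorted(range(1, len(lives) + 1), key=lambda i: lives[i - 1])
    first, second = order[0], order[1]
    return (first, lives[first - 1]), (second, lives[second - 1])


def event_action(kind, first, second):
    """Map an event-based action type to the pair (a1, a2) of buffer indices."""
    return {"a+2": (first, second), "a+1": (first, 0), "a+0": (0, 0)}[kind]


# Hypothetical snapshot of three buffers when the driver becomes idle.
(b1, x1), (b2, x2) = most_urgent([8, 3, 10], [0.2, 0.5, 0.8])
print("event:", (x1, x2), "most urgent buffers:", (b1, b2))
for kind in ("a+2", "a+1", "a+0"):
    print(kind, "->", event_action(kind, b1, b2))
```

An event-based policy then only has to choose one of the three action types for each observed pair of remaining lives, rather than an action for every full state.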
In event-based policies, an action can be taken only when one of the events happens, and the actions are
simplified. The size of the action space becomes $|A| = 3$, and the size of the event space is $10^4$.
Compared with the previous $|S| = 10^{20}$ and $|A| = 10^2$, the size of the event-based policy space has been
greatly reduced. It is important to note that the size of the event space scales with the square of the system size
despite the exponential growth of the state space. Although, as a trade-off, event-based policies are not
guaranteed to preserve optimality, they are consistent with intuition about the problem structure.
Numerical results demonstrate the effectiveness and efficiency of the Markov decision process coupled
with event-based optimization when dealing with large-scale practical problems.
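The scale of this reduction can be checked with a short calculation. The sketch below assumes that a deterministic policy assigns one action to every state (or event), so the number of policies is the number of actions raised to the number of states (or events); it works with base-10 logarithms because the counts themselves are far too large to print. The variable names are illustrative.

```python
import math

# Figures quoted in the text for a practical system.
num_states, num_state_actions = 10**20, 10**2
num_events, num_event_actions = 10**4, 3

# log10 of (number of actions) ** (number of states or events).
log10_state_policies = num_states * math.log10(num_state_actions)
log10_event_policies = num_events * math.log10(num_event_actions)

print(f"state-based policies: about 10^({log10_state_policies:.3g})")
print(f"event-based policies: about 10^({log10_event_policies:.0f})")
```

Under this counting, the exponent itself drops from the order of $10^{20}$ to a few thousand, which is what makes sample-path-based optimization over event-based policies tractable.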
CHAPTER THREE
3.0 CONCLUSION
Although, as a trade-off, event-based policies are not guaranteed to preserve optimality, they are
consistent with intuition about the problem structure. Numerical results demonstrate the effectiveness and
efficiency of the Markov decision process coupled with event-based optimization when dealing with
REFERENCES
Ardavan, A.-V. and Gilbert, L. (2005). Loop based facility planning and material handling,
Bartholomew, D.J., Forbes, A.F. and McClean, S.I. (1991). Statistical Techniques for Manpower
Bassey, K.J. and Chigbu, P.E. (2012). On optimal control theory in marine oil spill management: A
Markovian decision approach, European Journal of Operational Research, 217: 470-478.
Athena Scientific.
Bowersox, D. & Closs, D. (1996). Logistical management: the integrated supply chain process.
Chan, F. T. S., Ip, R. W. L. & Lau, H. (2001). Integration of expert system with analytic hierarchy
process for the design of material handling equipment selection system. Journal of Materials
Chopra, S. & Meindl, P. (2001). Supply chain management: strategy, planning and operation.
Englewood Cliffs: Prentice Hall.
Derman, D. (1982). Finite State Markovian Decision Processes, Academic Press, New York,
USA.
Hoff, E. B. and Sarker, B. R. (1998). An overview of path design and dispatching methods for automated
guided vehicles, Integrated Manufacturing Systems, 9(5): 296-307.
Chan, F. T. S., Chan, H. K. and Lau, H. C. W. (2002). The State of the Art in Simulation Study on FMS
Scheduling: A Comprehensive Survey, Int J Adv Manuf Technol, 19: 830-849.
Buxey, G. M. and Sadjadi, D. (1976). Simulation studies of conveyor-paced assembly lines with buffer
Gonzalez-Hermandez, J. and Villareal, C.E. (2009). Optimal policies for constrained average-cost
Laudon, K. C. & Laudon, J. P. (2006). Management information systems. 6th edition. Hardcover: Prentice
Hall.
Lee, H.W.J., Cai, X.Q. and Teo, K.L. (2001). An optimal control approach to manpower planning
Machline, C. A new kind of operations inventory: the pre-assembled kit. Journal of Operations and
armazém automatizado.
Mouza, A.M. (2010). Application of optimal control in manpower planning, Quality and
recruitment and promotion policies for the two grade system, African Journal of Mathematics and
Irwin/McGraw-Hill, 2010.
Ossai, E.O. and Uche, P.I. (2009). Maintainability of Departmentalized Manpower Structures in
Markov Chain Model, Pacific Journal of Science and Technology, 10(2): 295-302.
Rao, P.T., Reddy, P.R.S. and Suhasini, A.V.S. (2010). Optimal manpower recruitment by stochastic
REAd - Revista Eletrônica de Administração, 12(7), 2007, p. 1-28.
Rosenbloom, B. (2003). Marketing channels: a management view. 7th edition. New York: Cengage
Learning.
Ross, S.M. (1992). Applied Probability Models with Optimization Applications, Dover, New
York, USA.
Hall, 1994.
Slack, N., Chambers, S., Harland, C., Harrison, A. & Johnston, R. Administração da produção. São