

Management of Production Machine in a

Manufacturing Industry using Markovian

Decision Process

Prepared by




Prof. B. Kareem

August, 2016



The term Markov decision process (MDP) was introduced by Bellman (1957) for the description of a

stochastic process controlled by a sequence of actions under conditions of uncertainty. MDP is a
foundational bridge between stochastic optimal control on one hand and deterministic optimal control
on the other. Collections of results with some emphasis on the theoretical aspects of Markov decision

processes are given in (Derman, 1982) and (Ross, 1992). The most widely used optimization criteria in

a Markov decision process are the minimization of the finite-horizon expected cost, the minimization

of the infinite-horizon total expected discounted cost or contraction cost, and the minimization of the

long-run expected average cost per unit time.
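As an illustration, the infinite-horizon discounted-cost criterion can be computed by value iteration. The sketch below uses a small hypothetical two-state, two-action MDP; all transition probabilities and costs are invented for illustration and are not taken from the studies cited here.

```python
import numpy as np

# Hypothetical 2-state, 2-action MDP (numbers are illustrative only).
# P[a, s, s'] = transition probability, C[s, a] = one-step cost.
P = np.array([[[0.9, 0.1], [0.2, 0.8]],   # transitions under action 0
              [[0.5, 0.5], [0.4, 0.6]]])  # transitions under action 1
C = np.array([[1.0, 4.0],
              [3.0, 0.5]])
beta = 0.95  # discount factor

def value_iteration(P, C, beta, tol=1e-8):
    """Minimize the infinite-horizon total expected discounted cost."""
    n_states, n_actions = C.shape
    V = np.zeros(n_states)
    while True:
        Q = np.empty((n_states, n_actions))
        for a in range(n_actions):
            # Bellman backup: Q(s, a) = C(s, a) + beta * E[V(s')]
            Q[:, a] = C[:, a] + beta * P[a] @ V
        V_new = Q.min(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q.argmin(axis=1)
        V = V_new

V, policy = value_iteration(P, C, beta)
```

The long-run average-cost criterion replaces the discounted backup with a relative value iteration, but the fixed-point structure of the computation is the same.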

MDP has been used in various aspects of optimization and in different areas. For example, Bassey and
Chigbu (2012) used an MDP approach for the optimal control of oil spill in the marine environment
from a system-theoretic point of view, using the state variable description of the Markovian decision
process and operational research formalism. Gonzalez-Hernandez and Villareal (2009) gave mild
conditions for the existence of an optimal solution for a Markov decision problem with average cost
and m constraints of the same kind in Borel action and state spaces. They also showed that there is an optimal policy that

is a convex combination of at most m+1 deterministic policies. In Kyriadis (2011), MDP is used for
the optimal control of a simple symmetrical pest immigration-emigration process with total
catastrophes, and it was shown that a particular control-limit policy is average-cost optimal within the
class of all stationary policies, by verifying that the relative values of this policy solve the
corresponding optimality equation.

Optimal control is an aspect of optimization in which the input (control) parameters of a dynamical
system are manipulated so as to achieve some desired result, either by minimizing a cost functional or
maximizing a reward functional associated with the control trajectory of the system. Notable references
on this subject, covering both deterministic and stochastic dynamical systems, are (Kushner, 1972) and
(Kushner and Runggaldier, 1987).

In manpower control, two aspects of control are well known: attainability (reachability) and
maintainability. Whereas attainability is concerned with moving a manpower system from an initial or
any given structural configuration to some desired structural configuration, maintainability is
concerned with how to remain at the desired structural configuration once it is reached (Bartholomew
et al., 1991). Various techniques have been used in optimal

manpower control. For example, in Udom and Uche (2009), time is used as an optimality performance
criterion, via the Pontryagin minimum principle, to obtain an optimal recruitment control vector for a
manpower system modeled by a stochastic differential equation, and it was shown that this recruitment
vector minimizes the control time globally. Mouza (2010) adapts a comparatively simple dynamic
system (plant) with an analytical presentation of stocks and flows and proceeds to the formulation of an
optimal manpower control problem aiming to achieve, in the most satisfactory way, some pre-assigned
manpower targets. The work presented a method of solution of the formulated manpower control
problem based on the use of the generalized inverse. Other interesting results can be found in Lee et al.
(2001), Rao et al. (2010) and Nirmala and Jeeva (2010).


According to Machline (2008), there is a strong concern to adjust the supply system in a company to
achieve a higher service level, both internally and for outside customers. This leads to a higher
operational level and even a possible competitive differential when compared with other competitors
(Milan et al., 2006; Paiva et al., 2004). Materials handling management is among the many factors that
contribute to improving a company's performance. The Material Handling Industry of America (MHIA) defines

materials handling management as "the movement, storage, control and protection of material, goods,
and products throughout the process of manufacturing, distribution, consumption and disposal. The
focus is on the methods, mechanical equipment, systems and related controls used to achieve these
functions".

It follows that handling is broader than simple materials movement, although the two terms are
sometimes used as synonyms. The relevance of materials handling stems from its intrinsic relationship
with production flow. When the flow is imbalanced, extra stock builds up or supply is interrupted.
When the flow is not fast enough, transit times are long and the system cannot serve the customers
when they need it. It is well understood that material handling

improvement may have positive effects on production. However, it is not only production that matters,
but also the way employees perceive the new situation. When the perception is favourable, the benefits
are attainable; if not, behavioural issues can emerge. Evaluations are important when interventions into
the work environment are implemented. The present work is specifically related to materials handling
management. By means of effective materials handling management, a company's operational
performance may improve (Chopra & Meindl, 2001; Rosenbloom, 2003), aiming to satisfy the
customers or meet their expectations in terms of their needs, desires and demands (Oliver, 2010;
Stock & Lambert, 2001). The study of materials handling requires that several elements be considered. The

first is the handling system project, which covers activities of sequencing, velocity, layout and routing
(Groover, 2001). To complete the analysis, Groover (2001) recommends analysing the material itself
(or object) to be transported, and therefore suggests the classification of Muther and Hagan (Groover,
2001), which considers: (i) physical state (solid, liquid, gas); (ii) size (volume, length, width, height);
(iii) weight; (iv) condition (hot, cold, dry, dirty, sticky, adhesive); (v) risk of damage (weak or strong);
and (vi) safety hazards (explosive, flammable, toxic, corrosive, etc.).

Additionally, the issue of equipment and devices must be examined. Dias (1993) adopts the term
"moving" to describe what, in this article, is called management (handling), following the terminology
of Groover (2001). When dealing with equipment, Dias (1993) presents a broad classification that
covers five categories: (i) transporters (belts, chains, rollers, etc.); (ii) cranes, hoists and lifts; (iii)
industrial vehicles (carts, tractors, pallet transporters, forklifts); (iv) positioning, weighing and control
equipment (ramps, transfer equipment); and (v) stands and support structures (pallets, holders, reels).

According to Chan et al. (1999), a key factor in the material handling system design process is the
selection and configuration of equipment for material transportation.

This is directly related to this study. According to Gurgel (1996), the equipment should be selected
based on some preliminary considerations: take into account the utilization of the factory floor and its
load capacity; examine the dimensions of doors and corridors; pay close attention to ceiling height;
identify the environmental conditions and their nature; avoid combustion-engine traction equipment in
the storage of food products; meet all safety standards to protect people and to eliminate the possibility
of incurring criminal and civil liabilities arising from accidents; and examine all available energy
options and their capacity to supply the required movements.

The right choice of equipment and location of work-in-process is fundamental for the optimization of a
company's manufacturing capacity. Bowersox and Closs (1996) state that a critical factor in
positioning stocks in process is the balance between convenience and consolidation to create
efficiencies as the stock flows along the value chain.

The importance of layout, which defines the placement of equipment and, consequently, restricts
possible routes and sequencing, can be perceived by the prominence with which the subject is treated
in the production management literature. The relationship between layout studies and material
handling, however, does not receive much attention in the same literature. This lack of attention can be
seen in works like Gaither and Frazier (2002), Chase et al. (2006) and Slack et al. (1997).

Finally, systems and information technology constitute essential factors for materials handling
management. Stair and Reynolds (2006), Laudon and Laudon (2006) and O'Brien and Marakas (2007)
support the study of fundamentals and general principles of information systems.



General Assembly (GA) is one of the most important and almost the last step in production (Li and
Meerkov, 2007; Gershwin, 1994; Buxey and Sadjadi, 1976). A schematic layout of GA is

shown in Fig. 1. During GA, parts are assembled through a series of assembling stations. At each

station, operators assemble certain types of parts onto the semi-product, which is then delivered to the

next station. This procedure continues until all the parts are assembled and the final product is

produced. Since each station consumes certain types of parts during assembly, the replenishment of
these parts is provided by a material handling (MH) system (Chang et al., 1986). The material handling
system delivers the parts from the inventory at the central docking area to the lineside buffers at the
assembly stations. In order to have smooth production, part delivery should be timely, to prevent
starvation of the assembly line. At the same time, how the material handling system is dispatched is
extremely important to production efficiency, since 20-50% of manufacturing costs may be related to
material handling (Ardavan and Gilbert, 2005).

Fig. 1: Sketch of a general assembly line with material handling.

The dynamic transition of this material handling system is triggered by events such as operation
completion and part delivery; it can therefore be considered a discrete event dynamic system (DEDS).
The material handling system follows a dispatching policy that determines when to send out the driver,
which buffers to serve in each trip, and the serving sequence. The dispatching cost consists of three
parts: the starving penalty of the lineside buffers, the transportation cost and the inventory cost. We
want to find the optimal dispatching policy.

The above dispatching problem is very difficult in the following senses:

Uncertainty: the major uncertainty in this material handling system is the part consumption
rates of the lineside buffers. The parts requirements differ for products with different options. This is
the typical case in mixed assembly lines, where different types of products are mixed and assembled in
one line, which is common practice in the automotive industry (Li and Meerkov, 2007; Gershwin,
1994; Buxey and Sadjadi, 1976). Due to the requirements on high accuracy and flexibility, most
operations in this general assembly line are manual, leading to an even larger variation in the
consumption rate.

Policy space: A dispatching policy determines when to serve which buffers in what order. The
policy space increases exponentially with the problem size, determined by the number of buffers, the
number of parts in the buffers, etc., and could be extremely large for problems of practical size. In fact,
a driver may supply dozens of buffers with sizes ranging from tens to hundreds. With little structural
information about the policy space, it is in general computationally intractable to find the optimal
policy by enumeration.

Policy evaluation: Due to the uncertainties and complicated dynamics in material handling
systems, it is very difficult to obtain a closed-form expression for evaluating the performance of a
dispatching policy (Li and Meerkov, 2007; Buzacott and Shantikumar, 1993). Usually, time-consuming
Monte Carlo simulation is the only way. The above difficulties, together with the economic impact,
have attracted a lot of research on dispatching policies for material handling over the past decades.
Most work focuses on moving semi-products along the serial line, assuming that the initial inventories
of parts at the lineside buffers are infinite (Li and Meerkov, 2007; Chang et al., 1986; Ardavan and
Gilbert, 2005). Various dispatching rules (Montoya-Torres, 2006; Hoff and Sarker, 1998) and heuristic
algorithms (Haq et al., 2003; Das and Spasovic, 2003) are well studied, based on simulation models
that are easy to implement (Chan et al., 2002). However, systems with finite inventory at the lineside
buffers have not received enough attention, and the stochastic and dynamic dispatching problems of
real-world material handling systems have generally not been well formulated and studied.
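As noted above, when no closed form is available, a policy's long-run average cost can be estimated by simulating one long sample path. A minimal Monte Carlo sketch follows, using a hypothetical single-buffer system whose rates and costs are invented for illustration:

```python
import random

def average_cost_mc(step, cost, s0, horizon=100_000, seed=0):
    """Estimate the long-run average cost of a fixed policy by
    simulating one long sample path (Monte Carlo policy evaluation)."""
    rng = random.Random(seed)
    s, total = s0, 0.0
    for _ in range(horizon):
        total += cost(s)
        s = step(s, rng)
    return total / horizon

# Hypothetical single buffer of capacity 5: one part is consumed with
# probability 0.4 per step; when starved, a delivery refills it with
# probability 0.3 per step. A cost of 1 is charged per starved time unit.
def step(n, rng):
    if n > 0:
        return n - 1 if rng.random() < 0.4 else n
    return 5 if rng.random() < 0.3 else 0

starving_cost = lambda n: 1.0 if n == 0 else 0.0

est = average_cost_mc(step, starving_cost, s0=5)
```

For this toy chain the starved fraction can also be computed analytically (about 0.21), which is how such simulators are usually sanity-checked before being trusted on systems with no closed form.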

Markov decision process (MDP) is used to characterize sequential decision problems with
Markovian properties (Puterman, 1994; Cao, 2000). However, two well-known difficulties, the large
state and action spaces, prevent the policy and value iterations used in traditional approaches from
being applied. There have been many efforts to overcome these difficulties, such as neuro-dynamic
programming (Bertsekas and Tsitsiklis, 1996) (also known as adaptive or approximate dynamic
programming (Powell, 2007)), state aggregation (Ren and Krogh, 2002), time aggregation (Cao et al.,
2002), and action aggregation (Xia et al., 2008). Nevertheless, there are still no systematic formulations
and solutions for the above material handling dispatching problem.

In this paper, the material handling system of a general assembly line in a practical car

manufacturing system is considered. There are three salient features of this study:
Focus is placed on supplying parts from the central docking area to the lineside buffers with finite
inventory sizes.
A systematic Markovian decision process formulation is given for the stochastic and dynamic
dispatching problem of the material handling system.
An event-based optimisation (EBO) approach (Cao, 2007; Cao, 2005) is used to address the
aforementioned difficulties in traditional Markovian decision process approaches.

The EBO approach is based on the structure of the material handling system. By exploiting the
problem structure, we focus on policies responding to certain events instead of all state transitions, so
that the number of aggregated potential functions (value functions) scales with the square of the system
size despite the exponential growth of the state space (Cao, 1998; Cao and Chen, 1997). This
effectively reduces the computational requirements to a level that is acceptable in practice. A sample-
path-based algorithm is developed to estimate the potentials and implement a gradient-based policy
optimization procedure. Numerical results demonstrate that the policies obtained by the event-based
optimization approach significantly outperform the current dispatching method in production.


This section presents a discrete Markovian decision process formulation for a 2-dolly material handling
system of a general assembly line. The following assumptions are made throughout this study:
There is one driver with a 2-dolly train in the material handling system; that is, the driver can supply
at most two buffers in one trip.
The number of parts consumed at each buffer in each unit time has a Bernoulli distribution, and the
average consumption rate is constant, though it may differ between buffers.
To simplify the expression, it is assumed that the inventory level of a buffer is increased within the
same time unit as the replenishment action is taken.
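Under the Bernoulli assumption above, per-unit-time consumption at each buffer can be simulated as independent Bernoulli draws. A small sketch follows; the usage rates are illustrative values, not data from the study:

```python
import random

def consume(rates, rng):
    """One time unit of consumption: buffer i uses one part with
    probability rates[i], its average usage rate U_i (assumption A2)."""
    return [1 if rng.random() < u else 0 for u in rates]

rng = random.Random(42)
U = [0.3, 0.5, 0.8]  # illustrative usage rates, one per buffer
draws = [consume(U, rng) for _ in range(10_000)]

# Over many time units, the empirical consumption rate approaches U_i.
est = [sum(col) / len(draws) for col in zip(*draws)]
```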

N, i: number and index of lineside buffers, i = 1, 2, ..., N.
T(i, j): travel time from point i to j, where i, j = 1, 2, ..., N denote the lineside buffers and i, j = 0
denotes the central docking area.
C_i: inventory capacity (i.e., the size) of lineside buffer i.
Q_i: supplying quantity of parts for buffer i in one dolly. That is, if the driver uses one dolly of the
train to supply buffer i, the amount of parts held in the dolly is predetermined by the content of one
package, i.e., Q_i.
U_i: average usage rate of parts at buffer i, i.e., the average amount of parts consumed at buffer i in
each time unit.
l: decision epoch of the MDP formulation, l = 0, 1, 2, .... The conveyor moves step by step in every
unit time, transferring semi-products from one station to the next. We choose the decision epoch at
each time unit, i.e., the time when the conveyor moves.
n_{l,i}: inventory level of buffer i at time l.
M_{l,i}: a random variable indicating the consumption amount of parts at lineside buffer i between
time l and l+1.
y_l: status of the driver at time l, i.e., how many time units are left for the driver to come back to the
central docking area. For example, y_l = 0 denotes that the driver is idle at time l; y_l = 3 shows that
the driver is on a trip at time l and will be back at the central docking area at time l+3.
Y: upper bound for the total travel time of all possible trips.
s_l: state vector of the Markov system at decision epoch l, s_l = (n_{l,1}, ..., n_{l,N}, y_l).
S: state space of the Markov system, which is defined as
S = {0, 1, ..., C_1} x ... x {0, 1, ..., C_N} x {0, 1, ..., Y}.
a_l: action vector of the Markov system at epoch l, a_l = (a_l^1, a_l^2), where a_l^1 and a_l^2 denote
the index of the buffer to supply with the first and second dolly respectively at epoch l, with
a_l^1, a_l^2 = 0, 1, ..., N. a_l^2 = 0 means that the second dolly is not used at epoch l. Obviously,
a_l^2 = 0 if a_l^1 = 0, indicating a virtual trip supplying no buffers.
A: action space of the Markov system, defined as A = {0, 1, ..., N} x {0, 1, ..., N}.
A(s_l): set of all possible actions at decision epoch l with system state s_l: a_l^1 = a_l^2 = 0 if
y_l > 0, since the driver can only be sent out when he/she is available at the central docking area;
a_l^1, a_l^2 = 0, 1, ..., N if y_l = 0.
L: dispatching policy of the MH system, which is a mapping from the state space S to the action
space A. We only consider stationary Markov policies here.
f(s_l, a_l): cost function at epoch l with state s_l and action a_l.
c: long-run average cost under policy L.


This problem can be formulated as an infinite-horizon discrete-time Markov decision process.

2.3.1 System Dynamics

Based on the system state and the dispatching actions at decision epoch l, the state transition is:
n_{l+1,i} = min{ max(n_{l,i} - M_{l,i}, 0) + Q_i * I(i = a_l^1 or i = a_l^2), C_i }, i = 1, ..., N,
y_{l+1} = y_l - 1 if y_l > 0; y_{l+1} = T(0, a_l^1) + T(a_l^1, a_l^2) + T(a_l^2, 0) if y_l = 0 and
a_l^1 =/= 0; and y_{l+1} = 0 otherwise,
where I(.) is an indicator function defined as I(.) = 1 (or 0) if the logic expression (.) is true (or false),
and ceil(.) denotes the ceiling function rounding a number upwards. With assumption A2, we have the
following probability functions: P(M_{l,i} = 1) = U_i and P(M_{l,i} = 0) = 1 - U_i.
From the above description of the system dynamics, we can get the transition probabilities.
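The transition above can be sketched in code. The sketch follows the notation of the formulation, except the trip-time rule (dock to first buffer to second buffer and back to the dock), which is an assumption made for illustration:

```python
def transition(n, y, action, M, Q, C, T):
    """One-step state transition of the 2-dolly MH system (a sketch).
    n: inventory levels (buffers 1..N stored at indices 0..N-1)
    y: driver status; action: (a1, a2) buffer indices, 0 = dolly unused
    M: parts consumed this time unit; Q: supply quantities; C: capacities
    T: travel-time matrix indexed 0..N, where 0 is the central docking area.
    The round-trip time formula below is an assumption for illustration."""
    a1, a2 = action
    n_next = []
    for i in range(len(n)):
        level = max(n[i] - M[i], 0)          # consumption, floored at zero
        if i + 1 in (a1, a2):                # buffer replenished on this trip
            level = min(level + Q[i], C[i])  # capped at buffer capacity
        n_next.append(level)
    if y > 0:
        y_next = y - 1                       # driver still out on a trip
    elif a1 == 0:
        y_next = 0                           # virtual trip: driver stays idle
    elif a2 == 0:
        y_next = T[0][a1] + T[a1][0]         # one-buffer round trip
    else:
        y_next = T[0][a1] + T[a1][a2] + T[a2][0]  # two-buffer round trip
    return n_next, y_next
```

For example, with two buffers and T = [[0, 2, 3], [2, 0, 1], [3, 1, 0]], supplying both buffers from an idle state gives a trip of 2 + 1 + 3 = 6 time units.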

2.3.2 Cost Structure

The cost function at epoch l contains three parts: the starving penalty of the lineside buffers, the
transportation cost, and the inventory cost at the buffers, which is calculated as:
f(s_l, a_l) = w_1 * sum_i I(n_{l,i} = 0) + w_2 * I(a_l^1 =/= 0) + w_3 * sum_i n_{l,i},
where w_1 >> w_2 >> w_3.
The priority is predetermined by our industry partners according to practical requirements.

To save the average dispatching cost of the MH system over a long period, the problem is formulated
as the minimization of the long-run average cost. That is:
minimize c = lim_{K -> infinity} (1/K) E[ sum_{l=0}^{K-1} f(s_l, a_l) ].

From the above formulation, the challenge of large problem size is obvious. The size of the state space
is |S| = (Y + 1) * prod_{i=1}^{N} (C_i + 1), where |.| indicates the cardinality of the set argument. The
size of the action space is |A| = N^2. With typical data from a practical system, the size of the state
space is larger than 10^20, and the size of the action space is larger than 10^2. Thus the size of the
traditional state-based policy space, |A|^|S|, is extremely large, which makes it difficult to optimize the
dispatching policies through traditional approaches.
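These cardinalities can be checked numerically. The dimensions below (N = 15 buffers, each of capacity 20, Y = 30) are assumptions of the order the text describes, not data from the study:

```python
import math

N, Y = 15, 30
C = [20] * N  # illustrative buffer capacities

# |S| = (Y + 1) * prod(C_i + 1): driver status times all inventory levels.
state_space = (Y + 1) * math.prod(c + 1 for c in C)
# |A| = N^2 as in the text: each dolly chooses among the N buffers.
action_space = N ** 2

print(f"|S| ~ 10^{math.floor(math.log10(state_space))}")  # order of 10^21
print(f"|A| = {action_space}")
```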


Based on the problem structure, the two most urgent buffers are the most likely to be supplied in each
trip, because there are only two dollies for the driver in the MH system. For buffer i at decision epoch l,
we define the estimated remaining life x_{l,i} as the expected length of time the buffer can sustain
production without replenishment:
x_{l,i} = ceil( n_{l,i} / U_i ).

We sort the estimated remaining lives in ascending order as x_{l,(1)}, x_{l,(2)}, ..., x_{l,(N)}, and focus
on the first two, i.e., x_{l,(1)} and x_{l,(2)}, when making decisions. An event e_0(x_(1), x_(2)) is
defined as the set of state transitions in which the driver becomes idle and the first and second shortest
estimated remaining lives become x_(1) and x_(2), respectively, where the subscripts (1) and (2) denote
the indices of the two most urgent buffers. After the observation of an event e_0(x_(1), x_(2)) at time l,
there are three types of actions in event-based policies: action (a_l^1 = (1), a_l^2 = (2)) dispatches the
driver to supply both buffers (1) and (2); action (a_l^1 = (1), a_l^2 = 0) supplies only buffer (1); action
(a_l^1 = a_l^2 = 0) supplies neither of them. These three types of actions are denoted a_{+2}, a_{+1}
and a_{+0} hereafter.
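A minimal sketch of such an event-based rule follows. It chooses among the three actions by comparing the two shortest estimated remaining lives against a supply threshold; the threshold rule and its value are assumptions for illustration, since in the study the event-to-action mapping is optimized rather than fixed:

```python
def event_based_action(n, U, threshold=5.0):
    """When the driver becomes idle, pick among the event-based actions
    a+2, a+1, a+0 using the two shortest estimated remaining lives
    x_i = n_i / U_i (threshold rule assumed for illustration).
    n: inventory levels, U: usage rates; buffers are indexed from 1."""
    lives = sorted((n_i / u_i, i + 1) for i, (n_i, u_i) in enumerate(zip(n, U)))
    (x1, b1), (x2, b2) = lives[0], lives[1]
    if x1 < threshold and x2 < threshold:
        return (b1, b2)   # a+2: supply the two most urgent buffers
    if x1 < threshold:
        return (b1, 0)    # a+1: supply only the most urgent buffer
    return (0, 0)         # a+0: a virtual trip supplying no buffers
```

The gradient-based procedure described later can be viewed as tuning this event-to-action mapping from simulated sample paths instead of hand-picking a threshold.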

In event-based policies, we can take an action only when one of the events happens, and the actions are
simplified. The size of the action space becomes |A| = 3, and the size of the event space is about 10^4.
Compared with the previous state-based formulation, where |S| > 10^20 and |A| > 10^2, the size of the
event-based policy space has been greatly reduced. It is important to note that the size of the event
space scales with the square of the system size despite the exponential growth of the state space.
Although event-based policies cannot guarantee to preserve optimality, as a trade-off they are
consistent with intuitions about the problem structure. Numerical results demonstrate the effectiveness
and efficiency of the Markov decision process coupled with event-based optimization when dealing
with large-scale practical problems.





References
Ardavan, A.-V. and Gilbert, L. (2005). Loop based facility planning and material handling, European Journal of Operational Research, 164(1), 1-11.
Bartholomew, D.J., Forbes, A.F. and McClean, S.I. (1991). Statistical Techniques for Manpower Planning. John Wiley & Sons.
Bassey, K.J. and Chigbu, P.E. (2012). On optimal control theory in marine oil spill management: a Markovian decision approach, European Journal of Operational Research, 217, 470-478.
Bellman, R. (1957). Dynamic Programming. Princeton, NJ: Princeton University Press.
Bertsekas, D.P. and Tsitsiklis, J.N. (1996). Neuro-Dynamic Programming. Belmont, MA: Athena Scientific.
Bowersox, D. and Closs, D. (1996). Logistical Management: The Integrated Supply Chain Process. New York: McGraw-Hill.
Buxey, G.M. and Sadjadi, D. (1976). Simulation studies of conveyor-paced assembly lines with buffer capacity, International Journal of Production Research, 14(5), 607-624.
Buzacott, J.A. and Shantikumar, J.G. (1993). Stochastic Models of Manufacturing Systems. Englewood Cliffs, NJ: Prentice Hall.
Chan, F.T.S., Chan, H.K. and Lau, H.C.W. (2002). The state of the art in simulation study on FMS scheduling: a comprehensive survey, International Journal of Advanced Manufacturing Technology, 19, 830-849.
Chan, F.T.S., Ip, R.W.L. and Lau, H. (2001). Integration of expert system with analytic hierarchy process for the design of material handling equipment selection system, Journal of Materials Processing Technology, 116, 137-145.
Chase, R., Jacobs, F. and Aquilano, N. (2006). Administração da produção e operações. São Paulo: McGraw Hill.
Chopra, S. and Meindl, P. (2001). Supply Chain Management: Strategy, Planning and Operation. Englewood Cliffs, NJ: Prentice Hall.
Das, S.K. and Spasovic, L. (2003). Scheduling material handling vehicles in a container terminal, Production Planning & Control, 14(7), 623-633.
Derman, C. (1982). Finite State Markovian Decision Processes. New York: Academic Press.
Dias, M.A. (1993). Administração de materiais: uma abordagem logística. São Paulo: Atlas.
Gaither, N. and Frazier, G. (2002). Administração da produção e operações. São Paulo: Thompson.
Gershwin, S.B. (1994). Manufacturing Systems Engineering. Englewood Cliffs, NJ: PTR Prentice Hall.
Gonzalez-Hernandez, J. and Villareal, C.E. (2009). Optimal policies for constrained average-cost Markov decision processes, Top, doi:10.1007/s11750-009-0110-7.
Groover, M.P. (2001). Automation, Production Systems, and Computer-Integrated Manufacturing, 2nd ed. New Jersey: Prentice-Hall.
Haq, N., Karthikeyan, T. and Dinesh, M. (2003). Scheduling decisions in FMS using a heuristic approach, International Journal of Advanced Manufacturing Technology, 22, 374-379.
Hoff, E.B. and Sarker, B.R. (1998). An overview of path design and dispatching methods for automated guided vehicles, Integrated Manufacturing Systems, 9(5), 296-307.
Laudon, K.C. and Laudon, J.P. (2006). Management Information Systems, 6th ed. Prentice Hall.
Lee, H.W.J., Cai, X.Q. and Teo, K.L. (2001). An optimal control approach to manpower planning problems, Mathematical Problems in Engineering, 7, 155-175.
Machline, C. (2008). A new kind of operations inventory: the pre-assembled kit, Journal of Operations and Supply Chain Management, 1(1), 24-28.
Milan, G.S., Preto, M.R. and Basso, L.C. (2007). Um estudo de caso sobre o funcionamento de um armazém automatizado, REAd - Revista Eletrônica de Administração, 12(7), 1-28.
Mouza, A.M. (2010). Application of optimal control in manpower planning, Quality and Quantity, 44(2), 199-215.
Nirmala, S. and Jeeva, M. (2010). A dynamic programming approach to optimal manpower recruitment and promotion policies for the two grade system, African Journal of Mathematics and Computer Science, 3(12), 297-301.
O'Brien, J.A. and Marakas, G.M. (2007). Administração de sistemas de informação. São Paulo: McGraw Hill.
Oliver, R.L. (2010). Satisfaction: A Behavioral Perspective on the Consumer, 2nd ed. New York: Irwin/McGraw-Hill.
Ossai, E.O. and Uche, P.I. (2009). Maintainability of departmentalized manpower structures in Markov chain model, Pacific Journal of Science and Technology, 10(2), 295-302.
Paiva, E.L., Carvalho Jr., J.M. and Fensterseifer, J.E. (2004). Estratégia de produção e de operações: conceitos, melhores práticas, visão de futuro. Porto Alegre: Bookman.
Puterman, M.L. (1994). Markov Decision Processes: Discrete Stochastic Dynamic Programming. New York: John Wiley & Sons.
Rao, P.T., Reddy, P.R.S. and Suhasini, A.V.S. (2010). Optimal manpower recruitment by stochastic programming in graded manpower systems, International Journal of Systems Assurance Engineering and Management, 1(2), 179-185.
Rosenbloom, B. (2003). Marketing Channels: A Management View, 7th ed. New York: Cengage Learning.
Ross, S.M. (1992). Applied Probability Models with Optimization Applications. New York: Dover.
Slack, N., Chambers, S., Harland, C., Harrison, A. and Johnston, R. (1996). Administração da produção. São Paulo: Atlas.