Approach
Prof. Chelsea C. White III
chip.white@isye.gatech.edu
Optimality Equations
Module 2: The basics, part 2. Solving the MDP model of
inventory control. The base-stock policy.
Optimality Equations
• We have a model. How do we find a solution?
• Solution:
1. Optimal Policy
2. Expected cost of an optimal policy
• Several ways to proceed:
o We will use dynamic programming (D.P.).
o Many ways to solve a D.P.; we will mention a few and
focus on one.
o Assume an infinite horizon. The optimality equations (O.E.):
$$v(s(t)) = \min_{a(t)} E\left\{ c(s(t), a(t), d(t+1)) + \beta\, v(s(t+1)) \right\},$$
where:
1. $E$ is the expectation operator.
2. $v$ is unknown.
Inventory Control O.E.
To make things easier (initially), assume:
1. $c(s, a, d) = c\,a + h\,(y - d)^+ + p\,(d - y)^+$, where $y = s + a$ is the inventory level after ordering and $x^+ = \max(x, 0)$;
Set $c = 0$ w.l.o.g. (we can adjust $h$ and $p$ to compensate)
2. $s(t+1) = y(t) - d(t+1)$ (backlogging allowed)
3. No production capacity constraints
Facts:
1. There exists a unique $v^*$ such that the O.E. is satisfied.
2. $v^*$ has an intuitive interpretation: $v^*$ is the optimal
expected total discounted cost over the infinite horizon.
3. A $y$ that achieves the minimum is an optimal action ⇒
o construction of $\pi^*$
o $\pi^*$ is time-invariant (stationary).
Thus, the O.E. is the key to determining the solution of the
problem.
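For orientation, the O.E. can be specialized to the inventory model under assumptions 1 to 3. The sketch below takes $c = 0$, writes $y = s + a$ for the post-order inventory level, and uses $P(d)$ for the demand probability mass function; treat the notation as an assumption of this sketch rather than a statement from the slides.

```latex
v(s) = \min_{y \ge s} \sum_{d} P(d) \left[ h\,(y - d)^+ + p\,(d - y)^+ + \beta\, v(y - d) \right]
```

The constraint $y \ge s$ encodes that orders are nonnegative, and the argument $y - d$ of $v$ is the next state under backlogging.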
Exact Solutions
We still don't know $v^*$! How do we find $v^*$ (from which $\pi^*$
follows)?
Many approaches, including:
1. L.P.,
2. Policy iteration, and
3. Value iteration (successive approximations).
We focus on value iteration:
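1. $v_0$ given (assumed bounded)
2. $v_{n+1}(s) = \min_{y \ge s} \sum_d P(d) \left[ c(y, d) + \beta\, v_n(y - d) \right]$. Note, given $v_n$
we can find $v_{n+1}$.
3. $\lim_{n \to \infty} \| v_n - v^* \| = 0$ (if $\beta < 1$), where $\| v_n - v^* \| = \max_s | v_n(s) - v^*(s) |$.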
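The recursion above can be sketched in Python. This is a minimal illustration under stated assumptions, not the course's implementation: inventory levels are integers truncated to a finite grid [s_min, s_max], demand takes values 0, 1, 2, ... with a given pmf, $c = 0$, and iteration stops when successive iterates differ by less than a tolerance. The function name and all parameters are hypothetical.

```python
import numpy as np

def value_iteration(demand_pmf, h, p, beta, s_min, s_max, tol=1e-8, max_iter=10_000):
    """Successive approximations for the inventory O.E. on a truncated integer grid."""
    pmf = np.asarray(demand_pmf, dtype=float)     # pmf[d] = P(d), d = 0, 1, ...
    levels = np.arange(s_min, s_max + 1)          # grid used for both s and y
    demands = np.arange(len(pmf))
    diff = levels[:, None] - demands[None, :]     # y - d for every (y, d) pair
    cost = h * np.maximum(diff, 0) + p * np.maximum(-diff, 0)  # c(y, d) with c = 0
    nxt = np.clip(diff, s_min, s_max) - s_min     # grid index of next state (truncation assumption)

    def q_values(v):
        # q[y] = sum_d P(d) [c(y, d) + beta * v(y - d)]
        return (pmf * (cost + beta * v[nxt])).sum(axis=1)

    v = np.zeros(len(levels))                     # v_0 = 0, which is bounded
    for _ in range(max_iter):
        q = q_values(v)
        # v_{n+1}(s) = min over y >= s of q[y]: a reverse running minimum
        v_new = np.minimum.accumulate(q[::-1])[::-1]
        gap = np.max(np.abs(v_new - v))           # sup-norm change between iterates
        v = v_new
        if gap < tol:
            break

    q = q_values(v)
    # Greedy policy: smallest order-up-to level y >= s attaining the minimum
    policy = np.array([levels[i + np.argmin(q[i:])] for i in range(len(levels))])
    return levels, v, policy

# Hypothetical data: demand uniform on {0, ..., 4}, h = 1, p = 3, beta = 0.9
levels, v, policy = value_iteration([0.2] * 5, h=1.0, p=3.0, beta=0.9, s_min=-10, s_max=10)
print(policy[levels == 0], v[levels == 0])
```

The reverse running minimum computes all constrained minima in one pass, so each iteration is dominated by forming the array of expected costs for every order-up-to level.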
Value Iteration
Thus,
1. choose $v_0$
2. given $v_n$, determine $v_{n+1}$
3. need a stopping criterion, but eventually get sufficiently “close”
to $v^*$.
Realities
News
1. Going from $v_n$ to $v_{n+1}$ requires $|A||S|^2$ multiplications and
$|A||S|$ comparisons, where $|A|$ and $|S|$ are the numbers of actions and states.
Bad News
2. Can take many iterations to converge (𝛽 dependent).
Usually Good News
3. Usually an optimal policy is determined earlier than a “close”
approximation of $v^*$.
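The dependence of the iteration count on $\beta$ can be made concrete with the standard contraction bound $\| v_n - v^* \| \le \beta^n \| v_1 - v_0 \| / (1 - \beta)$. The bound is not stated in the slides; it is brought in here as an assumption of this sketch, and the function name and arguments are hypothetical.

```python
import math

def iterations_needed(beta, eps, first_step):
    """Return an n with beta**n * first_step / (1 - beta) <= eps.

    first_step stands for the sup-norm distance between v_1 and v_0.
    """
    if not 0 < beta < 1:
        raise ValueError("the bound requires 0 < beta < 1")
    target = eps * (1 - beta) / first_step
    if target >= 1:
        return 0                                   # already within eps
    n = math.ceil(math.log(target) / math.log(beta))
    # Guard against floating-point rounding in the logarithms
    while beta ** n * first_step / (1 - beta) > eps:
        n += 1
    return n

# The guaranteed iteration count grows quickly as beta approaches 1
for beta in (0.5, 0.9, 0.99):
    print(beta, iterations_needed(beta, eps=1e-6, first_step=1.0))
```

This is a worst-case guarantee on the value function only; as item 3 notes, the minimizing policy often stops changing much sooner.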
Structure and Inventory Control
Base-stock policy:
i. very intuitive
ii. easy to implement, but is it optimal (or close to
optimal)?
iii. if so, does it have computational implications?
Existence
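$$v_1(s) = \begin{cases} v_1(s^*) & \forall\, s \le s^* \\ \sum_d P(d)\, c(s, d) & \forall\, s > s^* \end{cases}$$
Assume
$$v_n(s) = \begin{cases} v_n(s^*) & \forall\, s \le s^* \\ \text{convex and nondecreasing} & \forall\, s > s^* \end{cases}$$
Note:
$$v_{n+1}(s) = \min_{y \ge s} \sum_d \left[ P(d)\, c(y, d) + \beta P(d)\, v_n(y - d) \right].$$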
Optimality of Base-stock Policy
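Claim: Base-stock policy $s^*$ is still optimal and $v_{n+1}$
has the same functional form as $v_n$.
The unconstrained minimum over $y$ is attained at $y = s^*$:
$$\min_y \sum_d P(d) \left[ c(y, d) + \beta\, v_n(y - d) \right] = \sum_d P(d) \left[ c(s^*, d) + \beta\, v_n(s^* - d) \right]$$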
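For $s \le s^*$, $v_{n+1}(s) \le \sum_d P(d) \left[ c(s^*, d) + \beta\, v_n(s^* - d) \right]$.
Thus, for $s \le s^*$, order up to $s^*$.
$$\Rightarrow\; v_{n+1}(s) = \begin{cases} v_{n+1}(s^*) & \forall\, s \le s^* \\ \text{convex and nondecreasing} & \forall\, s > s^* \end{cases}$$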
Implications
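1. Base-stock policy: order-up-to $\max(s^*, s)$ is
optimal $\forall\, n$
2. $\forall\, n$,
$$v_{n+1}(s) = \begin{cases} v_{n+1}(s^*) & \forall\, s \le s^* \\ \text{convex and nondecreasing} & \forall\, s > s^* \end{cases}$$
3. $s^*$ is easy to find:
$$\sum_d P(d)\, c(s^*, d) \le \sum_d P(d)\, c(y, d), \quad \forall\, y$$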
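The condition in item 3 says that $s^*$ minimizes the expected one-period cost, so it can be found without running value iteration. A minimal Python sketch follows, assuming integer demand on 0, 1, 2, ... with a given pmf, $c = 0$, and $h, p > 0$; the function names and the sample data are hypothetical.

```python
import numpy as np

def one_period_cost(y, demand_pmf, h, p):
    """Expected one-period cost: sum_d P(d) [h (y - d)^+ + p (d - y)^+]."""
    pmf = np.asarray(demand_pmf, dtype=float)
    d = np.arange(len(pmf))
    return float(np.sum(pmf * (h * np.maximum(y - d, 0) + p * np.maximum(d - y, 0))))

def base_stock_level(demand_pmf, h, p):
    """Smallest y minimizing the expected one-period cost.

    Assumes h, p > 0, so the minimizer lies between the smallest and
    largest demand values and only those levels need to be searched.
    """
    candidates = range(len(demand_pmf))
    return min(candidates, key=lambda y: one_period_cost(y, demand_pmf, h, p))

# Hypothetical data: demand uniform on {0, ..., 4}, h = 1, p = 3
pmf = [0.2] * 5
s_star = base_stock_level(pmf, h=1.0, p=3.0)
print(s_star, one_period_cost(s_star, pmf, h=1.0, p=3.0))
```

A brute-force search is used because it mirrors the inequality in item 3 directly; for large demand supports a search that exploits convexity of the one-period cost would be cheaper.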