
Module 2: Solution Approach
Prof. Chelsea C. White III
chip.white@isye.gatech.edu

Optimality Equations
Module 2: The basics, part 2. Solving the MDP model of
inventory control. The base-stock policy.

• The optimality equation
• Structural results
  o Functional form of the expected cost-to-go function
  o Optimality of a base-stock policy

Optimality Equations
• We have a model. How do we find a solution?
• Solution:
1. Optimal Policy
2. Expected cost of an optimal policy
• Several ways to proceed:
o We will use dynamic programming (D.P.).
o Many ways to solve a D.P.; we will mention a few and
focus on one.
o Assume infinite horizon: The optimality equations (O.E.):
v(s(t)) = \min_{a(t)} E[ c(s(t), a(t), d(t+1)) + \beta v(s(t+1)) ],
where:
1. 𝐸 is the expectation operator.
2. 𝑣 is unknown.
Inventory Control O.E.
To make things easier (initially), assume:
1. c(s, a, d) = c \cdot a + h (y - d)^+ + p (d - y)^+, where y = s + a is the inventory level after ordering;
   set c = 0 w.l.o.g. (we can adjust h and p to compensate). A code sketch of this cost follows the list.
2. s(t+1) = y(t) - d(t+1) (backlogging allowed)
3. No production capacity constraints
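A minimal Python sketch of this one-period cost (the default parameter values are placeholders, not from the slides):

# One-period inventory cost with the unit order cost set to 0 (w.l.o.g.).
# h = holding cost per unit left over, p = penalty per unit backlogged;
# y = inventory level after ordering, d = realized demand.
def stage_cost(y, d, h=1.0, p=4.0):
    return h * max(y - d, 0) + p * max(d - y, 0)

# Example: stage_cost(5, 3) = 2.0 (two units held); stage_cost(2, 6) = 16.0 (four units backlogged).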
Facts:
1. There exists a unique 𝑣 ∗ such that the O.E. is satisfied.
2. 𝑣 ∗ has an intuitive interpretation: 𝑣 ∗ is the optimal
expected total discounted cost over the infinite horizon.
3. A 𝑦 that achieves the minimum is an optimal action ⇒
o construction of 𝜋 ∗
o π* is time-invariant (stationary).
Thus, the O.E. is the key to determining the solution of the
problem.
Exact Solutions
We still don't know 𝑣 ∗ !! How to find 𝑣 ∗ (from which 𝜋 ∗
follows):
Many approaches, including:
1. L.P.,
2. Policy iteration, and
3. Value iteration (successive approximations).
We focus on value iteration:
1. 𝑣0 given (assumed bounded)
2. v_{n+1}(s) = \min_{y \ge s} \sum_d P(d) [ c(y, d) + \beta v_n(y - d) ]. Note: given v_n, we can find v_{n+1}.
3. \lim_{n \to \infty} \| v_n - v^* \| = 0 (if \beta < 1), where \| v_n - v^* \| = \max_s | v_n(s) - v^*(s) |.
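A minimal Python sketch of this recursion for the inventory model; the discount factor, demand distribution, truncated inventory grid, and cost parameters below are illustrative assumptions, not values from the lecture:

import numpy as np

BETA = 0.95                                    # discount factor (beta < 1)
DEMANDS = np.arange(9)                         # demand support {0,...,8} (assumption)
P = np.full(len(DEMANDS), 1 / len(DEMANDS))    # demand pmf P(d) (assumption)
LEVELS = np.arange(-20, 21)                    # truncated inventory grid (assumption)
H, PEN = 1.0, 4.0                              # holding cost h, backlog penalty p

def cost(y, d):
    # One-period cost c(y, d) = h*(y - d)^+ + p*(d - y)^+  (unit order cost c = 0).
    return H * max(y - d, 0) + PEN * max(d - y, 0)

def value_iteration(tol=1e-6, max_iter=5000):
    v = {s: 0.0 for s in LEVELS}               # v_0 = 0 (bounded)
    for _ in range(max_iter):
        v_new, policy = {}, {}
        for s in LEVELS:
            best = []
            for y in LEVELS[LEVELS >= s]:      # feasible order-up-to levels y >= s
                # Next state is y - d (backlogging); clamping to the grid edge is
                # only an artifact of truncating the state space.
                val = sum(P[i] * (cost(y, d) + BETA * v[max(y - d, LEVELS[0])])
                          for i, d in enumerate(DEMANDS))
                best.append((val, y))
            v_new[s], policy[s] = min(best)
        if max(abs(v_new[s] - v[s]) for s in LEVELS) < tol:
            break                              # v_n is sufficiently "close"; stop
        v = v_new
    return v_new, policy

The returned policy maps each inventory level s to an order-up-to level y; under the structural results developed later, it should reduce to a base-stock rule.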

Value Iteration
Thus,
1. choose 𝑣0
2. given 𝑣𝑛 , determine 𝑣𝑛+1
3. need stopping criterion, but eventually get sufficiently “close”
to 𝑣 ∗ .
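A standard stopping rule from the MDP literature (not stated on these slides; \epsilon > 0 is a chosen tolerance): if

\max_s | v_{n+1}(s) - v_n(s) | < \epsilon (1 - \beta) / (2 \beta),

then the policy that is greedy with respect to v_{n+1} is \epsilon-optimal.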
Realities
News
1. Going from v_n to v_{n+1} requires |A||S|^2 multiplications and |A||S| comparisons.
Bad News
2. Can take many iterations to converge (𝛽 dependent).
Usually Good News
3. Usually an optimal policy is determined earlier than a “close”
approximation of 𝑣 ∗ .

Structure and Inventory Control
Base-stock policy:
i. very intuitive
ii. easy to implement but is it optimal (or close to
optimal)?
iii. if so, does it have computational implications?

Base-stock policy: There is an s* such that, for s(t):

i. if s(t) ≤ s*, then order up to s*;
ii. if s(t) > s*, then do not replenish.

Comment: Finding s* (a single parameter) determines π* completely.
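A minimal Python sketch of the rule (s_star is the base-stock level, a placeholder name):

# Base-stock (order-up-to) rule: the single parameter s_star determines the policy.
def base_stock_order(s, s_star):
    # Bring inventory up to s_star; if already at or above it, order nothing.
    return max(s_star - s, 0)

# Example with s_star = 6: inventory 2 -> order 4; inventory 9 -> order 0.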
Existence
• Is there an optimal base-stock policy?
Let 𝑣0 = 0. Then,
v_1(s) = \min_{y \ge s} \sum_d P(d) c(y, d), where c(y, d) = h (y - d)^+ + p (d - y)^+.
Since \sum_d P(d) c(y, d) is convex in y, let s^* denote its unconstrained minimizer.
Existence
v_1(s) = \begin{cases} v_1(s^*) & \forall\, s \le s^* \\ \sum_d P(d) c(s, d) & \forall\, s > s^* \end{cases}
Assume
v_n(s) = \begin{cases} v_n(s^*) & \forall\, s \le s^* \\ \text{convex and nondecreasing} & \forall\, s > s^* \end{cases}

Note:
v_{n+1}(s) = \min_{y \ge s} \sum_d P(d) [ c(y, d) + \beta v_n(y - d) ]

Optimality of Base-stock Policy
Claim: The base-stock policy with base-stock level s* remains optimal, and v_{n+1}
has the same functional form as v_n.

Justification (proof outline):

v_{n+1}(s) \ge \min_y \sum_d P(d) c(y, d) + \beta \min_y \sum_d P(d) v_n(y - d)
           = \sum_d P(d) [ c(s^*, d) + \beta v_n(s^* - d) ]

(The choice y = s^* attains both unconstrained minima: it minimizes the expected one-period cost, and since s^* - d \le s^*, v_n(s^* - d) = v_n(s^*), the minimum of v_n.)

Optimality of Base-stock Policy
For s \le s^*, v_{n+1}(s) \le \sum_d P(d) [ c(s^*, d) + \beta v_n(s^* - d) ], since y = s^* is then feasible.
Thus, for 𝑠 ≤ 𝑠 ∗ , order up to 𝑠 ∗ .

A straightforward argument implies: if s > s*, then don't order.

⇒ policy: order up to max(𝑠 ∗ , 𝑠).

⇒ v_{n+1}(s) = \begin{cases} v_{n+1}(s^*) & \forall\, s \le s^* \\ \text{convex and nondecreasing} & \forall\, s > s^* \end{cases}

Implications
1. Base-stock policy: order-up-to max(𝑠 ∗ , 𝑠) is
optimal ∀ 𝑛

2. ∀ 𝑛,
v_{n+1}(s) = \begin{cases} v_{n+1}(s^*) & \forall\, s \le s^* \\ \text{convex and nondecreasing} & \forall\, s > s^* \end{cases}

3. 𝑠 ∗ is easy to find:
\sum_d P(d) c(s^*, d) \le \sum_d P(d) c(y, d), \quad \forall\, y

4. Note: optimal policy is myopic.
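A minimal Python sketch of finding s* by this myopic criterion; the demand distribution, cost parameters, and candidate grid are illustrative assumptions:

import numpy as np

DEMANDS = np.arange(9)                         # demand support {0,...,8} (assumption)
P = np.full(len(DEMANDS), 1 / len(DEMANDS))    # demand pmf P(d) (assumption)
H, PEN = 1.0, 4.0                              # holding cost h, backlog penalty p

def expected_one_period_cost(y):
    # sum_d P(d) * [h*(y - d)^+ + p*(d - y)^+]
    return sum(P[i] * (H * max(y - d, 0) + PEN * max(d - y, 0))
               for i, d in enumerate(DEMANDS))

# s* minimizes the expected one-period cost over candidate order-up-to levels.
s_star = min(np.arange(-5, 15), key=expected_one_period_cost)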

