Module 2: Solution Approach: Prof. Chelsea C. White III

chip.white@isye.gatech.edu
Optimality Equations
Module 2: The basics, part 2. Solving the MDP model of inventory control. The base-stock policy.
• The optimality equation
• Structural results
• Functional form of the expected cost-to-go function
• Optimality of a base-stock policy
Optimality Equations
• We have a model. How do we find a solution?
• Solution:
1. Optimal Policy
2. Expected cost of an optimal policy
• Several ways to proceed:
o We will use dynamic programming (D.P.).
o Many ways to solve a D.P.; we will mention a few and
focus on one.
o Assuming an infinite horizon, the optimality equations (O.E.) are
$v(s(t)) = \min_{a(t)} E\{ c(s(t), a(t), d(t+1)) + \beta v(s(t+1)) \}$,
where:
1. $E$ is the expectation operator (over the demand $d(t+1)$).
2. $v$ is unknown.
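As an illustration of what the O.E. says, the sketch below applies its right-hand side once to a candidate function $v$ on a small finite MDP. The names `bellman_update`, `example_actions`, `example_outcomes` and the one-state example are hypothetical, chosen only to show that a solution of the O.E. is a fixed point: with one state, a per-period cost of 1 for the cheaper action, and $\beta = 0.5$, the candidate $v = 1/(1 - 0.5) = 2$ is left unchanged.

```python
from typing import Callable, Dict, Hashable, List, Tuple

State = Hashable
Action = Hashable
# Each outcome of taking action a in state s: (probability, stage cost, next state).
Outcome = Tuple[float, float, State]


def bellman_update(
    v: Dict[State, float],
    states: List[State],
    actions: Callable[[State], List[Action]],
    outcomes: Callable[[State, Action], List[Outcome]],
    beta: float,
) -> Dict[State, float]:
    """Right-hand side of the O.E.: min over actions of E[c + beta * v(next state)]."""
    return {
        s: min(
            sum(prob * (cost + beta * v[nxt]) for prob, cost, nxt in outcomes(s, a))
            for a in actions(s)
        )
        for s in states
    }


# Hypothetical one-state example: action "cheap" costs 1, "pricey" costs 3,
# and the system always returns to state "x".
def example_actions(s: State) -> List[Action]:
    return ["cheap", "pricey"]


def example_outcomes(s: State, a: Action) -> List[Outcome]:
    return [(1.0, 1.0 if a == "cheap" else 3.0, "x")]


v_candidate = {"x": 2.0}  # 1 / (1 - beta) with beta = 0.5
v_after = bellman_update(v_candidate, ["x"], example_actions, example_outcomes, 0.5)
# v_after == {"x": 2.0}: the candidate satisfies the O.E.
```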
Inventory Control O.E.
To make things easier (initially), assume:
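1. $c(s, a, d) = ca + h(y - d)^+ + p(d - y)^+$, where $y = s + a$ is the inventory level after ordering and $x^+ = \max(x, 0)$. Since $a \ge 0$, minimizing over $a(t)$ is the same as minimizing over $y(t) \ge s(t)$.
Set $c = 0$ w.l.o.g. (we can adjust $h$ and $p$ to compensate); the cost is then written $c(y, d) = h(y - d)^+ + p(d - y)^+$.
2. $s(t+1) = y(t) - d(t+1)$ (backlogging allowed).
3. No production capacity constraints.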
Facts:
1. There exists a unique 𝑣 ∗ such that the O.E. is satisfied.
2. 𝑣 ∗ has an intuitive interpretation: 𝑣 ∗ is the optimal
expected total discounted cost over the infinite horizon.
3. Any $y$ that achieves the minimum in the O.E. is an optimal action, which implies:
o an optimal policy $\pi^*$ can be constructed from $v^*$;
o $\pi^*$ is time-invariant (stationary).
Thus, the O.E. is the key to determining the solution of the
problem.
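A minimal sketch of the simplified model's ingredients, assuming integer-valued demand whose probability mass function is stored as a dict from $d$ to $P(d)$. The function names and the values $h = 1$, $p = 3$ are illustrative assumptions, not part of the slides.

```python
from typing import Dict

Pmf = Dict[int, float]  # demand value d -> probability P(d)


def stage_cost(y: int, d: int, h: float, p: float) -> float:
    """c(y, d) = h (y - d)^+ + p (d - y)^+, with the unit ordering cost set to 0."""
    return h * max(y - d, 0) + p * max(d - y, 0)


def next_state(y: int, d: int) -> int:
    """s(t+1) = y(t) - d(t+1); a negative value is a backlog."""
    return y - d


def expected_stage_cost(y: int, pmf: Pmf, h: float, p: float) -> float:
    """sum_d P(d) c(y, d): expected holding/shortage cost after ordering up to y."""
    return sum(prob * stage_cost(y, d, h, p) for d, prob in pmf.items())


# Hypothetical numbers: h = 1 per unit held, p = 3 per unit backlogged.
print(stage_cost(5, 3, h=1.0, p=3.0))  # 2 units left over -> 2.0
print(stage_cost(3, 5, h=1.0, p=3.0))  # 2 units short -> 6.0
print(next_state(3, 5))                # -2 (two units backlogged)
```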
Exact Solutions
We still don't know $v^*$! How do we find $v^*$ (from which $\pi^*$ follows)?
Many approaches, including:
1. L.P.,
2. Policy iteration, and
3. Value iteration (successive approximations).
We focus on value iteration:
1. $v_0$ given (assumed bounded).
2. $v_{n+1}(s) = \min_{y \ge s} \sum_d P(d)\,[c(y, d) + \beta v_n(y - d)]$. Note that given $v_n$, we can compute $v_{n+1}$.
3. $\lim_{n \to \infty} \|v_n - v^*\| = 0$ (if $\beta < 1$), where $\|v_n - v^*\| = \max_s |v_n(s) - v^*(s)|$.
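The three steps above can be sketched as follows, as one possible implementation rather than the course's own code. It assumes a truncated integer state grid, a dict-valued demand pmf, and the sup-norm stopping rule $\|v_{n+1} - v_n\| < \text{tol}$; the names `vi_step`, `value_iteration`, the grid $\{-10, \dots, 10\}$ and the uniform demand on $\{0, \dots, 4\}$ are hypothetical. Next states that fall below the grid are clamped to its bottom value, which is a truncation assumption of this sketch and is harmless only when the grid extends below the optimal order-up-to level.

```python
from typing import Dict, List, Tuple

Pmf = Dict[int, float]  # demand d -> P(d)


def vi_step(v: Dict[int, float], states: List[int], pmf: Pmf,
            h: float, p: float, beta: float) -> Dict[int, float]:
    """One value-iteration step v_n -> v_{n+1} on a truncated integer grid."""
    s_min = states[0]
    # g[y] = sum_d P(d) [c(y, d) + beta v_n(y - d)]; next states below the grid
    # are clamped to s_min (truncation assumption of this sketch).
    g = {
        y: sum(prob * (h * max(y - d, 0) + p * max(d - y, 0)
                       + beta * v[max(y - d, s_min)])
               for d, prob in pmf.items())
        for y in states
    }
    # v_{n+1}(s) = min over feasible order-up-to levels y >= s (and y <= top of grid).
    return {s: min(g[y] for y in states if y >= s) for s in states}


def value_iteration(states: List[int], pmf: Pmf, h: float, p: float, beta: float,
                    tol: float = 1e-8, max_iter: int = 10_000) -> Tuple[Dict[int, float], int]:
    v = {s: 0.0 for s in states}  # v_0 = 0, which is bounded
    for n in range(1, max_iter + 1):
        v_next = vi_step(v, states, pmf, h, p, beta)
        gap = max(abs(v_next[s] - v[s]) for s in states)  # sup-norm ||v_{n+1} - v_n||
        v = v_next
        if gap < tol:
            return v, n
    return v, max_iter


states = list(range(-10, 11))
pmf = {d: 0.2 for d in range(5)}  # hypothetical demand, uniform on {0, ..., 4}
v, iterations = value_iteration(states, pmf, h=1.0, p=3.0, beta=0.9)
```

Computing `g` once per $y$ and then taking running minima keeps each step at roughly (grid size) × (demand support) work for the expectations, in line with the per-iteration cost discussed next.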
Value Iteration
Thus,
1. choose 𝑣0
2. given 𝑣𝑛 , determine 𝑣𝑛+1
3. a stopping criterion is needed, but the iterates eventually get sufficiently “close” to $v^*$.
Realities
News
1. Going from $v_n$ to $v_{n+1}$ requires $|A||S|^2$ multiplications and $|A||S|$ comparisons.
Bad News
2. Convergence can take many iterations; the number depends on $\beta$ (convergence is geometric at rate $\beta$, so it slows as $\beta \to 1$).
Usually Good News
3. An optimal policy is usually identified earlier than a “close” approximation of $v^*$.
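To make the $\beta$ dependence concrete, here is a hedged back-of-the-envelope sketch. It uses the contraction bound $\|v_n - v^*\| \le \beta^n \|v_0 - v^*\|$, valid for $\beta < 1$, to count how many iterations guarantee a given accuracy. The function name `iterations_for_accuracy` and the chosen tolerances are assumptions for illustration; this is a worst-case bound, and actual runs may stop sooner.

```python
import math


def iterations_for_accuracy(beta: float, initial_error: float, eps: float) -> int:
    """Smallest n with beta**n * initial_error <= eps.

    Uses the contraction bound ||v_n - v*|| <= beta**n * ||v_0 - v*||, valid for beta < 1.
    """
    if not 0.0 < beta < 1.0:
        raise ValueError("the bound requires 0 < beta < 1")
    if initial_error <= eps:
        return 0
    return math.ceil(math.log(eps / initial_error) / math.log(beta))


for beta in (0.9, 0.99):
    print(beta, iterations_for_accuracy(beta, initial_error=1.0, eps=1e-3))
# 0.9 -> 66 iterations, 0.99 -> 688 iterations
```

Moving $\beta$ from 0.9 to 0.99 multiplies the guaranteed iteration count by roughly ten, since $\log \beta \approx \beta - 1$ near 1.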
Structure and Inventory Control
Base-stock policy:
i. very intuitive
ii. easy to implement but is it optimal (or close to
optimal)?
iii. if so, does it have computational implications?
Base-stock policy: There is an $s^*$ such that for $s(t)$:
i. if $s(t) \le s^*$, then order up to $s^*$;
ii. if $s(t) > s^*$, then do not replenish.
Comment: Finding $s^*$ (a single parameter) determines $\pi^*$ completely.
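A minimal sketch of how a base-stock policy would be executed, assuming integer inventory, backlogging as in the model above, and a hypothetical level $s^* = 3$ with a made-up demand sequence. The names `base_stock_order` and `simulate` are illustrative only.

```python
from typing import List, Tuple


def base_stock_order(s: int, s_star: int) -> int:
    """Order quantity under a base-stock policy: bring inventory up to max(s*, s)."""
    return max(s_star, s) - s


def simulate(s0: int, s_star: int, demands: List[int]) -> List[Tuple[int, int, int]]:
    """Return (state s, order a, post-order level y) for each period, with backlogging."""
    s, trace = s0, []
    for d in demands:
        a = base_stock_order(s, s_star)
        y = s + a
        trace.append((s, a, y))
        s = y - d  # s(t+1) = y(t) - d(t+1)
    return trace


print(simulate(7, 3, [2, 4, 1]))  # [(7, 0, 7), (5, 0, 5), (1, 2, 3)]
```

Starting above $s^*$, the policy orders nothing until inventory drops to or below $s^*$, then restores it to exactly $s^*$; the single parameter is all the policy needs.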
Existence
• Is there an optimal base-stock policy?
Let $v_0 = 0$. Then,
$v_1(s) = \min_{y \ge s} \sum_d P(d)\, c(y, d)$, where $c(y, d) = h(y - d)^+ + p(d - y)^+$.
Existence
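$$v_1(s) = \begin{cases} v_1(s^*) & \forall s \le s^* \\ \sum_d P(d)\, c(s, d) & \forall s > s^* \end{cases}$$
where $s^*$ minimizes $\sum_d P(d)\, c(y, d)$ over $y$.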
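A small numerical check of this form of $v_1$, under hypothetical data (demand uniform on $\{0, \dots, 4\}$, $h = 1$, $p = 3$). It computes $v_1(s)$ directly as a minimum over $y \ge s$; the upper search limit `y_max` is an assumption needed to make the search finite. The result is flat at $v_1(s^*)$ for $s \le s^*$ and equals $\sum_d P(d)\, c(s, d)$ for $s > s^*$.

```python
from typing import Dict

Pmf = Dict[int, float]  # demand d -> P(d)


def expected_stage_cost(y: int, pmf: Pmf, h: float, p: float) -> float:
    """L(y) = sum_d P(d) [h (y - d)^+ + p (d - y)^+]."""
    return sum(prob * (h * max(y - d, 0) + p * max(d - y, 0)) for d, prob in pmf.items())


def v1(s: int, pmf: Pmf, h: float, p: float, y_max: int) -> float:
    """v_1(s) = min over s <= y <= y_max of L(y); y_max truncates the search (needs s <= y_max)."""
    return min(expected_stage_cost(y, pmf, h, p) for y in range(s, y_max + 1))


pmf = {d: 0.2 for d in range(5)}  # hypothetical: demand uniform on {0, ..., 4}
h, p, y_max = 1.0, 3.0, 20
s_star = min(range(-20, y_max + 1), key=lambda y: expected_stage_cost(y, pmf, h, p))
print(s_star)  # 3
```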
Assume
$$v_n(s) = \begin{cases} v_n(s^*) & \forall s \le s^* \\ \text{convex and nondecreasing} & \forall s > s^* \end{cases}$$
Note:
$v_{n+1}(s) = \min_{y \ge s} \sum_d [P(d)\, c(y, d) + \beta P(d)\, v_n(y - d)]$.
Optimality of Base-stock Policy
Claim: The base-stock policy with level $s^*$ is still optimal, and $v_{n+1}$ has the same functional form as $v_n$.
Justification (proof outline):
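$$v_{n+1}(s) \ge \min_y \sum_d P(d)\, c(y, d) + \beta \min_y \sum_d P(d)\, v_n(y - d) = \sum_d P(d)\,[c(s^*, d) + \beta v_n(s^* - d)]$$
The inequality holds because dropping the constraint $y \ge s$ and minimizing the two terms separately can only lower the value. The equality holds because both minima are attained at $y = s^*$: the first by the definition of $s^*$, the second because $v_n$ is nondecreasing with smallest value $v_n(s^*)$, and $\sum_d P(d)\, v_n(s^* - d) = v_n(s^*)$ when demand is nonnegative.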
Optimality of Base-stock Policy
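For $s \le s^*$, $v_{n+1}(s) \le \sum_d P(d)\,[c(s^*, d) + \beta v_n(s^* - d)]$, since $y = s^*$ is feasible; together with the lower bound, this holds with equality.
Thus, for $s \le s^*$, order up to $s^*$.
A straightforward argument implies: if $s > s^*$, then don't order.
⇒ policy: order up to $\max(s^*, s)$.
$$\Rightarrow v_{n+1}(s) = \begin{cases} v_{n+1}(s^*) & \forall s \le s^* \\ \text{convex and nondecreasing} & \forall s > s^* \end{cases}$$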
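A numerical sanity check of the claim (not a proof), under hypothetical data: demand uniform on $\{0, \dots, 4\}$, $h = 1$, $p = 3$, $\beta = 0.9$, and a truncated grid $\{-10, \dots, 10\}$. The sketch runs the recursion for 30 steps from $v_0 = 0$ and checks that, at every step, the minimizing $y$ equals $\max(s, s^*)$. Clamping next states that fall below the grid to its bottom value is an assumption of the sketch; it introduces no error here because $v_n$ is constant for $s \le s^*$ and the grid bottom lies below $s^*$.

```python
from typing import Dict, Tuple

states = list(range(-10, 11))     # truncated grid (assumption of this sketch)
pmf = {d: 0.2 for d in range(5)}  # hypothetical demand, uniform on {0, ..., 4}
h, p, beta = 1.0, 3.0, 0.9


def L(y: int) -> float:
    """Expected one-period cost sum_d P(d) c(y, d)."""
    return sum(prob * (h * max(y - d, 0) + p * max(d - y, 0)) for d, prob in pmf.items())


def step(v: Dict[int, float]) -> Tuple[Dict[int, float], Dict[int, int]]:
    """v_n -> v_{n+1}, also returning the minimizing order-up-to level y for each s."""
    s_min = states[0]
    g = {y: L(y) + beta * sum(prob * v[max(y - d, s_min)] for d, prob in pmf.items())
         for y in states}
    policy = {s: min((y for y in states if y >= s), key=g.__getitem__) for s in states}
    return {s: g[policy[s]] for s in states}, policy


s_star = min(states, key=L)  # minimizer of the expected one-period cost
v = {s: 0.0 for s in states}  # v_0 = 0
base_stock_every_n = True
for n in range(1, 31):
    v, policy = step(v)
    base_stock_every_n = base_stock_every_n and all(
        policy[s] == max(s, s_star) for s in states
    )
print(s_star, base_stock_every_n)  # 3 True
```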
Implications
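1. The base-stock policy, order up to $\max(s^*, s)$, is optimal $\forall n$.
2. $\forall n$,
$$v_{n+1}(s) = \begin{cases} v_{n+1}(s^*) & \forall s \le s^* \\ \text{convex and nondecreasing} & \forall s > s^* \end{cases}$$
3. $s^*$ is easy to find: $\sum_d P(d)\, c(s^*, d) \le \sum_d P(d)\, c(y, d)$, $\forall y$.
4. Note: the optimal policy is myopic; $s^*$ minimizes the expected one-period cost and does not depend on $\beta$.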
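Item 3 can be sketched in two ways, under the assumption of integer, nonnegative demand with a known pmf (names and data hypothetical). The first searches directly for the smallest minimizer of the expected one-period cost. The second is the classical newsvendor critical-ratio rule, which is not stated on the slides but follows from the discrete difference $L(y+1) - L(y) = (h+p)F(y) - p$, where $L(y) = \sum_d P(d)\, c(y, d)$ and $F$ is the demand CDF: the smallest minimizer is the smallest $y$ with $F(y) \ge p/(p+h)$.

```python
from typing import Dict

Pmf = Dict[int, float]  # demand d -> P(d), d >= 0


def expected_stage_cost(y: int, pmf: Pmf, h: float, p: float) -> float:
    return sum(prob * (h * max(y - d, 0) + p * max(d - y, 0)) for d, prob in pmf.items())


def s_star_search(pmf: Pmf, h: float, p: float) -> int:
    """Item 3 directly: the smallest y minimizing the expected one-period cost.

    With h, p > 0 a minimizer lies between the smallest and largest demand values.
    """
    return min(range(min(pmf), max(pmf) + 1), key=lambda y: expected_stage_cost(y, pmf, h, p))


def s_star_critical_ratio(pmf: Pmf, h: float, p: float) -> int:
    """Smallest y with P(D <= y) >= p / (p + h)."""
    ratio = p / (p + h)
    cdf = 0.0
    for d in sorted(pmf):
        cdf += pmf[d]
        if cdf >= ratio:
            return d
    return max(pmf)  # guards against rounding leaving cdf slightly below 1


pmf = {d: 0.2 for d in range(5)}  # hypothetical: demand uniform on {0, ..., 4}
print(s_star_search(pmf, 1.0, 3.0), s_star_critical_ratio(pmf, 1.0, 3.0))  # 3 3
print(s_star_search(pmf, 1.0, 9.0), s_star_critical_ratio(pmf, 1.0, 9.0))  # 4 4
```

A higher shortage penalty $p$ raises the critical ratio and hence $s^*$; the discount factor never enters, which is the myopic property noted in item 4.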