Outline
Dynamic Treatment Regimes
Optimal Q-functions and Q-learning
The Problem & Goal
Finite Sample Bounds
Outline of Proof
Shortcomings and Open Problems
Dynamic treatment regimes are used in the management of addictions, mental illnesses, HIV infection, and cancer.
The data consist of trajectories with k decisions per subject:
X_t: observations available prior to the t-th decision (which may include earlier observations and actions)
A_t: action taken at the t-th decision
Y: primary outcome
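A minimal sketch of how one such trajectory could be represented in code (Python, with illustrative field names; nothing here is prescribed by the talk):

```python
from dataclasses import dataclass
from typing import List, Sequence

@dataclass
class Trajectory:
    """One subject's data: k observation/action pairs followed by the primary outcome."""
    X: List[Sequence[float]]  # X[t]: observations available prior to decision t
    A: List[int]              # A[t]: action taken at decision t
    Y: float                  # primary outcome
```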
Definition:
E_d denotes expectation when the actions are chosen according to the regime d = (d_1, ..., d_k); the optimal regime maximizes E_d[Y].
Q-functions:
The Q-functions for the optimal regime are given recursively: for t = k, k-1, ..., 1,
Q_k(x_k, a_k) = E[ Y | X_k = x_k, A_k = a_k ],
Q_t(x_t, a_t) = E[ \max_a Q_{t+1}(X_{t+1}, a) | X_t = x_t, A_t = a_t ]   for t < k.
Q-functions:
The optimal regime d^* = (d^*_1, ..., d^*_k) is given by
d^*_t(x_t) = \arg\max_a Q_t(x_t, a),   t = 1, ..., k.
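As a worked instance (assuming two decisions, k = 2, purely for illustration), the recursion is solved backward:

Q_2(x_2, a_2) = E[ Y | X_2 = x_2, A_2 = a_2 ],   d^*_2(x_2) = \arg\max_a Q_2(x_2, a)
Q_1(x_1, a_1) = E[ \max_a Q_2(X_2, a) | X_1 = x_1, A_1 = a_1 ],   d^*_1(x_1) = \arg\max_a Q_1(x_1, a)

The last decision maximizes the expected primary outcome directly; the first decision is chosen anticipating that the second decision will be made optimally.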
Q-learning:
Given a model Q_t(x_t, a_t; \theta_t) for the Q-functions, minimize
E_n[ (Y - Q_k(X_k, A_k; \theta_k))^2 ]
over \theta_k, where E_n denotes the average over the n trajectories.
Set \hat{Y}_k = \max_a Q_k(X_k, a; \hat{\theta}_k).
Q-learning:
For each t = k-1, ..., 1, minimize
E_n[ (\hat{Y}_{t+1} - Q_t(X_t, A_t; \theta_t))^2 ]
over \theta_t,
and set \hat{Y}_t = \max_a Q_t(X_t, a; \hat{\theta}_t),
and so on, working backward to t = 1.
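A minimal sketch of this backward, least-squares Q-learning, assuming binary actions, a fixed-length feature vector at each stage, and linear working models; the function and variable names are illustrative, not from the talk:

```python
import numpy as np

def design(X_t, A_t):
    """Working-model features for one stage: intercept, X_t, A_t, and A_t * X_t."""
    A_t = np.asarray(A_t, dtype=float).reshape(-1, 1)
    return np.hstack([np.ones_like(A_t), X_t, A_t, A_t * X_t])

def q_learning(X, A, Y):
    """X: list of length k with arrays of shape (n, p); A: (n, k) array in {0, 1};
    Y: (n,) array of primary outcomes. Returns fitted coefficients for each stage."""
    n, k = A.shape
    theta = [None] * k
    pseudo = np.asarray(Y, dtype=float).copy()   # stage-k pseudo-outcome is Y itself
    for t in reversed(range(k)):
        Phi = design(X[t], A[:, t])
        # least-squares fit of the stage-t working model to the pseudo-outcome
        theta[t], *_ = np.linalg.lstsq(Phi, pseudo, rcond=None)
        # pseudo-outcome for the previous stage: max over the two actions of the fitted Q_t
        q0 = design(X[t], np.zeros(n)) @ theta[t]
        q1 = design(X[t], np.ones(n)) @ theta[t]
        pseudo = np.maximum(q0, q1)
    return theta

def decision_rule(theta_t, x_t):
    """Estimated stage-t rule: the action with the larger fitted Q-value."""
    x_t = np.atleast_2d(x_t)
    m = len(x_t)
    q0 = design(x_t, np.zeros(m)) @ theta_t
    q1 = design(x_t, np.ones(m)) @ theta_t
    return (q1 > q0).astype(int)
```

Swapping the least-squares step for any other regression method leaves the backward structure of the algorithm unchanged.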
Q-Learning:
The estimated regime is given by
\hat{d}_t(x_t) = \arg\max_a Q_t(x_t, a; \hat{\theta}_t),   t = 1, ..., k.
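Continuing the sketch above (x_new is a hypothetical new patient's stage-t feature vector), the estimated rule is applied by comparing the fitted Q-values:

```python
theta = q_learning(X, A, Y)                      # fitted coefficients, one vector per stage
a_recommended = decision_rule(theta[t], x_new)   # action with the larger fitted Q-value at stage t
```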
The model for the Q-functions implicitly constrains the class of possible decision rules in the dynamic treatment regime; call this constrained class D.
X_t is a vector with many components (high dimensional), so the model is likely incorrect; view the class of Q-function models and D as approximation classes.
Goal: upper bound the difference in value, E_{d^*}[Y] - E_{\hat{d}}[Y], where E_{\hat{d}} denotes expectation when the actions are chosen according to the rule \hat{d}.
This upper bound should be composed of quantities that are minimized in the learning method.
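The results below supply the two pieces of a bound of the following schematic form (the shape of the decomposition only; the exact constants are not reproduced here):

E_{d^*}[Y] - E_{\hat{d}}[Y]  \le  (approximation error: how well the classes represent the optimal Q-functions and decision rules)  +  (estimation error: a term that decreases with the sample size n).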
(1)   for L > 1.
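Display (1) involves a constant L > 1. Since the actions in the data are randomized, a standard condition of this kind is a lower bound on the randomization probabilities; the specific form below is an illustrative assumption, not necessarily the talk's exact display:

p_t(a_t | x_t) \ge 1/L   for every action a_t, every x_t, and every t = 1, ..., k,

where p_t denotes the randomization probability used in collecting the data.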
Definition:
where E, without a subscript, denotes expectation when the actions are randomized.
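As background, a standard identity under sequential randomization (stated here for context, not as the talk's display): for any regime d,

E_d[Y] = E[ Y \prod_{t=1}^{k} 1\{A_t = d_t(X_t)\} / p_t(A_t | X_t) ],

so that, under a positivity condition like the one sketched above, each factor in the product is at most L and expectations under a regime are controlled in terms of E.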
Results:
Approximation Error:
with
Define
Estimation Error:
For \delta > 0, with probability at least 1 - \delta,
for n satisfying
If
that is,
Outline of Proof:
The Q-functions for a fixed regime d are given by
Q^d_k(x_k, a_k) = E[ Y | X_k = x_k, A_k = a_k ],
Q^d_t(x_t, a_t) = E[ Q^d_{t+1}(X_{t+1}, d_{t+1}(X_{t+1})) | X_t = x_t, A_t = a_t ]   for t < k.
Proof Outline:
(1)
(2)
(3)
for n satisfying
Open Problems
Is there a learning method that can learn the best decision rule in an approximation class, given a data set of n finite-horizon trajectories?
Sieve estimators or regularized estimators?
Dealing with high-dimensional X: feature extraction, feature selection.
samurphy@umich.edu