Batch Policy Learning in Average Reward Markov Decision Processes

Liao, Peng, Qi, Zhengling, Murphy, Susan

arXiv.org Machine Learning 

We study the problem of policy optimization in Markov Decision Processes over an infinite time horizon (Puterman, 1994). We focus on the batch (i.e., offline) setting, where historical data consisting of multiple trajectories has been previously collected using some behavior policy. Our goal is to learn a new policy with guaranteed performance when implemented in the future. In this work, we develop a data-efficient method that learns, from a training set composed of multiple trajectories, the policy optimizing the long-term average reward within a pre-specified policy class. Furthermore, we establish a finite-sample regret guarantee, i.e., a bound on the difference between the average reward of the optimal policy in the class and the average reward of the policy estimated by our method. This work is motivated by the development of just-in-time adaptive interventions in mobile health (mHealth) applications (Nahum-Shani et al., 2017). Our method can be used to learn a treatment policy that maps real-time information about an individual's status and context to a particular treatment at each of many decision times, in order to support health behaviors.
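The regret notion described in the abstract can be written compactly as follows. This is an illustrative sketch only; the symbols here (average reward $\eta^{\pi}$, policy class $\Pi$, estimated policy $\hat{\pi}$) are assumed notation, not taken verbatim from the paper.

```latex
% Long-term average reward of a policy \pi,
% assuming the limit exists (e.g., under ergodicity of the induced chain):
\eta^{\pi} \;=\; \lim_{T \to \infty} \frac{1}{T}\,
    \mathbb{E}_{\pi}\!\left[\, \sum_{t=1}^{T} R_t \,\right]

% Regret of the estimated policy \hat{\pi} relative to the best
% policy in the pre-specified class \Pi:
\mathrm{Regret}(\hat{\pi}) \;=\; \max_{\pi \in \Pi} \eta^{\pi} \;-\; \eta^{\hat{\pi}}
```

A finite-sample guarantee, in this notation, bounds $\mathrm{Regret}(\hat{\pi})$ with high probability as a function of the number of trajectories in the batch data.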
