Off-Policy Policy Gradient with State Distribution Correction

Yao Liu, Adith Swaminathan, Alekh Agarwal, Emma Brunskill

Apr-17-2019–arXiv.org Artificial Intelligence

The ability to use data about prior decisions and their outcomes to make counterfactual inferences about how alternative decision policies might perform is a cornerstone of intelligent behavior. It also has immense practical potential: it can enable the use of electronic medical record data to infer better treatment decisions for patients, the use of prior product recommendations to inform more effective strategies for presenting recommendations, and the use of previously collected data from students using educational software to better teach those and future students. Such counterfactual reasoning, particularly when deriving decision policies that will be used to make not one but a sequence of decisions, is important because online sampling during a learning procedure is costly, dangerous, and impractical in many of the applications above. While amply motivated, such counterfactual reasoning is also challenging because the data is censored: we can only observe the result of providing a particular chemotherapy treatment policy to a particular patient, not the counterfactual outcome had we instead started with a radiation sequence. We focus on the problem of performing such counterfactual inferences in the context of sequential decision making in a Markov decision process (MDP).



