POPCORN: Partially Observed Prediction COnstrained ReiNforcement Learning
Futoma, Joseph, Hughes, Michael C., Doshi-Velez, Finale
Many medical decision-making settings can be framed as partially observed Markov decision processes (POMDPs). However, popular two-stage approaches that first learn a POMDP model and then solve it often fail because the model that best fits the data may not be the best model for planning. We introduce a new optimization objective that (a) produces both high-performing policies and high-quality generative models, even when some observations are irrelevant for planning, and (b) does so in the kinds of batch, off-policy settings common in medicine. We demonstrate our approach on synthetic examples and a real-world hypotension management task.
Jan-12-2020
- Country:
- North America > United States
- Massachusetts > Hampshire County
- Amherst (0.04)
- Illinois > Cook County
- Chicago (0.04)
- Massachusetts > Hampshire County
- Europe > Italy
- North America > United States
- Genre:
- Research Report (1.00)
- Industry: