Optimizing Sequential Medical Treatments with Auto-Encoding Heuristic Search in POMDPs
Li, Luchen, Komorowski, Matthieu, Faisal, Aldo A.
–arXiv.org Artificial Intelligence
Health-related data is noisy and stochastic in implying the true physiological states of patients, limiting information contained in single-moment observations for sequential clinical decision making. We model patient-clinician interactions as partially observable Markov decision processes (POMDPs) and optimize sequential treatment based on belief states inferred from history sequence. To facilitate inference, we build a variational generative model and boost state representation with a recurrent neural network (RNN), incorporating an auxiliary loss from sequence auto-encoding. Meanwhile, we optimize a continuous policy of drug levels with an actor-critic method where policy gradients are obtained from a stablized off-policy estimate of advantage function, with the value of belief state backed up by parallel best-first suffix trees. We exploit our methodology in optimizing dosages of vasopressor and intravenous fluid for sepsis patients using a retrospective intensive care dataset and evaluate the learned policy with off-policy policy evaluation (OPPE). The results demonstrate that modelling as POMDPs yields better performance than MDPs, and that incorporating heuristic search improves sample efficiency.
arXiv.org Artificial Intelligence
May-17-2019
- Country:
- Europe > Switzerland
- North America > United States
- California > San Francisco County
- San Francisco (0.14)
- New York (0.14)
- Virginia (0.14)
- California > San Francisco County
- Genre:
- Research Report > New Finding (0.34)
- Industry: