Review for NeurIPS paper: Towards Safe Policy Improvement for Non-Stationary MDPs


Summary and Contributions: The authors introduce a novel model-free, policy-improvement-based algorithm for smooth non-stationary Markov decision processes (NS-MDPs), with a focus on the safety guarantees of their method. The method relies heavily on Assumption 1 (smooth performance), implicitly assumed in [51], which enables treating off-policy evaluation (OPE) in an NS-MDP as a time-series forecasting (TSF) problem. The proposed algorithm, safe policy improvement for NS-MDPs (SPIN), alternates under Assumption 1 between a policy evaluation step and a policy improvement step: importance sampling provides OPE estimates from past evaluation samples, TSF is then applied to these estimates to forecast future performance, and wild bootstrapping yields uncertainty estimates for the forecast.
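To make the reviewed pipeline concrete, the evaluation step can be sketched as follows. This is a minimal illustration, not the paper's exact procedure: the helper `is_return`, the synthetic drifting performance series, the linear-trend forecaster, and the Rademacher-weight wild bootstrap are all simplifying assumptions chosen for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)

def is_return(behavior_probs, target_probs, rewards):
    """Ordinary importance-sampling estimate of one episode's return
    under the target policy, from behavior-policy data."""
    rho = np.prod(np.asarray(target_probs) / np.asarray(behavior_probs))
    return rho * np.sum(rewards)

# Toy history: per-episode IS performance estimates with a slow drift,
# standing in for OPE estimates computed from logged episodes.
T = 40
y = 0.5 + 0.01 * np.arange(T) + rng.normal(0, 0.05, T)

# TSF step (assumed linear trend): least-squares fit, then forecast
# the performance at the next episode index T.
X = np.column_stack([np.ones(T), np.arange(T)])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
forecast = coef[0] + coef[1] * T

# Wild bootstrap: perturb residuals with random signs (Rademacher
# weights), refit, and collect forecasts to quantify uncertainty.
resid = y - X @ coef
boot = []
for _ in range(2000):
    y_star = X @ coef + resid * rng.choice([-1.0, 1.0], size=T)
    c, *_ = np.linalg.lstsq(X, y_star, rcond=None)
    boot.append(c[0] + c[1] * T)
lower = np.percentile(boot, 5)  # one-sided lower bound on future performance
```

A safety test in the spirit of the review would then compare `lower` against the (forecasted) performance of the current safe policy before deploying the candidate.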