Reviews: Near-Optimal Reinforcement Learning in Dynamic Treatment Regimes
Neural Information Processing Systems
In this paper, the authors provide a method for incorporating observational data, possibly subject to unobserved confounding, to improve the performance of policy learning in online settings (the key results are Theorems 5, 7, and 8). After a period of discussion, the reviewers reached a consensus that this paper merits publication at NeurIPS and will contribute to the RL literature by giving a principled method for incorporating observational data, even when that data is confounded.
Jan-25-2025, 04:37:25 GMT