Surrogate Objectives for Batch Policy Optimization in One-step Decision Making
Minmin Chen, Ramki Gummadi, Chris Harris, Dale Schuurmans
Neural Information Processing Systems
When rewards are fully observed, we show that the expected reward objective exhibits suboptimal plateaus and exponentially many local optima in the worst case.
Feb-12-2026, 19:23:21 GMT