Surrogate Objectives for Batch Policy Optimization in One-step Decision Making

Minmin Chen, Ramki Gummadi, Chris Harris, Dale Schuurmans

Neural Information Processing Systems 

When rewards are fully observed, we show that the expected reward objective exhibits suboptimal plateaus and exponentially many local optima in the worst case.
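The plateau phenomenon can be illustrated with a minimal sketch. It assumes a single context, a tabular softmax policy, and three hypothetical actions with hand-picked rewards. This is our simplification, not the paper's exact setup or method. For this policy the gradient of the expected reward objective is dJ/dtheta_a = pi(a) * (r(a) - J). Once the policy concentrates on a suboptimal action, the gradient toward the optimal action is scaled by its exponentially small probability, so gradient ascent stalls well below the optimum. Note that the tabular single-context case has plateaus but no strict local optima. The worst-case local optima described above require a parametric policy shared across contexts, and this sketch does not reproduce that result.

```python
import math


def softmax(logits):
    # Numerically stable softmax: subtract the max logit before exponentiating.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def expected_reward(logits, rewards):
    # J(theta) = sum_a pi_theta(a) * r(a) for one context with fully observed rewards.
    pi = softmax(logits)
    return sum(p * r for p, r in zip(pi, rewards))


def expected_reward_grad(logits, rewards):
    # Tabular softmax gradient: dJ/dtheta_a = pi(a) * (r(a) - J).
    pi = softmax(logits)
    j = sum(p * r for p, r in zip(pi, rewards))
    return [p * (r - j) for p, r in zip(pi, rewards)]


def gradient_ascent(logits, rewards, steps=1000, lr=1.0):
    # Plain gradient ascent on the expected reward objective; returns the final logits.
    theta = list(logits)
    for _ in range(steps):
        grad = expected_reward_grad(theta, rewards)
        theta = [t + lr * g for t, g in zip(theta, grad)]
    return theta


# Hypothetical rewards: action 0 is optimal (r = 1.0), action 1 is suboptimal (r = 0.5).
rewards = [1.0, 0.5, 0.0]
uniform_init = [0.0, 0.0, 0.0]
stuck_init = [0.0, 10.0, 0.0]  # policy already concentrated on suboptimal action 1

g_uniform = expected_reward_grad(uniform_init, rewards)
g_stuck = expected_reward_grad(stuck_init, rewards)

j_uniform_after = expected_reward(gradient_ascent(uniform_init, rewards), rewards)
j_stuck_after = expected_reward(gradient_ascent(stuck_init, rewards), rewards)

print("max |grad| at uniform init:", max(abs(g) for g in g_uniform))
print("max |grad| at stuck init:  ", max(abs(g) for g in g_stuck))
print("J after ascent from uniform:", j_uniform_after)
print("J after ascent from stuck:  ", j_stuck_after)
```

Starting from the uniform policy, ascent climbs close to the optimal value of 1.0. Starting from the policy stuck on action 1, the gradient has magnitude on the order of e^-10, and after the same number of steps J remains essentially at 0.5.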
