Deterministic Uncertainty Propagation for Improved Model-Based Offline Reinforcement Learning

Neural Information Processing Systems 

We present a theoretical result demonstrating the strong dependency of suboptimality on the number of Monte Carlo samples taken per Bellman target calculation.
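
To make the quantity in question concrete, here is a minimal sketch of a Monte Carlo Bellman target and of how its spread shrinks as more samples are drawn per target. It is not the paper's method or its theoretical result. The toy Gaussian next-state value model, the reward and discount values, and the names `mc_bellman_target` and `target_std` are all assumptions made for illustration.

```python
import random
import statistics


def mc_bellman_target(reward, gamma, sample_next_value, n_samples, rng):
    """Monte Carlo estimate of r + gamma * E[V(s')] using n_samples draws."""
    draws = [sample_next_value(rng) for _ in range(n_samples)]
    return reward + gamma * sum(draws) / n_samples


def target_std(n_samples, n_repeats=2000, seed=0):
    """Empirical standard deviation of the target estimate over many repeats."""
    rng = random.Random(seed)

    # Hypothetical toy model (an assumption, not from the paper):
    # the next-state value is Gaussian with mean 1.0 and std 2.0.
    def sample_next_value(r):
        return r.gauss(1.0, 2.0)

    estimates = [
        mc_bellman_target(0.5, 0.99, sample_next_value, n_samples, rng)
        for _ in range(n_repeats)
    ]
    return statistics.pstdev(estimates)


if __name__ == "__main__":
    for n in (1, 10, 100):
        print(n, target_std(n))
```

In this toy setting the standard deviation of the target falls roughly as one over the square root of the number of samples, which is the usual Monte Carlo rate. How that noise translates into suboptimality is the subject of the paper's result and is not reproduced here.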
