Appendix A Reminders about integral probability metrics

Neural Information Processing Systems 

Let (X, Σ) be a measurable space.

To answer question (3), we conduct a thorough ablation study on MOPO. The main goal of the ablation study is to understand how the choice of reward penalty affects performance. Note that we include "true pen." to indicate the upper bound of our approach. Both reward penalties achieve significantly better performance than no reward penalty, indicating that it is imperative to consider model uncertainty in batch model-based RL.
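The excerpt does not state the form of the penalty, so the following is only a minimal sketch. It assumes an uncertainty-penalized reward of the form r - lam * u(s, a), and it assumes that "true pen." means an oracle penalty computed from the real model error. The function names, the L2 error, and the coefficient `lam` are hypothetical and are not taken from the source.

```python
import numpy as np


def penalized_reward(reward, uncertainty, lam=1.0):
    """Subtract a scaled uncertainty estimate from the reward, elementwise.

    Assumed form: r_tilde = r - lam * u. Passing zeros as `uncertainty`
    recovers the "no reward penalty" baseline.
    """
    reward = np.asarray(reward, dtype=float)
    uncertainty = np.asarray(uncertainty, dtype=float)
    return reward - lam * uncertainty


def true_model_error(pred_next_states, true_next_states):
    """Oracle penalty (assumption): L2 error between predicted and real next states.

    This needs access to the true dynamics, so it could only serve as an
    upper-bound reference, not as a practical estimator.
    """
    diff = np.asarray(pred_next_states, dtype=float) - np.asarray(true_next_states, dtype=float)
    return np.linalg.norm(diff, axis=-1)


if __name__ == "__main__":
    rewards = np.array([1.0, 2.0])
    predicted = np.array([[0.0, 0.0], [3.0, 4.0]])
    actual = np.zeros((2, 2))

    oracle_u = true_model_error(predicted, actual)
    print("no penalty:  ", penalized_reward(rewards, np.zeros_like(rewards)))
    print("true penalty:", penalized_reward(rewards, oracle_u, lam=0.5))
```

In this sketch, a learned uncertainty estimate would be passed to `penalized_reward` in place of `oracle_u`; the ablation described above compares such choices against the unpenalized baseline.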
