Appendix A: Reminders about integral probability metrics
–Neural Information Processing Systems
Let (X, Σ) be a measurable space. To answer question (3), we conduct a thorough ablation study on MOPO. The main goal of the ablation study is to understand how the choice of reward penalty affects performance. Note that we include true pen. to indicate the upper bound of our approach. Both reward penalties achieve significantly better performance than no reward penalty, indicating that it is imperative to consider model uncertainty in batch model-based RL.
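As a rough illustration of what a reward penalty does, the sketch below subtracts a scaled uncertainty estimate from a model-predicted reward. It is only a minimal sketch under assumptions: the function name `penalized_reward`, the coefficient `lam`, and the example uncertainty values are hypothetical and are not taken from the excerpt, which does not specify how the penalties are computed.

```python
import numpy as np


def penalized_reward(reward, uncertainty, lam=1.0):
    """Return reward - lam * uncertainty, elementwise.

    `uncertainty` stands in for any per-transition estimate of model error;
    how it is obtained is an assumption left open here.
    """
    reward = np.asarray(reward, dtype=float)
    uncertainty = np.asarray(uncertainty, dtype=float)
    return reward - lam * uncertainty


# Hypothetical model-predicted rewards and uncertainty estimates.
rewards = np.array([1.0, 0.5, 2.0])
uncertainty = np.array([0.0, 0.2, 1.5])

# lam = 0 corresponds to the "no reward penalty" baseline.
no_penalty = penalized_reward(rewards, uncertainty, lam=0.0)

# lam > 0 lowers the reward most where the model is least certain.
with_penalty = penalized_reward(rewards, uncertainty, lam=1.0)
```

With `lam=0.0` the rewards are unchanged, while a positive `lam` makes transitions with high estimated uncertainty less attractive to the policy, which is the effect the ablation compares.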
Feb-19-2026, 05:45:39 GMT