Review for NeurIPS paper: Efficient Model-Based Reinforcement Learning through Optimistic Policy Search and Planning
Neural Information Processing Systems
Summary and Contributions: When rollout-based MBRL algorithms apply an optimistic exploration strategy such as UCB, aleatoric and epistemic uncertainty are typically conflated into a single pointwise uncertainty measure at each state in the rollout. This submission proposes a novel augmented policy class that interacts explicitly with the model's epistemic uncertainty, allowing the agent to hypothesize the best possible outcome for any given action sequence. Beyond proof-of-concept experiments on simple MuJoCo control tasks, the authors provide regret bounds for their exploration strategy applied to purely rollout-based MBRL methods, including a sublinear regret bound when the dynamics are modeled with a Gaussian process. My greatest concern lies with the reproducibility of the results: the authors do not mention releasing code, and several simple but crucial implementation details are missing.
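To make my reading of the core idea concrete (and to illustrate the kind of implementation detail the paper leaves unspecified), here is a minimal sketch of an augmented-policy optimistic rollout as I understand it. Alongside the control action, the policy outputs an auxiliary variable eta in [-1, 1]^d that selects a transition inside the model's epistemic confidence set. All names here (mean_fn, epistemic_std_fn, beta, the policy interface) are my own hypothetical placeholders, not the authors' API.

```python
import numpy as np

def hallucinated_step(s, a, eta, mean_fn, epistemic_std_fn, beta=1.0):
    """One optimistic transition: s' = mu(s, a) + beta * sigma(s, a) * eta.

    mu is the model's mean prediction and sigma its epistemic (not
    aleatoric) standard deviation; eta steers within the confidence set.
    """
    mu = mean_fn(s, a)
    sigma = epistemic_std_fn(s, a)
    return mu + beta * sigma * np.clip(eta, -1.0, 1.0)

def optimistic_return(s0, policy, mean_fn, epistemic_std_fn, reward_fn,
                      horizon=10, beta=1.0):
    """Roll out the augmented policy through the hallucinated dynamics.

    The policy returns both a control action and the hallucination
    variable eta; optimizing it maximizes the best plausible return.
    """
    s, total = s0, 0.0
    for _ in range(horizon):
        a, eta = policy(s)
        total += reward_fn(s, a)
        s = hallucinated_step(s, a, eta, mean_fn, epistemic_std_fn, beta)
    return total
```

Under this reading, setting eta = 0 everywhere recovers a plain mean-model rollout, which is exactly the kind of baseline comparison whose configuration the paper should spell out.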
Jan-27-2025, 05:32:28 GMT