Review for NeurIPS paper: MOReL: Model-Based Offline Reinforcement Learning
–Neural Information Processing Systems
Additional Feedback: Most recent offline RL algorithms rely on policy regularization, where the optimized policy is prevented from deviating too far from the data-logging policy. In contrast, MOReL does not directly rely on the data-logging policy but instead applies pessimism within a model-based approach, which provides another promising direction for offline RL. However, it would be more natural to penalize more uncertain states more heavily. For example, the classical model-based RL algorithm MBIE-EB constructs an optimistic MDP that rewards uncertain regions with a bonus proportional to 1/sqrt(N(s,a)), where N(s,a) is the state-action visitation count. Analogously, one could consider a pessimistic MDP that penalizes uncertain regions with a penalty proportional to 1/sqrt(N(s,a)).
- How is using alpha greater than zero justified for the USAD?
- It would be great to see how sensitive the performance of the algorithm is with respect to kappa in the reward penalty and the threshold in the USAD.
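To make the suggestion concrete, a minimal sketch of the proposed count-based pessimistic penalty follows. This is an illustration, not MOReL's actual penalty: the tabular setting, the function name `pessimistic_reward`, and the coefficient `kappa` (reused from the review's notation) are all assumptions.

```python
import numpy as np

def pessimistic_reward(r, counts, s, a, kappa=1.0):
    """Count-based pessimistic reward, analogous to the MBIE-EB
    exploration bonus but with the sign flipped: subtract a penalty
    proportional to 1/sqrt(N(s, a)).

    r      -- raw reward for taking action a in state s
    counts -- tabular visitation counts N(s, a)
    kappa  -- hypothetical penalty coefficient (assumption)
    """
    n = max(counts[s, a], 1)  # clamp to 1 so unvisited pairs get the max penalty
    return r - kappa / np.sqrt(n)

# Example: a state-action pair visited 4 times incurs penalty kappa/2.
counts = np.array([[4]])
penalized = pessimistic_reward(1.0, counts, 0, 0, kappa=1.0)  # 1.0 - 0.5 = 0.5
```

Unlike MOReL's binary USAD switch, this penalty degrades smoothly with uncertainty, which is the behavior the review suggests may be more natural.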
Feb-8-2025, 10:51:12 GMT