Review for NeurIPS paper: Reinforcement Learning in Factored MDPs: Oracle-Efficient Algorithms and Tighter Regret Bounds for the Non-Episodic Setting


Weaknesses: I have some concerns and questions:

- To obtain an efficiently implementable algorithm, the authors have DORL construct an optimistic MDP via a very simple procedure. This construction considers only the error bounds and completely ignores the value function. So, while the proof guarantees optimism, I believe the resulting optimistic MDP is overly optimistic: in favoring computational efficiency, this approach may sacrifice learning efficiency to a large extent. The idea of optimizing over a finite set of MDPs (in lieu of the bounded-parameter MDP) is indeed nice. However, I believe the current construction, which completely ignores the value function, is too naive to work in practice.
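To make this concern concrete, here is a minimal toy sketch (this is my own caricature of the trade-off, not the paper's actual DORL construction; the MDP, the `eps` radius, and all names are illustrative). It contrasts a value-ignoring optimism construction, which guarantees optimism by adding the worst-case bonus eps*Rmax/(1-gamma) derived purely from the error bound, with a value-aware construction in the extended-value-iteration style that spends the same confidence budget by shifting transition mass toward the highest-value state:

```python
import numpy as np

def value_iteration(P, R, gamma=0.9, iters=600):
    """Tabular value iteration. P: (A, S, S) transitions, R: (S,) rewards."""
    V = np.zeros(P.shape[1])
    for _ in range(iters):
        V = R + gamma * np.max(P @ V, axis=0)
    return V

# Toy MDP (illustrative, not from the paper): 3 states, 1 action,
# uniform transitions, reward only in state 2. Pretend the empirical
# model equals the truth and the L1 confidence radius is eps.
S, A, gamma, eps, Rmax = 3, 1, 0.9, 0.2, 1.0
P_hat = np.full((A, S, S), 1.0 / S)
R = np.array([0.0, 0.0, 1.0])
V_true = value_iteration(P_hat, R, gamma)

# Value-ignoring optimism: add the worst-case bonus eps * Rmax / (1 - gamma)
# to every reward. This guarantees optimism over the L1 confidence ball
# using only the error bound, with no reference to the value function.
V_naive = value_iteration(P_hat, R + eps * Rmax / (1 - gamma), gamma)

# Value-aware optimism (extended-value-iteration style): spend the same
# eps budget by shifting transition mass toward the highest-value state.
best = int(np.argmax(V_true))
P_opt = P_hat * (1 - eps)   # remove eps mass proportionally from each row
P_opt[..., best] += eps     # re-deposit it on the best state
V_aware = value_iteration(P_opt, R, gamma)
```

Both constructions are optimistic here (`V_true` is about `[3, 3, 4]`, `V_aware` about `[4.2, 4.2, 5.2]`), but the value-ignoring bonus inflates every state's value by eps*Rmax/(1-gamma) = 20, an order of magnitude looser. A much-looser optimistic model typically translates into excess exploration and worse regret, which is exactly the learning-efficiency cost the comment above is worried about.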