Review for NeurIPS paper: BAIL: Best-Action Imitation Learning for Batch Deep Reinforcement Learning

Feb-6-2025, 17:53:32 GMT–Neural Information Processing Systems

Summary and Contributions: ---post author response--- Thank you for the response! The clarifications to the table have improved my understanding of the results. While I think the results are strong, the discussion section is jumbled and unclear. The intuition behind some of the design decisions is also lacking, which gives an 'ad hoc' impression. The response addresses these points adequately. I will increase my score to a 6, assuming the authors add these clarifications to the final text and make the experimental results section clearer.

This work proposes a batch deep RL algorithm called BAIL. It trains a policy by imitation learning on the state-action pairs whose (Monte Carlo) returns lie on or near what the authors define as the upper envelope of the data.
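The select-then-imitate idea summarized above can be illustrated with a small pure-Python sketch. It is not the authors' implementation. The following are all assumptions made for illustration:

- the linear envelope V(s) = w*s + b;
- the asymmetric penalty weight `k`;
- the ratio-based cut controlled by `fraction`;
- the linear least-squares policy;
- every function name.

The toy data are invented. The sketch also ignores truncated episodes when computing returns.

```python
# Sketch only, not the BAIL authors' code: linear envelope, ratio-based
# selection and a linear behavior-cloned policy are illustrative assumptions.
from typing import List, Sequence, Tuple


def monte_carlo_returns(rewards: Sequence[float], gamma: float) -> List[float]:
    """Discounted return-to-go G_t = r_t + gamma * G_{t+1} for one complete episode."""
    returns = [0.0] * len(rewards)
    g = 0.0
    for t in reversed(range(len(rewards))):
        g = rewards[t] + gamma * g
        returns[t] = g
    return returns


def fit_upper_envelope(states: Sequence[float], returns: Sequence[float],
                       k: float = 10.0, lr: float = 0.01,
                       steps: int = 5000) -> Tuple[float, float]:
    """Fit V(s) = w*s + b by gradient descent on an asymmetric squared loss.

    Points lying above the curve (V(s) < G) are weighted k times more heavily,
    which pushes the fitted line toward the top of the data.
    """
    w, b = 0.0, 0.0
    n = len(states)
    for _ in range(steps):
        grad_w, grad_b = 0.0, 0.0
        for s, g in zip(states, returns):
            diff = (w * s + b) - g
            weight = 1.0 if diff >= 0 else k
            grad_w += 2.0 * weight * diff * s
            grad_b += 2.0 * weight * diff
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b


def select_near_envelope(states, actions, returns, w, b, fraction=0.25):
    """Keep the given fraction of (s, a) pairs with the largest ratio G / V(s)."""
    values = [w * s + b for s in states]
    if any(v <= 0 for v in values):
        raise ValueError("ratio-based selection assumes a positive envelope")
    ratios = [g / v for g, v in zip(returns, values)]
    ranked = sorted(range(len(states)), key=lambda i: ratios[i], reverse=True)
    keep = sorted(ranked[:max(1, round(fraction * len(states)))])
    return [(states[i], actions[i]) for i in keep]


def behavior_clone(pairs):
    """Least-squares linear policy a = c*s + d fitted to the selected pairs."""
    n = len(pairs)
    mean_s = sum(s for s, _ in pairs) / n
    mean_a = sum(a for _, a in pairs) / n
    var = sum((s - mean_s) ** 2 for s, _ in pairs)
    cov = sum((s - mean_s) * (a - mean_a) for s, a in pairs)
    c = cov / var if var > 0 else 0.0
    return c, mean_a - c * mean_s


# Hypothetical toy batch: at each state, one action earns 1 more return than the other.
states = [0.0, 0.0, 0.5, 0.5, 1.0, 1.0]
actions = [0.0, -1.0, 1.0, -1.0, 2.0, -1.0]
returns = [2.0, 1.0, 2.5, 1.5, 3.0, 2.0]

w, b = fit_upper_envelope(states, returns)
best = select_near_envelope(states, actions, returns, w, b, fraction=0.5)
policy = behavior_clone(best)
print(best)    # the three higher-return pairs
print(policy)  # → (2.0, 0.0)
```

The loss weights points above the curve more heavily, which is one simple way to bias a regression toward an upper envelope. A larger `k` moves the fit closer to the top of the data. The small `k` used here keeps plain gradient descent stable at this learning rate.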

batch deep reinforcement learning, best-action imitation learning, learning
