Review for NeurIPS paper: BAIL: Best-Action Imitation Learning for Batch Deep Reinforcement Learning

Neural Information Processing Systems 

Summary and Contributions: ---post author response--- Thank you for the response! The clarifications to the table have improved my understanding of the results. While I think the results are strong, the discussion section is jumbled and unclear, and the intuition behind some of the design decisions is lacking, which gives an 'ad hoc' impression. The response adequately addresses these points, and I will increase my score to a 6, assuming the authors add these clarifications to the final text and make the experimental results section clearer.

This work proposes a batch deep RL algorithm called BAIL. It essentially trains a policy by imitation learning on state-action pairs drawn from the batch, namely those whose (Monte Carlo) returns lie on what the authors define as the upper envelope of the data.
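
To make the selection step concrete, here is a minimal sketch of how I read it. It is not the authors' implementation. It assumes scalar states, and it replaces the paper's upper envelope with a crude stand-in (the maximum return seen in each state bin). The names `monte_carlo_returns` and `select_best_actions`, and the parameters `bin_width` and `tolerance`, are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def monte_carlo_returns(rewards, gamma=0.99):
    """Discounted return from each time step to the end of one episode."""
    returns = [0.0] * len(rewards)
    running = 0.0
    # Accumulate backwards so each step reuses the return of the next step.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def select_best_actions(states, actions, returns, bin_width=1.0, tolerance=0.0):
    """Keep (state, action) pairs whose return reaches a stand-in envelope.

    ASSUMPTION: the envelope here is the maximum return among states that
    fall in the same bin of width `bin_width`. This is a simplification for
    illustration, not the envelope defined in the paper.
    """
    envelope = defaultdict(lambda: float("-inf"))
    for s, g in zip(states, returns):
        key = int(s // bin_width)
        envelope[key] = max(envelope[key], g)

    selected = []
    for s, a, g in zip(states, actions, returns):
        # `tolerance` (hypothetical) admits pairs slightly below the envelope.
        if g >= envelope[int(s // bin_width)] - tolerance:
            selected.append((s, a))
    return selected


if __name__ == "__main__":
    states = [0.1, 0.2, 1.5]
    actions = ["a", "b", "c"]
    returns = [1.0, 5.0, 2.0]
    print(select_best_actions(states, actions, returns, tolerance=0.5))
```

The selected pairs would then serve as the training set for ordinary behavior cloning. How the envelope is fit and how close to it a return must be are exactly the design decisions whose intuition I asked the authors to clarify.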