solid [R1, R3, R4], our experimental results valuable [R2, R3, R4], and our paper well-written [R1, R3, R4].
–Neural Information Processing Systems
We only included a single environment (Pusher-v2) in the main paper in order to save space. We will include the suggested references in the paper. See also About multi-step rollouts. The reviewer suggests that the paper should first "show that minimizing the TD-error is not [...]" Notice, however, that despite being commonly used and thought of as "intuitive", [...] Furthermore, Figure 1 indeed shows that minimizing the TD-error can lead to a critic that is far from the ideal one. We did not write that "model-based RL has no advantage in terms of sample-efficiency than model-free RL".
Dec-27-2025, 16:32:53 GMT