Review for NeurIPS paper: Inverse Reinforcement Learning from a Gradient-based Learner
Neural Information Processing Systems
Weaknesses: I have several concerns about the proposed approach. First, the empirical results send mixed messages. In one of the three tasks (reacher), the LfL baseline significantly outperforms LOGEL (Figure 4, left). In another task (hopper), by contrast, the policy trained on the reward function recovered by LOGEL outperforms the policy trained on the true reward function. Moreover, what kind of reward function does the LfL baseline recover for the hopper task that leads to no learning at all?
Jan-22-2025, 04:55:52 GMT