Review for NeurIPS paper: Deep Inverse Q-learning with Constraints
–Neural Information Processing Systems
Summary and Contributions: [UPDATE] I thank the authors for their response. I agree that the empirical results (for the settings considered in the main paper and the additional results provided in the rebuttal) are convincing/impressive. But on the theoretical/algorithmic front, I'm still not convinced. In particular: [1] lines 4-16 in the rebuttal: I still think that Theorem 1 imposes a strong restriction on the class of MDPs (even if it relaxes the restriction on the expert policy distribution): not all MDPs necessarily satisfy such a condition on the long-term Q-values. Consider an example: an action space containing only two actions, A = {a, b}, a state s, and a greedy/deterministic expert policy s.t.