Review for NeurIPS paper: Deep Inverse Q-learning with Constraints
–Neural Information Processing Systems
Summary and Contributions: [UPDATE] I thank the authors for their response. I agree that the empirical results (for the settings considered in the main paper and the additional results provided in the rebuttal) are convincing/impressive. But on the theoretical/algorithmic front, I'm still not convinced. In particular: [1] lines 4-16 in the rebuttal: I still think that Theorem 1 imposes a strong restriction on the class of MDPs (even if it relaxes the restriction on the expert policy distribution): not all MDPs necessarily satisfy such a condition on the long-term Q-values. Consider an example: an action space containing only two actions, A = {a, b}, a state s, and a greedy/deterministic expert policy s.t.