Reviews: Compatible Reward Inverse Reinforcement Learning

Oct-8-2024, 11:52:01 GMT–Neural Information Processing Systems

This paper proposes an approach for behavioral cloning that constructs a function space for a particular parametric policy model based on the null space of the policy gradient. I think a running example (e.g., for a discrete MDP) would help explain the approach. I found myself flipping back and forth between the Algorithm (page 6) and the description of each step. I have some lingering confusion about using Eq. I assume a similar estimator is employed for d(s,a).
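As a rough illustration of the kind of discrete running example the review asks for, here is a minimal sketch of the null-space idea for a tabular softmax policy. It is not the paper's algorithm: the policy parameterization, the uniform state distribution used to build d(s,a), and all variable names are assumptions made only for this example. The sketch writes the policy gradient as a linear map applied to a vector of Q-values and recovers the functions that this map sends to zero.

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions = 3, 2
n_sa = n_states * n_actions

# Assumed policy model: tabular softmax with one logit per (state, action).
theta = rng.normal(size=(n_states, n_actions))
pi = np.exp(theta - theta.max(axis=1, keepdims=True))
pi /= pi.sum(axis=1, keepdims=True)

# Assumed occupancy d(s, a): uniform over states, times the policy.
# In practice this would be estimated from demonstrations.
d = (pi / n_states).ravel()

# grad_log[k, j] = d log pi(a_j | s_j) / d theta_k, with flat index s * n_actions + a.
# For a softmax policy this is 1[a == b] - pi(b | s) within a state, 0 across states.
grad_log = np.zeros((n_sa, n_sa))
for s in range(n_states):
    for a in range(n_actions):
        for b in range(n_actions):
            grad_log[s * n_actions + b, s * n_actions + a] = float(a == b) - pi[s, b]

# Policy gradient as a linear map of Q: grad J = M @ q, with M = grad_log * d.
M = grad_log * d[None, :]

# Null space of M via SVD: right singular vectors with (near-)zero singular values.
_, sing_vals, vt = np.linalg.svd(M)
rank = int(np.sum(sing_vals > 1e-10))
null_basis = vt[rank:].T  # columns span {q : M @ q = 0}

print("rank of M:", rank)
print("null-space dimension:", null_basis.shape[1])
```

Under these assumptions the null space consists of functions that are constant across actions within each state, so its dimension equals the number of states. Every such function makes the policy gradient vanish, which is the property the function space in the review's summary is built around.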

compatible reward inverse reinforcement learning, feature representation, policy model
