Inverse Reinforcement Learning with the Average Reward Criterion

Jan-20-2025, 00:03:38 GMT–Neural Information Processing Systems

We study the problem of Inverse Reinforcement Learning (IRL) with an average-reward criterion. The goal is to recover an unknown policy and a reward function when the learner has access only to samples of states and actions from an experienced agent (the expert). Previous IRL methods assume that the expert is trained in a discounted environment and that the discount factor is known. We develop novel stochastic first-order methods to solve the IRL problem under the average-reward setting, which requires solving an Average-reward Markov Decision Process (AMDP) as a subproblem. To solve this subproblem, we develop a Stochastic Policy Mirror Descent (SPMD) method for general state and action spaces that requires $\mathcal{O}(1/\varepsilon)$ gradient computation steps.
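To make the AMDP subproblem concrete, here is a minimal sketch of policy mirror descent in a much simpler setting than the one the abstract describes. It is not the paper's SPMD method: it assumes a small tabular MDP, exact (not stochastic) policy evaluation, a KL-divergence mirror map, reward maximization, and strictly positive transition probabilities so that every policy induces an ergodic chain. The function names `evaluate`, `pmd_step`, and `run_pmd`, the step size `eta`, and the random test MDP are all hypothetical choices made for illustration.

```python
import numpy as np

def evaluate(P, r, pi):
    """Exact gain and differential Q-function of pi (assumes an ergodic chain)."""
    S = P.shape[0]
    P_pi = np.einsum("sa,sat->st", pi, P)   # state-to-state kernel under pi
    r_pi = np.sum(pi * r, axis=1)           # expected one-step reward under pi
    # Stationary distribution d: d^T P_pi = d^T with sum(d) = 1.
    A = np.vstack([P_pi.T - np.eye(S), np.ones((1, S))])
    b = np.append(np.zeros(S), 1.0)
    d = np.linalg.lstsq(A, b, rcond=None)[0]
    rho = d @ r_pi                          # average reward (gain)
    # Bias h solves the Poisson equation, normalized so that d @ h = 0.
    h = np.linalg.solve(np.eye(S) - P_pi + np.outer(np.ones(S), d), r_pi - rho)
    Q = r - rho + P @ h                     # shape (S, A)
    return rho, Q

def pmd_step(pi, Q, eta):
    """KL mirror-descent update: pi_new(a|s) proportional to pi(a|s) * exp(eta * Q(s, a))."""
    w = pi * np.exp(eta * (Q - Q.max(axis=1, keepdims=True)))  # shift for stability
    return w / w.sum(axis=1, keepdims=True)

def run_pmd(P, r, eta=1.0, iters=200):
    S, A = r.shape
    pi = np.full((S, A), 1.0 / A)           # start from the uniform policy
    gains = []
    for _ in range(iters):
        rho, Q = evaluate(P, r, pi)
        gains.append(rho)
        pi = pmd_step(pi, Q, eta)
    return pi, gains

# Hypothetical test problem: 4 states, 3 actions, strictly positive transitions.
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(4), size=(4, 3))  # P[s, a, :] is a distribution over next states
r = rng.uniform(size=(4, 3))
pi, gains = run_pmd(P, r)
```

In this exact tabular setting the recorded gains should be non-decreasing across iterations, which gives a quick sanity check. A stochastic variant for general state and action spaces, as studied in the paper, would replace `evaluate` with sampled estimates and function approximation.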

average reward criterion, inverse reinforcement learning, policy mirror descent, (5 more...)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)