Reward Transfer from Inverse Reinforcement Learning: A Coupled Minimax Approach

Hao, Guang-Yuan, van der Laan, Lars, Bibaut, Aurélien, Kallus, Nathan

May-28-2026–arXiv.org Machine Learning

Expert demonstrations, such as those from car drivers, help navigate environments with unknown rewards, but are often collected in controlled settings, such as closed-course test tracks, while learned control policies must be deployed in new environments, such as city streets. We can imitate experts to perform well in the same source environment where demonstrations are observed, and we may even use inverse reinforcement learning (IRL) to improve on simple behavior cloning (Ng and Russell, 2000; Abbeel and Ng, 2004; Ziebart et al., 2008; Fu et al., 2018; Geng et al., 2020). But the target environment may have a different transition law, discount factor, or soft-control regularization. For this, IRL is crucial: we can learn a reward from demonstrations in the source environment and transfer it to the target environment, learning a policy that optimizes the same reward function in a new setting (Fu et al., 2018; Schlaginhaufen and Kamgarpour, 2024). In this paper, we characterize how well this transfer can be done and which approaches are preferable. In particular, we show the value in a coupled approach that takes the target environment into account even when learning from the source. In ordinary offline control, the Bellman equation uses a known reward, so the main statistical error comes from target transitions.

artificial intelligence, machine learning, reinforcement learning, (18 more...)

arXiv.org Machine Learning

May-28-2026

arXiv.org PDF

Add feedback

Genre:
- Research Report (0.63)

Industry:
- Health & Medicine (0.93)

Technology:
- Information Technology > Artificial Intelligence
  - Representation & Reasoning (1.00)
  - Machine Learning > Reinforcement Learning (1.00)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found