On Feasible Rewards in Multi-agent Inverse Reinforcement Learning
Neural Information Processing Systems
Multi-agent Inverse Reinforcement Learning (MAIRL) aims to recover agent reward functions from expert demonstrations. We characterize the feasible reward set in Markov games, identifying all reward functions that rationalize a given equilibrium. However, equilibrium-based observations are often ambiguous: a single Nash equilibrium can correspond to many reward structures, some of which may change the strategic nature of the game. We address this by introducing entropy-regularized Markov games, which yield a unique equilibrium while preserving strategic incentives. For this setting, we provide a sample complexity analysis detailing how errors affect the performance of the learned policy. Our work establishes theoretical foundations and practical insights for MAIRL.
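The ambiguity described above can be illustrated with a minimal sketch. It is not taken from the paper: it assumes a single-state, two-player, two-action game, and the payoff matrices and the helper name `pure_nash` are hypothetical. Two games of very different character (a prisoner's dilemma and a game in which the equilibrium is also the best joint outcome) share the same pure Nash equilibrium, so observing that equilibrium alone cannot tell their rewards apart.

```python
# Illustrative sketch only: the payoffs and the helper name are assumptions,
# not the paper's construction.

def pure_nash(r1, r2):
    """Return all pure Nash equilibria (a1, a2) of a bimatrix game.

    r1[a1][a2] is the row player's reward, r2[a1][a2] the column player's.
    """
    n_rows, n_cols = len(r1), len(r1[0])
    equilibria = []
    for a1 in range(n_rows):
        for a2 in range(n_cols):
            # Row player cannot gain by deviating to another row.
            row_ok = all(r1[a1][a2] >= r1[b][a2] for b in range(n_rows))
            # Column player cannot gain by deviating to another column.
            col_ok = all(r2[a1][a2] >= r2[a1][b] for b in range(n_cols))
            if row_ok and col_ok:
                equilibria.append((a1, a2))
    return equilibria


# Prisoner's dilemma (action 0 = cooperate, 1 = defect): the equilibrium
# (1, 1) is worse for both players than mutual cooperation.
pd_r1 = [[3, 0], [5, 1]]
pd_r2 = [[3, 5], [0, 1]]

# A different game in which (1, 1) is both the equilibrium and the outcome
# with the highest reward for each player.
alt_r1 = [[0, 0], [1, 2]]
alt_r2 = [[0, 1], [0, 2]]

pd_equilibria = pure_nash(pd_r1, pd_r2)
alt_equilibria = pure_nash(alt_r1, alt_r2)
print(pd_equilibria, alt_equilibria)
```

Both calls return the same single joint action, even though one game involves a conflict between individual and joint interest and the other does not.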
Jun-21-2026, 07:33:02 GMT
- Country:
- North America > United States (0.93)
- Genre:
- Research Report > Experimental Study (1.00)
- Industry:
- Leisure & Entertainment > Games (0.92)
- Technology: