Sample-Efficient Learning of Stackelberg Equilibria in General-Sum Games
–Neural Information Processing Systems
Real world applications such as economics and policy making often involve solving multi-agent games with two unique features: (1) The agents are inherently asymmetric and partitioned into leaders and followers; (2) The agents have different reward functions, thus the game is general-sum. The majority of existing results in this field focuses on either symmetric solution concepts (e.g. It remains open how to learn the Stackelberg equilibrium---an asymmetric analog of the Nash equilibrium---in general-sum games efficiently from noisy samples. This paper initiates the theoretical study of sample-efficient learning of the Stackelberg equilibrium, in the bandit feedback setting where we only observe noisy samples of the reward. We consider three representative two-player general-sum games: bandit games, bandit-reinforcement learning (bandit-RL) games, and linear bandit games.
Neural Information Processing Systems
Jan-19-2025, 09:01:50 GMT
- Technology: