Paths to Equilibrium in Normal-Form Games
Bora Yongacoglu, Gürdal Arslan, Lacra Pavel, Serdar Yüksel
arXiv.org Artificial Intelligence
In multi-agent reinforcement learning (MARL), agents repeatedly interact across time and revise their strategies as new data arrives, producing a sequence of strategy profiles. This paper studies sequences of strategies satisfying a pairwise constraint inspired by policy updating in reinforcement learning: an agent who is best responding in period $t$ does not switch its strategy in period $t+1$. This constraint requires only that optimizing agents do not switch strategies; it does not constrain the non-optimizing agents in any way, and thus allows for exploration. Sequences with this property are called satisficing paths, and they arise naturally in many MARL algorithms. A fundamental question about strategic dynamics is the following: for a given game and initial strategy profile, is it always possible to construct a satisficing path that terminates at an equilibrium strategy? The resolution of this question has implications for the capabilities and limitations of a class of MARL algorithms. We answer this question in the affirmative for mixed extensions of finite normal-form games.
Mar-26-2024
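The satisficing-path property described in the abstract can be checked mechanically. The sketch below, which is illustrative and not from the paper, verifies the constraint for a two-player game given payoff matrices `A` (row player) and `B` (column player) and a sequence of mixed-strategy profiles: any player who is best responding at step $t$ must keep the same strategy at step $t+1$, while other players are unconstrained. All function names and the tolerance parameter are assumptions for this example.

```python
import numpy as np

def best_response_payoff(A, opp_mix):
    # Maximum expected payoff available against the opponent's mixed strategy.
    return np.max(A @ opp_mix)

def is_best_responding(A, own_mix, opp_mix, tol=1e-9):
    # True if own_mix attains (within tol) the best-response payoff.
    return own_mix @ A @ opp_mix >= best_response_payoff(A, opp_mix) - tol

def is_satisficing_path(A, B, path, tol=1e-9):
    """Check the satisficing-path property for a two-player game.

    A: row player's payoff matrix (shape m x n).
    B: column player's payoff matrix (same shape, payoffs to the column player).
    path: list of (x, y) mixed-strategy profiles.
    A best-responding player at step t must reuse its strategy at t+1;
    a non-best-responding player may switch arbitrarily (exploration).
    """
    for (x, y), (x_next, y_next) in zip(path, path[1:]):
        if is_best_responding(A, x, y, tol) and not np.allclose(x, x_next):
            return False
        # Column player's payoff x @ B @ y equals y @ B.T @ x.
        if is_best_responding(B.T, y, x, tol) and not np.allclose(y, y_next):
            return False
    return True

# Example: matching pennies. At (heads, heads) the row player is best
# responding and must stay put, while the column player is free to switch.
A = np.array([[1., -1.], [-1., 1.]])
B = -A
heads, tails = np.array([1., 0.]), np.array([0., 1.])
print(is_satisficing_path(A, B, [(heads, heads), (heads, tails)]))  # True
print(is_satisficing_path(A, B, [(heads, heads), (tails, heads)]))  # False
```

In the second path the row player abandons a best response, violating the constraint, so the sequence is not a satisficing path.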