A Finite-Sample Analysis of Payoff-Based Independent Learning in Zero-Sum Stochastic Games
Neural Information Processing Systems
In this work, we study two-player zero-sum stochastic games and develop a variant of the smoothed best-response learning dynamics that combines independent learning dynamics for matrix games with the minimax value iteration for stochastic games. The resulting learning dynamics are payoff-based, convergent, rational, and symmetric between the two players.
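To make the matrix-game ingredient concrete, the following is a minimal sketch of payoff-based smoothed best-response dynamics in a single zero-sum matrix game. It is not the paper's algorithm: the minimax value iteration over states is omitted, and the function names, the softmax form of the smoothed best response, the temperature `tau`, and the step sizes `alpha` and `beta` are all illustrative assumptions. Each player sees only its own realized payoff, never the opponent's action or policy.

```python
import math
import random


def smoothed_best_response(q, tau):
    """Softmax over estimated action values q with temperature tau > 0."""
    m = max(q)  # subtract the max for numerical stability
    weights = [math.exp((v - m) / tau) for v in q]
    total = sum(weights)
    return [w / total for w in weights]


def run_matrix_game(payoff, steps=5000, tau=0.1, alpha=0.05, beta=0.01, seed=0):
    """Hypothetical payoff-based dynamics for a zero-sum matrix game.

    payoff[i][j] is the reward to player 1 (player 2 receives its negative).
    alpha is the value-estimate step size, beta the slower policy step size;
    both are assumed values chosen for illustration only.
    """
    rng = random.Random(seed)
    n_rows, n_cols = len(payoff), len(payoff[0])
    q1, q2 = [0.0] * n_rows, [0.0] * n_cols
    pi1 = [1.0 / n_rows] * n_rows
    pi2 = [1.0 / n_cols] * n_cols

    for _ in range(steps):
        # Each player samples an action from its own current policy.
        a1 = rng.choices(range(n_rows), weights=pi1)[0]
        a2 = rng.choices(range(n_cols), weights=pi2)[0]
        r = payoff[a1][a2]

        # Payoff-based update: only the played action's estimate moves,
        # using only the player's own realized reward.
        q1[a1] += alpha * (r - q1[a1])
        q2[a2] += alpha * (-r - q2[a2])

        # Policies drift toward the smoothed best response to the estimates.
        br1 = smoothed_best_response(q1, tau)
        br2 = smoothed_best_response(q2, tau)
        pi1 = [p + beta * (b - p) for p, b in zip(pi1, br1)]
        pi2 = [p + beta * (b - p) for p, b in zip(pi2, br2)]

    return pi1, pi2


# Matching pennies, used here only as a small test game.
matching_pennies = [[1.0, -1.0], [-1.0, 1.0]]
policy1, policy2 = run_matrix_game(matching_pennies)
```

Both players run the same update rule, which mirrors the symmetry property named in the abstract. The sketch makes no claim about convergence rates; those depend on the paper's specific step-size and temperature choices, which are not reproduced here.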
Oct-6-2024, 07:18:16 GMT
- Country:
- North America > United States (1.00)
- Genre:
- Overview (0.45)
- Industry:
- Leisure & Entertainment > Games (0.67)
- Technology: