Reviews: Online Reinforcement Learning in Stochastic Games

Oct-7-2024, 18:01:04 GMT–Neural Information Processing Systems

The paper considers online learning in two-player zero-sum stochastic games. The main result is a strategy for player 1 which guarantees that, whatever strategy the other player follows, the cumulative reward never falls short of what the maximin value of the game would yield over the same number of rounds by more than a certain bound. The bound is shown to grow sublinearly in the number of rounds T and polynomially in the other problem parameters, such as the diameter and the sizes of the state and action spaces. The results imply that the proposed algorithm can be used in self-play to compute near-maximin strategies for both players. Both the algorithm and its analysis are largely based on the UCRL algorithm of Auer and Ortner (2007) and the analysis thereof.
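As a rough illustration of the guarantee described above, it can be written as a regret inequality. The notation here is assumed for this sketch and is not taken from the paper: r_t is player 1's reward in round t, v^* is the maximin value of the game measured per round, and B(T) stands for the bound, whose exact form and rate the summary does not state.

```latex
% Illustrative notation only; these symbols are assumptions, not the paper's.
%   r_t  : reward received by player 1 in round t
%   v^*  : maximin value of the game, per round
%   B(T) : the bound from the summary (sublinear in T, polynomial in the
%          diameter and the sizes of the state and action spaces)
\[
  \mathrm{Reg}_T \;=\; T\,v^{*} \;-\; \sum_{t=1}^{T} r_t \;\le\; B(T),
  \qquad
  \lim_{T\to\infty} \frac{B(T)}{T} \;=\; 0 .
\]
% Dividing by T: the average reward falls short of v^* by at most B(T)/T,
% which vanishes as T grows.
\[
  \frac{1}{T}\sum_{t=1}^{T} r_t \;\ge\; v^{*} \;-\; \frac{B(T)}{T} .
\]
```

Read this way, sublinear growth of the bound is what makes the self-play claim plausible: if each player's average reward approaches its own maximin value, the strategies played are near-maximin in the long run. Whether the guarantee holds with high probability or in expectation is not specified in the summary.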
