Online Reinforcement Learning in Stochastic Games

Mar-17-2026, 13:31:50 GMT–Neural Information Processing Systems

We study online reinforcement learning in average-reward stochastic games (SGs). An SG models a two-player zero-sum game in a Markov environment, where state transitions and one-step payoffs are determined simultaneously by a learner and an adversary. We propose the \textsc{UCSG} algorithm that achieves a sublinear regret compared to the game value when competing with an arbitrary opponent. This result improves previous ones under the same setting. The regret bound has a dependency on the \textit{diameter}, which is an intrinsic value related to the mixing property of SGs.

artificial intelligence, machine learning, reinforcement learning, (7 more...)

Neural Information Processing Systems

Mar-17-2026, 13:31:50 GMT

Conferences Web Page

Add feedback

Technology:
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.31)