Posterior Sampling for Competitive RL: Function Approximation and Partial Observation

Jan-18-2025, 07:37:22 GMT–Neural Information Processing Systems

This paper investigates posterior sampling algorithms for competitive reinforcement learning (RL) with general function approximation. Focusing on zero-sum Markov games (MGs) under two critical settings, self-play and adversarial learning, we first propose the self-play and adversarial generalized eluder coefficients (GECs) as complexity measures for function approximation that capture the exploration-exploitation trade-off in MGs. Based on the self-play GEC, we propose a model-based self-play posterior sampling method that controls both players to learn a Nash equilibrium and can handle partial observability of states. Furthermore, we identify a set of partially observable MG models that fit MG learning against the adversarial policies of the opponent. Incorporating the adversarial GEC, we propose a model-based posterior sampling method for adversarial MG learning with potential partial observability.
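
To make the idea of self-play posterior sampling concrete, the following is a minimal sketch on a toy problem, not the paper's algorithm. It assumes a stateless 2x2 zero-sum matrix game with Bernoulli payoffs and independent Beta(1, 1) priors on each cell, so it omits the GEC analysis, function approximation, state transitions, and partial observability. The names `solve_2x2` and `self_play_posterior_sampling` are hypothetical and chosen only for illustration.

```python
import random


def solve_2x2(A):
    """Nash equilibrium of a 2x2 zero-sum game; the row player maximizes A[i][j].

    Returns (p, q, value), where p = P(row plays action 0) and
    q = P(column plays action 0).
    """
    (a, b), (c, d) = A
    # Pure equilibrium: an entry that is the minimum of its row
    # and the maximum of its column (a saddle point).
    for i in range(2):
        for j in range(2):
            if A[i][j] == min(A[i]) and A[i][j] == max(A[0][j], A[1][j]):
                return (1.0 - i, 1.0 - j, A[i][j])
    # No saddle point: each player mixes so that the opponent is indifferent.
    denom = a - b - c + d
    return ((d - c) / denom, (d - b) / denom, (a * d - b * c) / denom)


def self_play_posterior_sampling(true_means, rounds, seed=0):
    """Toy self-play loop (assumed setup): sample a game model from the
    posterior, play the Nash equilibrium of the sampled model, then update."""
    rng = random.Random(seed)
    # post[i][j] = [alpha, beta] of a Beta posterior over the payoff mean of cell (i, j).
    post = [[[1, 1] for _ in range(2)] for _ in range(2)]
    for _ in range(rounds):
        # 1. Draw one plausible payoff matrix from the current posterior.
        sampled = [[rng.betavariate(*post[i][j]) for j in range(2)]
                   for i in range(2)]
        # 2. Both players follow the equilibrium of the sampled model.
        p, q, _ = solve_2x2(sampled)
        i = 0 if rng.random() < p else 1
        j = 0 if rng.random() < q else 1
        # 3. Observe a Bernoulli payoff from the true game and update the posterior.
        reward = 1 if rng.random() < true_means[i][j] else 0
        post[i][j][0] += reward
        post[i][j][1] += 1 - reward
    return post
```

Calling `self_play_posterior_sampling([[0.8, 0.2], [0.3, 0.7]], rounds=200)` returns the Beta parameters for each cell. The posterior mean `alpha / (alpha + beta)` of a cell can then be compared with the corresponding true payoff mean to see how much each action pair was explored.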

function approximation and partial observation, posterior, posterior sampling, (6 more...)

Technology:
- Information Technology > Artificial Intelligence
  - Machine Learning > Reinforcement Learning (0.66)
  - Representation & Reasoning > Uncertainty
    - Fuzzy Logic (0.89)