Offline Learning in Markov Games with General Function Approximation
Zhang, Yuheng, Bai, Yu, Jiang, Nan
arXiv.org Artificial Intelligence
Offline reinforcement learning (RL) aims to learn a good policy from a pre-collected historical dataset. It has emerged as an important paradigm for bringing RL to real-life scenarios due to its non-interactive nature, especially in applications where deploying adaptive algorithms in the real system is financially costly and/or ethically problematic [Levine et al., 2020]. While offline RL has been extensively studied in the single-agent setting, many real-world applications involve strategic interactions between multiple agents. This necessitates game-theoretic reasoning, which is often modeled using Markov games [Shapley, 1953] in the RL theory literature. Markov games can be viewed as the multi-agent extension of Markov Decision Processes (MDPs), where agents share the same state information and the dynamics are determined by the joint action of all agents. While online RL in Markov games has seen significant developments in recent years [Bai and Jin, 2020, Liu et al., 2021, Song et al., 2021, Jin et al., 2021b], offline learning in Markov games has only started to attract attention from the community. Earlier works [Cui and Du, 2022b, Zhong et al., 2022] focus on tabular cases or linear function approximation, which cannot handle complex environments that require advanced function-approximation techniques. Although there is a rich literature on single-agent RL with general function approximation [Jiang et al., 2017, Jin et al., 2021a, Wang et al., 2020, Huang et al., 2021a], whether and how these techniques can be extended to offline Markov games remains largely unclear.
Feb-6-2023