Goto

Collaborating Authors

 Markov Models


FACMAC: Factored Multi-Agent Centralised Policy Gradients Bei Peng University of Liverpool T abish Rashid University of Oxford Christian A. Schroeder de Witt

Neural Information Processing Systems

However, unlike QMIX, there are no inherent constraints on factoring the critic. We thus also employ a nonmonotonic factorisation and empirically demonstrate that its increased representational capacity allows it to solve some tasks that cannot be solved with monolithic, or monotonically factored critics.










A Near-Optimal Primal-Dual Method for Off-Policy Learning in CMDP

Neural Information Processing Systems

As an important framework for safe Reinforcement Learning, the Constrained Markov Decision Process (CMDP) has been extensively studied in the recent literature. However, despite the rich results under various on-policy learning settings, there still lacks some essential understanding of the offline CMDP problems, in terms of both the algorithm design and the information theoretic sample complexity lower bound. In this paper, we focus on solving the CMDP problems where only offline data are available.