Review for NeurIPS paper: Weighted QMIX: Expanding Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning

Neural Information Processing Systems 

The proofs of your theoretical results lack a discussion of the POMDP setting. Although the framework is focused on solving the Dec-POMDP problem, most parts of the proofs are carried out under the MDP setting, and this gap is never discussed. The use of weighting is not convincing. In Section 6.2.3, the performance of the Weighted QMIX method is unacceptable.