QMIX: Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning