Appendix for Regularized Softmax Deep Multi-Agent Q-Learning
Ling Pan