Appendix for Softmax Deep Double Deterministic Policy Gradients

Ling Pan