Supplementary Material For Stochastic Multiple Target Sampling Gradient Descent

Neural Information Processing Systems 

This supplementary material consists of the following sections: Appendix 1 contains the proofs and derivations of our theory development.

As a consequence, we obtain the conclusion of Equation (1). By choosing u to be a one-hot vector at i, we obtain the conclusion of Lemma 1.

1.3 Derivations for the matrix U's formulation in Equation (3)

We have ϕ

As a consequence, we obtain the conclusion of Equation (3).

1.4 Proof of Theorem 2

Before proving this theorem, let us re-state it: We have for all i = 1,...,K that D

In this experiment, the three target distributions are created as presented in the main paper. Results are averaged over 5 runs. For each approach, we take the best checkpoint based on the validation score.
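The reporting protocol described above (select the best checkpoint by validation score, then average over runs) can be sketched as follows. This is a minimal illustration, not the authors' evaluation code: the record fields `step`, `val_score`, and `test_score`, the helper names, and all numbers are hypothetical, and it assumes that a higher validation score is better and that a separate test metric is reported at the selected checkpoint.

```python
from statistics import mean


def best_checkpoint(checkpoints):
    """Return the checkpoint record with the highest validation score."""
    return max(checkpoints, key=lambda c: c["val_score"])


def averaged_test_score(runs):
    """Select the best checkpoint in each run, then average their test scores."""
    return mean(best_checkpoint(run)["test_score"] for run in runs)


# Hypothetical placeholder records for two runs (the text uses 5 runs).
runs = [
    [
        {"step": 1, "val_score": 0.70, "test_score": 0.68},
        {"step": 2, "val_score": 0.75, "test_score": 0.72},
    ],
    [
        {"step": 1, "val_score": 0.80, "test_score": 0.76},
        {"step": 2, "val_score": 0.78, "test_score": 0.79},
    ],
]

print(best_checkpoint(runs[0])["step"])
print(averaged_test_score(runs))
```

Note that selection uses only the validation score: in the second placeholder run, the checkpoint with the higher test score is not the one chosen, which keeps the test metric out of the model-selection step.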