A Appendix of Proofs

A.1 Proof of Thm. 3.2

Neural Information Processing Systems 

Eqn. 3) is equivalent to optimizing CL (InfoNCE, cf.

To complete the proof, we start by giving some important notation and a theorem.

Here we simply disregard the constant term in Eqn. 4, as it does not impact optimization, and

From Thm. 3.2, we have the equivalence between InfoNCE and CL-DRO.

By using McDiarmid's inequality in Thm. A.4, for any ϵ, we have:

While Corollary 3.4 has already been proven in [

We start by introducing a useful lemma.

Then the CL-DRO objective is a tight variational estimation of the ϕ-divergence.
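For reference, the statement about variational estimation of the ϕ-divergence can be read against the standard variational (convex-conjugate) representation of a ϕ-divergence. The form below is the textbook one, not necessarily the exact lemma used in this appendix; the symbols $P$, $Q$, $f$, and $\phi^{*}$ are assumed notation, with $\phi$ convex, $\phi(1) = 0$, and $P$ absolutely continuous with respect to $Q$.

```latex
% Standard variational representation of a phi-divergence (assumed notation).
% phi^* denotes the convex conjugate of phi; the supremum is over measurable f.
\begin{align}
  D_{\phi}(P \,\|\, Q)
    &= \mathbb{E}_{Q}\!\left[ \phi\!\left( \frac{dP}{dQ} \right) \right]
     = \sup_{f} \Big\{ \mathbb{E}_{P}[f] - \mathbb{E}_{Q}\big[ \phi^{*}(f) \big] \Big\}, \\
  \phi^{*}(t) &= \sup_{x} \big\{ x t - \phi(x) \big\}.
\end{align}
```

An estimator is tight in this sense when the function class over which it optimizes attains the supremum, so that the lower bound holds with equality.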