Tight Mutual Information Estimation With Contrastive Fenchel-Legendre Optimization, Dong Wang

Neural Information Processing Systems 

Now let us prove that InfoNCE is a lower bound on MI and that, under proper conditions, this estimate is tight. Our proof is based on establishing that InfoNCE is a multi-sample extension of the NWJ bound. For completeness, we first repeat the proofs for the BA and UBA bounds below, and then show that UBA leads to NWJ and to its multi-sample variant, InfoNCE.
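Before the formal derivation, the multi-sample InfoNCE estimator referenced above can be sketched numerically. The snippet below is an illustrative implementation (not the authors' code): given a critic score matrix with positive pairs on the diagonal, it averages the log-ratio between each positive score and the mean of the exponentiated scores in its row, which is the standard form of the InfoNCE lower bound and is capped at log K.

```python
import numpy as np

def infonce_lower_bound(scores):
    """Multi-sample InfoNCE lower bound on mutual information.

    scores: (K, K) array with scores[i, j] = f(x_i, y_j) for a critic f
    evaluated on a batch of K samples; the diagonal holds the scores of
    the jointly drawn (positive) pairs.
    """
    K = scores.shape[0]
    # Per-row log of the mean exponentiated score: log (1/K) sum_j e^{f(x_i, y_j)}
    row_log_mean_exp = np.log(np.exp(scores).sum(axis=1)) - np.log(K)
    # Average log-ratio of positive score to in-row average; bounded by log K.
    return np.mean(np.diag(scores) - row_log_mean_exp)
```

For a perfectly uninformative critic (all scores equal) the estimate is 0, while a critic that sharply separates positives from negatives drives it toward its ceiling of log K, illustrating the saturation issue that motivates tighter estimators.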