Multi-label Contrastive Predictive Coding

Neural Information Processing Systems 

Variational mutual information (MI) estimators are widely used in unsupervised representation learning methods such as contrastive predictive coding (CPC). A lower bound on MI can be obtained from a multi-class classification problem, where a critic attempts to distinguish a positive sample drawn from the underlying joint distribution from (m-1) negative samples drawn from a suitable proposal distribution. Using this approach, MI estimates are bounded above by \log m, and could thus severely underestimate unless m is very large. To overcome this limitation, we introduce a novel estimator based on a multi-label classification problem, where the critic needs to jointly identify \emph{multiple} positive samples at the same time. We show that using the same amount of negative samples, multi-label CPC is able to exceed the \log m bound, while still being a valid lower bound of mutual information.