Review for NeurIPS paper: Multi-label Contrastive Predictive Coding

Neural Information Processing Systems 

Summary and Contributions: The authors propose a multi-label version of contrastive predictive coding (CPC). Standard CPC defines its loss separately for each positive pair i = 1, ..., n, using a softmax over that pair and its own set of negatives. The proposed version instead places all positive pairs (and all possible negatives among them) in a single softmax distribution. This can be thought of as a 'multi-label' classification task in which the network is trained so that the top n predictions from that distribution are the n positive examples. The motivation behind this technique is as follows: (1) the regular CPC loss, which is a lower bound on the mutual information I(X;Y), is itself upper bounded by log(m), where m is the total number of pairs used (i.e., 1 positive pair and m-1 negative pairs). If log(m) is much smaller than I(X;Y), CPC therefore substantially underestimates the mutual information. The alpha-weighted version of the proposed loss has a disadvantage, however: for some range of alpha, it is no longer a lower bound on I(X;Y). In essence, the authors show that for their proposed method (alpha-ML-CPC), one can derive the range of alphas that lower-bound I(X;Y) as a function of m and n, and that these ranges are quite large even for modest values of m and n.
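To make the log(m) ceiling in point (1) concrete, here is a minimal Python sketch. The review itself contains no code, so everything below is an illustrative assumption. The helper names `cpc_estimate` and `pooled_top_n_hits` are hypothetical. The score layout, in which column 0 of each row holds the positive and the other m-1 columns hold its negatives, is also assumed. The CPC estimate is log(m) plus the mean log-softmax probability of each positive. Because each log-softmax term is at most 0, the estimate can never exceed log(m), even with a perfect critic. The pooled function only illustrates the 'multi-label' ranking view: all n*m candidates are ranked together, and the function counts how many of the top n are positives. It is not the paper's alpha-ML-CPC objective, and the alpha weighting is not reproduced here.

```python
import math
import random


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    peak = max(values)
    return peak + math.log(sum(math.exp(v - peak) for v in values))


def cpc_estimate(scores):
    """Standard CPC (InfoNCE) mutual-information estimate from an n x m score matrix.

    Assumed layout: row i holds critic scores f(x_i, y); column 0 is the
    positive y_i and columns 1..m-1 are its m-1 negatives.
    """
    m = len(scores[0])
    # Log-softmax probability assigned to the positive in each row (always <= 0).
    terms = [row[0] - logsumexp(row) for row in scores]
    return math.log(m) + sum(terms) / len(terms)


def pooled_top_n_hits(scores):
    """Multi-label view (illustrative only): pool all n*m candidates into one
    ranking and count how many of the n highest-scoring ones are positives."""
    n = len(scores)
    flat = [(s, j == 0) for row in scores for j, s in enumerate(row)]
    top = sorted(flat, key=lambda item: item[0], reverse=True)[:n]
    return sum(is_pos for _, is_pos in top)


random.seed(0)
n, m = 8, 16
# A near-perfect critic: every positive scores 50, every negative scores 0.
perfect = [[50.0] + [0.0] * (m - 1) for _ in range(n)]
print(cpc_estimate(perfect), math.log(m))  # the estimate saturates at log(m)
print(pooled_top_n_hits(perfect))  # all n positives ranked on top
```

Even with a perfect critic, the CPC estimate stops at log(16) ≈ 2.77 nats here. This is the underestimation the authors target when I(X;Y) is larger than log(m).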