on a memory-economical calculation, while its vanilla multi-key counterpart is less memory-efficient when achieving

Neural Information Processing Systems 

Thank you for acknowledging the key contributions of our paper.

R1.2 Generalize to video: As suggested, we conducted additional

The top-1 accuracy of JCL pre-trained features is 48.6%, which outperforms MoCo v2 (47.3%). Generalization of JCL to other data modalities (sound, language, video) will be included in our future work.

Regarding your concerns about the writing quality and typos (e.g., Algorithm 1

The top-1 accuracy on ImageNet100 for vanilla (ResNet-50) is 80.9%, while JCL achieves 82.0%.

R2.3 SimCLR: The top-5 accuracy we reported (87.3%) for SimCLR was extracted from the

Thus, there is no one-to-one correspondence between the data in Table 1 and Figure 2.