Large-Scale Adversarial Training for Vision-and-Language Representation Learning: Supplementary Material

Neural Information Processing Systems 

This supplementary material contains three sections. Section A.1 reviews additional related work. Section A.2 provides additional experimental results. Section A.3 describes downstream tasks and implementation details.

A.1 Additional Related Work

Adversarial Training. Many efforts have been devoted to improving AT from different angles: (i) use triplet-wise metric learning [8, 7] and optimal transport [20] to leverage inter-sample interactions; (ii) exploit extra unlabeled training data [12, 1]; and (iii) accelerate the training procedure [11, 19, 14].