Large-Scale Adversarial Training for Vision-and-Language Representation Learning
Neural Information Processing Systems
Instead of adding adversarial perturbations to image pixels and textual tokens, we propose to perform adversarial training in the embedding space of each modality. To enable large-scale training, we adopt the "free" adversarial training strategy and combine it with KL-divergence-based regularization to promote higher invariance in the embedding space.
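The idea of perturbing embeddings rather than raw inputs, with a KL term tying the clean and perturbed predictions together, can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a toy linear softmax classifier over a single embedding vector in place of a vision-and-language model, an L2-bounded perturbation found by a few normalized gradient-ascent steps, and a symmetric KL term. All function names and the parameters `eps`, `step`, `n_steps`, and `alpha` are hypothetical. The gradient reuse that makes the "free" strategy cheap is not reproduced here.

```python
import math


def softmax(logits):
    # Numerically stable softmax over a list of floats.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict(W, emb):
    # Toy stand-in for the model (assumption): logits = W @ emb.
    return softmax([sum(w * x for w, x in zip(row, emb)) for row in W])


def kl_divergence(p, q):
    # KL(p || q) for two strictly positive probability vectors.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def embedding_gradient(W, emb, label):
    # Gradient of cross-entropy w.r.t. the embedding: W^T (p - onehot).
    p = predict(W, emb)
    diff = [pi - (1.0 if k == label else 0.0) for k, pi in enumerate(p)]
    return [sum(W[k][j] * diff[k] for k in range(len(W)))
            for j in range(len(emb))]


def adversarial_perturbation(W, emb, label, eps=0.1, step=0.05, n_steps=3):
    # Normalized gradient ascent on the loss, projected onto an L2 ball
    # of radius eps around the clean embedding.
    delta = [0.0] * len(emb)
    for _ in range(n_steps):
        perturbed = [x + d for x, d in zip(emb, delta)]
        grad = embedding_gradient(W, perturbed, label)
        norm = math.sqrt(sum(g * g for g in grad)) or 1.0
        delta = [d + step * g / norm for d, g in zip(delta, grad)]
        dnorm = math.sqrt(sum(d * d for d in delta))
        if dnorm > eps:
            delta = [d * eps / dnorm for d in delta]
    return delta


def adversarial_loss(W, emb, label, alpha=1.0, eps=0.1, step=0.05, n_steps=3):
    # Clean loss + adversarial loss + alpha * symmetric KL between the
    # clean and perturbed predictive distributions.
    delta = adversarial_perturbation(W, emb, label, eps, step, n_steps)
    adv = [x + d for x, d in zip(emb, delta)]
    p_clean, p_adv = predict(W, emb), predict(W, adv)
    ce_clean = -math.log(p_clean[label])
    ce_adv = -math.log(p_adv[label])
    kl = kl_divergence(p_clean, p_adv) + kl_divergence(p_adv, p_clean)
    return {"total": ce_clean + ce_adv + alpha * kl, "ce_clean": ce_clean,
            "ce_adv": ce_adv, "kl": kl, "delta": delta}


if __name__ == "__main__":
    W = [[1.0, 0.0], [0.0, 1.0]]  # two classes, two embedding dimensions
    out = adversarial_loss(W, [0.5, -0.2], label=0)
    print(out)
```

In this sketch the perturbation lives entirely in the embedding space, so the same routine would apply unchanged to image-region features and token embeddings. The KL term penalizes any shift in the predicted distribution under the perturbation, which is one way to read "invariance in the embedding space".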
May-29-2025, 05:52:43 GMT