Review for NeurIPS paper: Large-Scale Adversarial Training for Vision-and-Language Representation Learning
Neural Information Processing Systems
Weaknesses: Despite its strengths, the paper leaves some questions open. The authors show that adding adversarial perturbations in the embedding space improves performance on the final downstream tasks. This is a good result; however, the paper does not address whether the proposed method also performs better under adversarial attack. What is the connection between adding noise in the embedding space and perturbing the input in pixel/token space? There are several ways to test whether the proposed method is more robust. For example, some downstream benchmarks focus on paraphrasing, such as the VQA-Rephrasings dataset, and I am curious whether injecting adversarial noise into the embedding space leads to better performance on this dataset.
Jan-24-2025, 02:27:28 GMT