SemRoDe: Macro Adversarial Training to Learn Representations That are Robust to Word-Level Attacks

Formento, Brian, Feng, Wenjie, Foo, Chuan Sheng, Tuan, Luu Anh, Ng, See-Kiong

Mar-27-2024–arXiv.org Artificial Intelligence

Language models (LMs) are indispensable tools for natural language processing tasks, but their vulnerability to adversarial attacks remains a concern. Although current research has explored adversarial training techniques, these have yielded only limited improvements in defending against word-level attacks. In this work, we propose Semantic Robust Defence (SemRoDe), a Macro Adversarial Training strategy that enhances the robustness of LMs. Drawing inspiration from recent studies in the image domain, we investigate and confirm that, in a discrete data setting such as language, adversarial samples generated via word substitutions do belong to an adversarial domain that exhibits a high Wasserstein distance from the base domain. Our method learns a robust representation that bridges these two domains. We hypothesize that projecting samples into a domain with minimal shift, rather than into an adversarial domain, would improve robustness to attacks. We align the domains by incorporating a new distance-based objective. With this objective, our model learns more generalized representations by aligning its high-level output features, and therefore handles unseen adversarial samples better. The method generalizes across word embeddings, even when they share minimal overlap at both the vocabulary and word-substitution levels. To evaluate the effectiveness of our approach, we conduct experiments with BERT and RoBERTa models on three datasets. The results demonstrate promising state-of-the-art robustness.
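
As a rough illustration of what a distance-based alignment objective could look like, the sketch below adds a distribution-distance penalty between base and adversarial feature batches to a task loss. It is only a sketch under stated assumptions: the abstract does not specify the distance used, so a sliced Wasserstein-1 approximation stands in here, and the names `sliced_wasserstein`, `macro_objective`, `n_projections`, and `weight` are hypothetical, not taken from the paper.

```python
import numpy as np

def sliced_wasserstein(base_feats, adv_feats, n_projections=64, seed=0):
    """Approximate the Wasserstein-1 distance between two feature batches.

    Assumption: both batches have the same shape (n_samples, n_dims).
    """
    base_feats = np.asarray(base_feats, dtype=float)
    adv_feats = np.asarray(adv_feats, dtype=float)
    if base_feats.shape != adv_feats.shape:
        raise ValueError("this sketch assumes equally sized batches")
    rng = np.random.default_rng(seed)
    # Random unit directions onto which the features are projected.
    directions = rng.normal(size=(base_feats.shape[1], n_projections))
    directions /= np.linalg.norm(directions, axis=0, keepdims=True)
    # In 1-D, W1 between equal-sized empirical samples is the mean
    # absolute gap between the sorted values.
    base_proj = np.sort(base_feats @ directions, axis=0)
    adv_proj = np.sort(adv_feats @ directions, axis=0)
    return float(np.mean(np.abs(base_proj - adv_proj)))

def macro_objective(task_loss, base_feats, adv_feats, weight=1.0):
    """Hypothetical combined loss: task loss plus a weighted alignment penalty."""
    return task_loss + weight * sliced_wasserstein(base_feats, adv_feats)
```

With identical feature batches the penalty vanishes and the objective reduces to the task loss; the further the adversarial features drift from the base features, the larger the penalty becomes.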

artificial intelligence, machine learning, natural language

Country:
- Asia > Middle East
  - Qatar (0.14)
- Europe (0.67)

Genre:
- Research Report > New Finding (1.00)

Industry:
- Government > Military (0.49)
- Information Technology > Security & Privacy (0.67)

Technology:
- Information Technology > Artificial Intelligence
  - Machine Learning > Neural Networks
    - Deep Learning (0.46)
  - Natural Language > Text Processing (0.94)
