
Collaborating Authors

 Hou, Bairu


TextGrad: Advancing Robustness Evaluation in NLP by Gradient-Driven Optimization

arXiv.org Artificial Intelligence

Robustness evaluation against adversarial examples has become increasingly important for assessing the trustworthiness of prevailing deep models in natural language processing (NLP). However, in contrast to the computer vision domain, where first-order projected gradient descent (PGD) serves as the benchmark approach for generating adversarial examples for robustness evaluation, NLP lacks a principled first-order gradient-based robustness evaluation framework. The optimization challenges lie in 1) the discrete nature of textual inputs together with the strong coupling between the perturbation location and the actual content, and 2) the additional constraint that the perturbed text should be fluent and achieve a low perplexity under a language model. These challenges make the development of PGD-like NLP attacks difficult. To bridge the gap, we propose TextGrad, a new attack generator using gradient-driven optimization, supporting high-accuracy and high-quality assessment of adversarial robustness in NLP. Specifically, we address these challenges in a unified optimization framework: we develop an effective convex relaxation method to co-optimize the continuously relaxed site-selection and perturbation variables, and leverage an effective sampling method to establish an accurate mapping from the continuous optimization variables to the discrete textual perturbations. Moreover, as a first-order attack generation method, TextGrad can be baked into adversarial training to further improve the robustness of NLP models. Extensive experiments demonstrate the effectiveness of TextGrad not only in attack generation for robustness evaluation but also in adversarial defense.
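
The co-optimization and sampling steps described in the abstract can be sketched with first-order updates on relaxed variables. The following is a minimal illustrative example, not the authors' TextGrad implementation: the toy linear victim, the sigmoid/softmax relaxation of the site-selection and candidate-selection variables, the sparsity penalty, and the Bernoulli/multinomial sampling step are all assumptions made for illustration, and the fluency/perplexity constraint is omitted.

import torch
import torch.nn.functional as F

torch.manual_seed(0)
seq_len, n_cands, embed_dim, n_classes = 8, 5, 16, 2

# Toy differentiable victim: mean-pooled embeddings -> linear classifier.
classifier = torch.nn.Linear(embed_dim, n_classes)
def victim(embeds):                          # embeds: (seq_len, embed_dim)
    return classifier(embeds.mean(dim=0))

# Original token embeddings and, per site, embeddings of substitution candidates
# (e.g., synonyms); in practice these come from the victim's embedding table.
orig_embeds = torch.randn(seq_len, embed_dim)
candidate_embeds = torch.randn(seq_len, n_cands, embed_dim)
true_label = torch.tensor(0)

# Continuously relaxed variables: z = site-selection logits,
# u = per-site candidate-selection logits; both updated with gradients.
z = torch.zeros(seq_len, requires_grad=True)
u = torch.zeros(seq_len, n_cands, requires_grad=True)
opt = torch.optim.Adam([z, u], lr=0.1)

for step in range(100):
    site_prob = torch.sigmoid(z)             # P(perturb site i)
    cand_prob = F.softmax(u, dim=-1)         # P(candidate j | site i)
    # Convex combination: expected embedding under the relaxed variables.
    sub_embeds = (cand_prob.unsqueeze(-1) * candidate_embeds).sum(dim=1)
    mixed = (1 - site_prob).unsqueeze(-1) * orig_embeds + site_prob.unsqueeze(-1) * sub_embeds
    logits = victim(mixed)
    # Ascend the classification loss (untargeted attack) with a sparsity
    # penalty that keeps the number of perturbed sites small.
    loss = -F.cross_entropy(logits.unsqueeze(0), true_label.unsqueeze(0)) + 0.05 * site_prob.sum()
    opt.zero_grad()
    loss.backward()
    opt.step()

# Map the relaxed solution back to a discrete perturbation by sampling.
with torch.no_grad():
    perturb_site = torch.bernoulli(torch.sigmoid(z)).bool()
    chosen = torch.multinomial(F.softmax(u, dim=-1), 1).squeeze(-1)
    adv_embeds = orig_embeds.clone()
    adv_embeds[perturb_site] = candidate_embeds[perturb_site, chosen[perturb_site]]
    print("perturbed sites:", perturb_site.nonzero().flatten().tolist())
    print("adversarial prediction:", victim(adv_embeds).argmax().item())

The convex combination of original and candidate embeddings is what makes the attack loss differentiable with respect to both the site-selection and the perturbation variables, which is the property the unified framework relies on.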


Learning to Attack: Towards Textual Adversarial Attacking in Real-world Situations

arXiv.org Artificial Intelligence

Adversarial attacks aim to fool deep neural networks with adversarial examples. In the field of natural language processing, various textual adversarial attack models have been proposed, which differ in their level of access to the victim model. Among them, attack models that only require the output of the victim model are better suited to real-world attack scenarios. However, to achieve high attack performance, these models usually need to query the victim model an excessive number of times, which is neither efficient nor viable in practice. To tackle this problem, we propose a reinforcement-learning-based attack model, which can learn from attack history and launch attacks more efficiently. In experiments, we evaluate our model by attacking several state-of-the-art models on benchmark datasets for multiple tasks, including sentiment analysis, text classification, and natural language inference. Experimental results demonstrate that our model consistently achieves both better attack performance and higher efficiency than recently proposed baseline methods. We also find that our attack model yields larger robustness improvements for the victim model when used in adversarial training. All the code and data of this paper will be made public.
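
A query-based attack that learns from attack history, as the abstract describes, can be sketched as a simple policy-gradient loop in which the agent sees only the victim model's output confidence. The toy victim, the bandit-style policy over perturbation positions, and the reward defined as the drop in victim confidence are illustrative assumptions, not the paper's actual model or reward design.

import numpy as np

rng = np.random.default_rng(0)
seq_len = 10

# Black-box victim: only a confidence score for the original label is visible.
# Toy scorer in which a few "important" positions dominate the prediction.
important = {2, 5, 7}
def victim_confidence(perturbed_positions):
    hits = len(important & set(perturbed_positions))
    return max(0.0, 0.95 - 0.3 * hits)

# Policy over which position to perturb next (softmax over learned logits).
logits = np.zeros(seq_len)
lr, budget = 0.5, 3                      # query budget per attack episode

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

for episode in range(200):
    perturbed, rewards = [], []
    conf = victim_confidence(perturbed)
    probs = softmax(logits)              # policy is fixed within an episode
    for _ in range(budget):              # each step costs one victim query
        pos = int(rng.choice(seq_len, p=probs))
        perturbed.append(pos)
        new_conf = victim_confidence(perturbed)
        rewards.append(conf - new_conf)  # reward = drop in victim confidence
        conf = new_conf
    # REINFORCE update from this episode's attack history.
    ret = sum(rewards)
    for pos in perturbed:
        grad = -probs
        grad[pos] += 1.0                 # gradient of log pi(pos) w.r.t. logits
        logits = logits + lr * ret * grad

print("positions the policy learned to attack first:", np.argsort(-logits)[:3].tolist())

Because the policy is trained across episodes, later attacks concentrate their limited query budget on the positions that historically caused the largest confidence drops, which is the sense in which learning from attack history reduces the number of victim queries.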


OpenAttack: An Open-source Textual Adversarial Attack Toolkit

arXiv.org Artificial Intelligence

Deep neural networks (DNNs) have been shown to be susceptible to adversarial attacks (Szegedy et al., 2014; Goodfellow et al., 2015). The attacker uses adversarial examples, which are maliciously crafted by imposing small perturbations on the original input, to fool the victim model. With the wide application of DNNs to practical systems, accompanied by growing concern about their security, research on adversarial attacking has become increasingly important. OpenAttack has a systematic modular design, which disassembles many different attack models, extracts the common components, and wisely recombines them. More importantly, it has the following significant features:
- Full coverage of attack model types. OpenAttack currently includes 12 typical attack models, which cover all types of accessibility to the victim model and all perturbation levels.
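
The modular design described above, which disassembles attack models into shared components and recombines them, can be illustrated with a generic sketch. The component names (WordSubstitution, GreedySearch, Attacker), the greedy search strategy, and the toy victim are hypothetical and do not reflect OpenAttack's actual API.

from dataclasses import dataclass
from typing import Callable, Dict, List

Victim = Callable[[List[str]], float]        # tokens -> P(original label)

@dataclass
class WordSubstitution:
    """Transformation component: propose token-level substitutions."""
    synonyms: Dict[str, List[str]]

    def candidates(self, tokens: List[str], i: int) -> List[List[str]]:
        return [tokens[:i] + [s] + tokens[i + 1:]
                for s in self.synonyms.get(tokens[i], [])]

@dataclass
class GreedySearch:
    """Search component: greedily apply the most damaging substitution."""
    def run(self, tokens, victim, transform, is_success):
        current = list(tokens)
        for i in range(len(current)):
            if is_success(victim(current)):
                break                        # stop as soon as the attack succeeds
            cands = transform.candidates(current, i)
            if cands:
                current = min(cands, key=victim)  # lowest original-label probability
        return current

@dataclass
class Attacker:
    """Recombination: plug shared components together into a complete attack model."""
    transform: WordSubstitution
    search: GreedySearch
    is_success: Callable[[float], bool]

    def attack(self, tokens: List[str], victim: Victim) -> List[str]:
        return self.search.run(tokens, victim, self.transform, self.is_success)

# Usage with a toy victim whose confidence drops for every occurrence of "bad".
victim = lambda toks: 0.9 - 0.3 * toks.count("bad")
attacker = Attacker(
    transform=WordSubstitution({"good": ["bad"], "great": ["bad"]}),
    search=GreedySearch(),
    is_success=lambda p: p < 0.5,
)
print(attacker.attack(["the", "movie", "was", "good", "and", "great"], victim))

Swapping in a different transformation or search component yields a new attacker without touching the rest of the pipeline, which is the benefit the modular decomposition is meant to provide.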