The Impacts of Unanswerable Questions on the Robustness of Machine Reading Comprehension Models
Son Quoc Tran, Phong Nguyen-Thuan Do, Uyen Le, Matt Kretchmar
–arXiv.org Artificial Intelligence
Pretrained language models have achieved super-human performances on many Machine Reading Comprehension (MRC) benchmarks. Nevertheless, their relative inability to defend against adversarial attacks has spurred skepticism about their natural language understanding. In this paper, we ask whether training with unanswerable questions in SQuAD 2.0 can help improve the robustness of MRC models against adversarial attacks. To explore that question, we fine-tune three state-of-the-art language models on either SQuAD 1.1 or SQuAD 2.0 and then evaluate their robustness under adversarial attacks. Our experiments reveal that current models fine-tuned on SQuAD 2.0 do not initially appear to be any more robust than ones fine-tuned on SQuAD 1.1, yet they reveal a measure of hidden robustness that can be leveraged to realize actual performance gains. Furthermore, we find that the robustness of models fine-tuned on SQuAD 2.0 extends to additional out-of-domain datasets. Finally, we introduce a new adversarial attack tested ...
Figure 1: Example of predictions to an answerable question of RoBERTa fine-tuned on SQuAD 1.1 (Rajpurkar et al., 2016) (v1) versus its counterpart fine-tuned on SQuAD 2.0 (Rajpurkar et al., 2018) (v2) under adversarial attack. While RoBERTa v1 predicts "DartFord" as the answer under attack, RoBERTa v2 knows that "DartFord" is not the correct answer but fails to focus back on "Nevada", the correct answer for the given question. RoBERTa v2 then predicts the question as unanswerable.
Jan-31-2023
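As a rough illustration of the setup described in the abstract (not the authors' code), the sketch below contrasts an extractive QA model fine-tuned on SQuAD 1.1 with one fine-tuned on SQuAD 2.0 on a toy passage containing an adversarially inserted distractor sentence, loosely mirroring the Figure 1 example. The checkpoint names, the toy context, the distractor wording, and the use of the Hugging Face question-answering pipeline are all assumptions, not details taken from the paper.

```python
# Minimal sketch, assuming publicly available SQuAD 1.1 / SQuAD 2.0 checkpoints;
# the context and the distractor sentence are invented for illustration only.
from transformers import pipeline

# Toy passage with an adversarial distractor appended: the extra sentence closely
# mirrors the question but names a different (wrong) entity, "DartFord".
context = (
    "The 1859 silver strike drew thousands of prospectors to Nevada. "
    "The 1850 silver strike drew thousands of prospectors to DartFord."
)
question = "Which state did the 1859 silver strike draw prospectors to?"

# One model fine-tuned on SQuAD 1.1 (must always return a span) and one
# fine-tuned on SQuAD 2.0 (may abstain by returning an empty answer).
qa_v1 = pipeline("question-answering", model="distilbert-base-cased-distilled-squad")
qa_v2 = pipeline("question-answering", model="deepset/roberta-base-squad2")

print("v1:", qa_v1(question=question, context=context))
# handle_impossible_answer=True lets the SQuAD 2.0 model return an empty string,
# i.e. predict the question as unanswerable, when the no-answer score wins.
print("v2:", qa_v2(question=question, context=context, handle_impossible_answer=True))
```

Comparing the two outputs on such perturbed passages is one simple way to probe the behaviour the abstract describes: whether a model trained with unanswerable questions falls for the distractor, recovers the correct span, or abstains.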