Do, Phong Nguyen-Thuan
VLUE: A New Benchmark and Multi-task Knowledge Transfer Learning for Vietnamese Natural Language Understanding
Do, Phong Nguyen-Thuan, Tran, Son Quoc, Hoang, Phu Gia, Van Nguyen, Kiet, Nguyen, Ngan Luu-Thuy
The success of Natural Language Understanding (NLU) benchmarks in various languages, such as GLUE for English, CLUE for Chinese, KLUE for Korean, and IndoNLU for Indonesian, has facilitated the evaluation of new NLU models across a wide range of tasks. To establish a standardized set of benchmarks for Vietnamese NLU, we introduce the first Vietnamese Language Understanding Evaluation (VLUE) benchmark. The VLUE benchmark encompasses five datasets covering different NLU tasks, including text classification, span extraction, and natural language inference. To provide an insightful overview of the current state of Vietnamese NLU, we then evaluate seven state-of-the-art pre-trained models, including both multilingual and Vietnamese monolingual models, on our proposed VLUE benchmark. Furthermore, we present CafeBERT, a new state-of-the-art pre-trained model that achieves superior results across all tasks in the VLUE benchmark. Our model combines the proficiency of a multilingual pre-trained model with Vietnamese linguistic knowledge. CafeBERT is developed based on the XLM-RoBERTa model, with an additional pretraining step on a large amount of Vietnamese textual data to enhance its adaptation to the Vietnamese language. CafeBERT is made publicly available for research purposes.
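The adaptation recipe described above, continued masked-language-model pretraining of XLM-RoBERTa on Vietnamese text, can be sketched with HuggingFace Transformers. This is a minimal illustration only: the corpus file, base checkpoint size, and hyperparameters are assumptions for the sketch, not the configuration reported for CafeBERT.

    from datasets import load_dataset
    from transformers import (AutoModelForMaskedLM, AutoTokenizer,
                              DataCollatorForLanguageModeling, Trainer,
                              TrainingArguments)

    tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-large")
    model = AutoModelForMaskedLM.from_pretrained("xlm-roberta-large")

    # Hypothetical corpus: one Vietnamese document per line.
    corpus = load_dataset("text", data_files={"train": "vietnamese_corpus.txt"})
    tokenized = corpus.map(
        lambda batch: tokenizer(batch["text"], truncation=True, max_length=256),
        batched=True, remove_columns=["text"])

    # Standard 15% token masking for the masked-language-model objective.
    collator = DataCollatorForLanguageModeling(tokenizer, mlm_probability=0.15)
    args = TrainingArguments(output_dir="cafebert-sketch",
                             per_device_train_batch_size=8,
                             learning_rate=1e-5, num_train_epochs=1)
    Trainer(model=model, args=args, train_dataset=tokenized["train"],
            data_collator=collator).train()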
AGent: A Novel Pipeline for Automatically Creating Unanswerable Questions
Tran, Son Quoc, Do, Gia-Huy, Do, Phong Nguyen-Thuan, Kretchmar, Matt, Du, Xinya
The development of large high-quality datasets and high-performing models has led to significant advancements in the domain of Extractive Question Answering (EQA). This progress has sparked considerable interest in exploring unanswerable questions within the EQA domain. Training EQA models with unanswerable questions helps them avoid extracting misleading or incorrect answers for queries that lack valid responses. However, manually annotating unanswerable questions is labor-intensive. To address this, we propose AGent, a novel pipeline that automatically creates new unanswerable questions by re-matching a question with a context that lacks the necessary information for a correct answer. In this paper, we demonstrate the usefulness of the AGent pipeline by creating two sets of unanswerable questions from answerable questions in SQuAD and HotpotQA. These created question sets exhibit low error rates. Additionally, models fine-tuned on these questions show performance comparable to models fine-tuned on the SQuAD 2.0 dataset across multiple EQA benchmarks.
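The re-matching idea lends itself to a short sketch: for each answerable question, retrieve a similar context that does not contain its gold answer, so the question becomes plausible-looking but unanswerable. The TF-IDF retrieval and string-based filter below are simple stand-ins for the matching and filtering used in the actual AGent pipeline, which is what keeps the reported error rates low; this sketch makes no such guarantee.

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    def rematch_unanswerable(questions, answers, contexts):
        """questions[i] is answerable from contexts[i] with gold answer answers[i]."""
        vec = TfidfVectorizer().fit(contexts)
        ctx_mat = vec.transform(contexts)
        pairs = []
        for i, q in enumerate(questions):
            sims = cosine_similarity(vec.transform([q]), ctx_mat)[0]
            sims[i] = -1.0  # never re-use the original context
            # Keep the most similar context that lacks the gold answer string,
            # yielding a candidate (question, context) unanswerable pair.
            for j in sims.argsort()[::-1]:
                if answers[i].lower() not in contexts[j].lower():
                    pairs.append((q, contexts[j]))
                    break
        return pairs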
Revealing Weaknesses of Vietnamese Language Models Through Unanswerable Questions in Machine Reading Comprehension
Tran, Son Quoc, Do, Phong Nguyen-Thuan, Van Nguyen, Kiet, Nguyen, Ngan Luu-Thuy
Although the curse of multilinguality significantly restricts the language abilities of multilingual models in monolingual settings, researchers still have to rely on multilingual models to develop state-of-the-art systems for Vietnamese Machine Reading Comprehension. This difficulty stems from the limited number of high-quality works on developing Vietnamese language models. To encourage more work in this research field, we present a comprehensive analysis of the linguistic weaknesses and strengths of current Vietnamese monolingual models, using the downstream task of Machine Reading Comprehension. From the analysis results, we suggest new directions for developing Vietnamese language models. Beyond this main contribution, we also reveal the existence of artifacts in Vietnamese Machine Reading Comprehension benchmarks and argue for an urgent need for new high-quality benchmarks to track the progress of Vietnamese Machine Reading Comprehension. Moreover, we introduce a minor but valuable modification to the process of annotating unanswerable questions for Machine Reading Comprehension from previous work. Our proposed modification raises the difficulty of the resulting unanswerable questions, making them harder for Machine Reading Comprehension systems to solve.
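For readers unfamiliar with the term, "artifacts" are annotation patterns that let models score well without genuine comprehension. One common way such artifacts are exposed (shown below purely as an illustration; the paper's own protocol may differ) is a partial-input baseline: withhold the real question and check whether a model still recovers the gold answer suspiciously often.

    from transformers import pipeline

    # Placeholder checkpoint; an actual study would use a Vietnamese MRC model.
    qa = pipeline("question-answering", model="deepset/roberta-base-squad2")

    def question_only_artifact_score(examples):
        """examples: dicts with 'question', 'context', and gold 'answer' strings."""
        hits = 0
        for ex in examples:
            # Replace the real question with an uninformative one; any residual
            # accuracy must come from patterns in the context/answer annotation.
            pred = qa(question="What is the answer?", context=ex["context"])
            hits += int(pred["answer"].strip().lower() == ex["answer"].strip().lower())
        return hits / len(examples)  # high accuracy here signals artifacts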
The Impacts of Unanswerable Questions on the Robustness of Machine Reading Comprehension Models
Tran, Son Quoc, Do, Phong Nguyen-Thuan, Le, Uyen, Kretchmar, Matt
Pretrained language models have achieved super-human performances on many Machine Reading Comprehension (MRC) benchmarks. Nevertheless, their relative inability to defend against adversarial attacks has spurred skepticism about their natural language understanding. In this paper, we ask whether training with unanswerable questions in SQuAD 2.0 can help improve the robustness of MRC models against adversarial attacks. To explore that question, we fine-tune three state-of-the-art language models on either SQuAD 1.1 or SQuAD 2.0 and then evaluate their robustness under adversarial attacks. Our experiments reveal that current models fine-tuned on SQuAD 2.0 do not initially appear to be any more robust than ones fine-tuned on SQuAD 1.1, yet they reveal a measure of hidden robustness that can be leveraged to realize actual performance gains. Furthermore, we find that the robustness of models fine-tuned on SQuAD 2.0 extends to additional out-of-domain datasets. Finally, we introduce a new adversarial attack tested […]
Figure 1: Example of predictions to an answerable question of RoBERTa fine-tuned on SQuAD 1.1 (Rajpurkar et al., 2016) (v1) versus its counterpart fine-tuned on SQuAD 2.0 (Rajpurkar et al., 2018) (v2) under adversarial attack. While RoBERTa v1 predicts "DartFord" as the answer under attack, RoBERTa v2 knows that "DartFord" is not the correct answer but fails to focus back on "Nevada", the correct answer for the given question. RoBERTa v2 then predicts the question as unanswerable.
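A concrete way to picture the robustness evaluation is an AddSent-style attack in the spirit of Jia and Liang (2017), which the DartFord example above exemplifies: append a distractor sentence that mimics the question's wording and check whether the model's prediction flips. The model checkpoint and distractor below are illustrative stand-ins, not the paper's exact setup.

    from transformers import pipeline

    qa = pipeline("question-answering", model="deepset/roberta-base-squad2")

    context = "Carson City is the capital of the U.S. state of Nevada."
    question = "Which state is Carson City the capital of?"
    # AddSent-style distractor: mirrors the question but names fake entities,
    # so it cannot change the correct answer.
    distractor = " DartFord is the capital of the U.S. state of Danvar."

    clean = qa(question=question, context=context)
    attacked = qa(question=question, context=context + distractor)
    # A robust model keeps "Nevada"; a fragile one flips to "DartFord".
    print(clean["answer"], "->", attacked["answer"])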