Do, Phong Nguyen-Thuan
VLUE: A New Benchmark and Multi-task Knowledge Transfer Learning for Vietnamese Natural Language Understanding
Do, Phong Nguyen-Thuan, Tran, Son Quoc, Hoang, Phu Gia, Van Nguyen, Kiet, Nguyen, Ngan Luu-Thuy
The success of Natural Language Understanding (NLU) benchmarks in various languages, such as GLUE for English, CLUE for Chinese, KLUE for Korean, and IndoNLU for Indonesian, has facilitated the evaluation of new NLU models across a wide range of tasks. To establish a standardized set of benchmarks for Vietnamese NLU, we introduce the first Vietnamese Language Understanding Evaluation (VLUE) benchmark. The VLUE benchmark encompasses five datasets covering different NLU tasks, including text classification, span extraction, and natural language inference. To provide an insightful overview of the current state of Vietnamese NLU, we then evaluate seven state-of-the-art pre-trained models, including both multilingual and Vietnamese monolingual models, on our proposed VLUE benchmark. Furthermore, we present CafeBERT, a new state-of-the-art pre-trained model that achieves superior results across all tasks in the VLUE benchmark. Our model combines the proficiency of a multilingual pre-trained model with Vietnamese linguistic knowledge. CafeBERT is developed based on the XLM-RoBERTa model, with an additional pretraining step on a large amount of Vietnamese textual data to enhance its adaptation to the Vietnamese language. CafeBERT is made publicly available for research purposes.
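The adaptation recipe described above, continued masked-language-model pretraining of XLM-RoBERTa on Vietnamese text, can be sketched with HuggingFace Transformers. This is a minimal illustration only: the corpus file, base checkpoint size, and hyperparameters are assumptions for the sketch, not the configuration reported for CafeBERT.

    from datasets import load_dataset
    from transformers import (AutoModelForMaskedLM, AutoTokenizer,
                              DataCollatorForLanguageModeling, Trainer,
                              TrainingArguments)

    tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-large")
    model = AutoModelForMaskedLM.from_pretrained("xlm-roberta-large")

    # Hypothetical corpus: one Vietnamese document per line.
    corpus = load_dataset("text", data_files={"train": "vietnamese_corpus.txt"})
    tokenized = corpus.map(
        lambda batch: tokenizer(batch["text"], truncation=True, max_length=256),
        batched=True, remove_columns=["text"])

    # Standard 15% token masking for the masked-language-model objective.
    collator = DataCollatorForLanguageModeling(tokenizer, mlm_probability=0.15)
    args = TrainingArguments(output_dir="cafebert-sketch",
                             per_device_train_batch_size=8,
                             learning_rate=1e-5, num_train_epochs=1)
    Trainer(model=model, args=args, train_dataset=tokenized["train"],
            data_collator=collator).train()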
AGent: A Novel Pipeline for Automatically Creating Unanswerable Questions
Tran, Son Quoc, Do, Gia-Huy, Do, Phong Nguyen-Thuan, Kretchmar, Matt, Du, Xinya
The development of large high-quality datasets and high-performing models has led to significant advancements in the domain of Extractive Question Answering (EQA). This progress has sparked considerable interest in exploring unanswerable questions within the EQA domain. Training EQA models with unanswerable questions helps them avoid extracting misleading or incorrect answers for queries that lack valid responses. However, manually annotating unanswerable questions is labor-intensive. To address this, we propose AGent, a novel pipeline that automatically creates new unanswerable questions by re-matching a question with a context that lacks the necessary information for a correct answer. In this paper, we demonstrate the usefulness of the AGent pipeline by creating two sets of unanswerable questions from answerable questions in SQuAD and HotpotQA. These created question sets exhibit low error rates. Additionally, models fine-tuned on these questions show performance comparable to models fine-tuned on the SQuAD 2.0 dataset across multiple EQA benchmarks.
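The re-matching idea lends itself to a short sketch: for each answerable question, retrieve a similar context that does not contain its gold answer, so the question becomes plausible-looking but unanswerable. The TF-IDF retrieval and string-based filter below are simple stand-ins for the matching and filtering used in the actual AGent pipeline, which is what keeps the reported error rates low; this sketch makes no such guarantee.

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    def rematch_unanswerable(questions, answers, contexts):
        """questions[i] is answerable from contexts[i] with gold answer answers[i]."""
        vec = TfidfVectorizer().fit(contexts)
        ctx_mat = vec.transform(contexts)
        pairs = []
        for i, q in enumerate(questions):
            sims = cosine_similarity(vec.transform([q]), ctx_mat)[0]
            sims[i] = -1.0  # never re-use the original context
            # Keep the most similar context that lacks the gold answer string,
            # yielding a candidate (question, context) unanswerable pair.
            for j in sims.argsort()[::-1]:
                if answers[i].lower() not in contexts[j].lower():
                    pairs.append((q, contexts[j]))
                    break
        return pairs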
Revealing Weaknesses of Vietnamese Language Models Through Unanswerable Questions in Machine Reading Comprehension
Tran, Son Quoc, Do, Phong Nguyen-Thuan, Van Nguyen, Kiet, Nguyen, Ngan Luu-Thuy
Although the curse of multilinguality significantly restricts the language abilities of multilingual models in monolingual settings, researchers still have to rely on multilingual models to develop state-of-the-art systems for Vietnamese Machine Reading Comprehension. This difficulty stems from the limited number of high-quality works on developing Vietnamese language models. To encourage more work in this research field, we present a comprehensive analysis of the linguistic weaknesses and strengths of current Vietnamese monolingual models, using the downstream task of Machine Reading Comprehension. From the analysis results, we suggest new directions for developing Vietnamese language models. Beyond this main contribution, we also reveal the existence of artifacts in Vietnamese Machine Reading Comprehension benchmarks and argue for an urgent need for new high-quality benchmarks to track the progress of Vietnamese Machine Reading Comprehension. Moreover, we introduce a minor but valuable modification to the process of annotating unanswerable questions for Machine Reading Comprehension from previous work. Our proposed modification raises the difficulty of the resulting unanswerable questions, making them harder for Machine Reading Comprehension systems to solve.
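For readers unfamiliar with the term, "artifacts" are annotation patterns that let models score well without genuine comprehension. One common way such artifacts are exposed (shown below purely as an illustration; the paper's own protocol may differ) is a partial-input baseline: withhold the real question and check whether a model still recovers the gold answer suspiciously often.

    from transformers import pipeline

    # Placeholder checkpoint; an actual study would use a Vietnamese MRC model.
    qa = pipeline("question-answering", model="deepset/roberta-base-squad2")

    def question_only_artifact_score(examples):
        """examples: dicts with 'question', 'context', and gold 'answer' strings."""
        hits = 0
        for ex in examples:
            # Replace the real question with an uninformative one; any residual
            # accuracy must come from patterns in the context/answer annotation.
            pred = qa(question="What is the answer?", context=ex["context"])
            hits += int(pred["answer"].strip().lower() == ex["answer"].strip().lower())
        return hits / len(examples)  # high accuracy here signals artifacts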
The Impacts of Unanswerable Questions on the Robustness of Machine Reading Comprehension Models
Tran, Son Quoc, Do, Phong Nguyen-Thuan, Le, Uyen, Kretchmar, Matt
Pretrained language models have achieved super-human performances on many Machine Reading Comprehension (MRC) benchmarks. Nevertheless, their relative inability to defend against adversarial attacks has spurred skepticism about their natural language understanding. In this paper, we ask whether training with unanswerable questions in SQuAD 2.0 can help improve the robustness of MRC models against adversarial attacks. To explore that question, we fine-tune three state-of-the-art language models on either SQuAD 1.1 or SQuAD 2.0 and then evaluate their robustness under adversarial attacks. Our experiments reveal that current models fine-tuned on SQuAD 2.0 do not initially appear to be any more robust than ones fine-tuned on SQuAD 1.1, yet they reveal a measure of hidden robustness that can be leveraged to realize actual performance gains. Furthermore, we find that the robustness of models fine-tuned on SQuAD 2.0 extends to additional out-of-domain datasets. Finally, we introduce a new adversarial attack tested […]
Figure 1: Example of predictions to an answerable question of RoBERTa fine-tuned on SQuAD 1.1 (Rajpurkar et al., 2016) (v1) versus its counterpart fine-tuned on SQuAD 2.0 (Rajpurkar et al., 2018) (v2) under adversarial attack. While RoBERTa v1 predicts "DartFord" as the answer under attack, RoBERTa v2 knows that "DartFord" is not the correct answer but fails to focus back on "Nevada", the correct answer for the given question. RoBERTa v2 then predicts the question as unanswerable.
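A concrete way to picture the robustness evaluation is an AddSent-style attack in the spirit of Jia and Liang (2017), which the DartFord example above exemplifies: append a distractor sentence that mimics the question's wording and check whether the model's prediction flips. The model checkpoint and distractor below are illustrative stand-ins, not the paper's exact setup.

    from transformers import pipeline

    qa = pipeline("question-answering", model="deepset/roberta-base-squad2")

    context = "Carson City is the capital of the U.S. state of Nevada."
    question = "Which state is Carson City the capital of?"
    # AddSent-style distractor: mirrors the question but names fake entities,
    # so it cannot change the correct answer.
    distractor = " DartFord is the capital of the U.S. state of Danvar."

    clean = qa(question=question, context=context)
    attacked = qa(question=question, context=context + distractor)
    # A robust model keeps "Nevada"; a fragile one flips to "DartFord".
    print(clean["answer"], "->", attacked["answer"])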