AITopics | Nguyen, Luan Thanh

Collaborating Authors

Nguyen, Luan Thanh

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Multi-Dialect Vietnamese: Task, Dataset, Baseline Models and Challenges

Van Dinh, Nguyen, Dang, Thanh Chi, Nguyen, Luan Thanh, Van Nguyen, Kiet

arXiv.org Artificial IntelligenceOct-4-2024

Vietnamese, a low-resource language, is typically categorized into three primary dialect groups that belong to Northern, Central, and Southern Vietnam. However, each province within these regions exhibits its own distinct pronunciation variations. Despite the existence of various speech recognition datasets, none of them has provided a fine-grained classification of the 63 dialects specific to individual provinces of Vietnam. To address this gap, we introduce Vietnamese Multi-Dialect (ViMD) dataset, a novel comprehensive dataset capturing the rich diversity of 63 provincial dialects spoken across Vietnam. Our dataset comprises 102.56 hours of audio, consisting of approximately 19,000 utterances, and the associated transcripts contain over 1.2 million words. To provide benchmarks and simultaneously demonstrate the challenges of our dataset, we fine-tune state-of-the-art pre-trained models for two downstream tasks: (1) Dialect identification and (2) Speech recognition. The empirical results suggest two implications including the influence of geographical factors on dialects, and the constraints of current approaches in speech recognition tasks involving multi-dialect speech data. Our dataset is available for research purposes.

artificial intelligence, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2410.03458

Country:

Asia > Vietnam > Hanoi > Hanoi (0.14)
Asia > Vietnam > Bà Rịa-Vũng Tàu Province (0.14)

Genre: Research Report > New Finding (0.66)

Industry: Transportation > Ground > Road (0.46)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

ViHateT5: Enhancing Hate Speech Detection in Vietnamese With A Unified Text-to-Text Transformer Model

Nguyen, Luan Thanh

arXiv.org Artificial IntelligenceJun-4-2024

Recent advancements in hate speech detection (HSD) in Vietnamese have made significant progress, primarily attributed to the emergence of transformer-based pre-trained language models, particularly those built on the BERT architecture. However, the necessity for specialized fine-tuned models has resulted in the complexity and fragmentation of developing a multitasking HSD system. Moreover, most current methodologies focus on fine-tuning general pre-trained models, primarily trained on formal textual datasets like Wikipedia, which may not accurately capture human behavior on online platforms. In this research, we introduce ViHateT5, a T5-based model pre-trained on our proposed large-scale domain-specific dataset named VOZ-HSD. By harnessing the power of a text-to-text architecture, ViHateT5 can tackle multiple tasks using a unified model and achieve state-of-the-art performance across all standard HSD benchmarks in Vietnamese. Our experiments also underscore the significance of label distribution in pre-training data on model efficacy. We provide our experimental materials for research purposes, including the VOZ-HSD dataset, pre-trained checkpoint, the unified HSD-multitask ViHateT5 model, and related source code on GitHub publicly.

artificial intelligence, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2405.14141

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Asia > Philippines > Luzon > National Capital Region > City of Manila (0.14)

Genre:

Research Report > New Finding (0.46)
Research Report > Promising Solution (0.46)

Industry:

Information Technology (0.46)
Media > News (0.46)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.46)

Add feedback

Gendec: A Machine Learning-based Framework for Gender Detection from Japanese Names

Pham, Duong Tien, Nguyen, Luan Thanh

arXiv.org Artificial IntelligenceNov-18-2023

Every human has their own name, a fundamental aspect of their identity and cultural heritage. The name often conveys a wealth of information, including details about an individual's background, ethnicity, and, especially, their gender. By detecting gender through the analysis of names, researchers can unlock valuable insights into linguistic patterns and cultural norms, which can be applied to practical applications. Hence, this work presents a novel dataset for Japanese name gender detection comprising 64,139 full names in romaji, hiragana, and kanji forms, along with their biological genders. Moreover, we propose Gendec, a framework for gender detection from Japanese names that leverages diverse approaches, including traditional machine learning techniques or cutting-edge transfer learning models, to predict the gender associated with Japanese names accurately. Through a thorough investigation, the proposed framework is expected to be effective and serve potential applications in various domains.

artificial intelligence, gender detection, machine learning, (14 more...)

arXiv.org Artificial Intelligence

2311.11001

Country:

Europe (0.68)
Asia > Vietnam (0.15)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.47)

Add feedback

VLSP 2021 - ViMRC Challenge: Vietnamese Machine Reading Comprehension

Van Nguyen, Kiet, Tran, Son Quoc, Nguyen, Luan Thanh, Van Huynh, Tin, Luu, Son T., Nguyen, Ngan Luu-Thuy

arXiv.org Artificial IntelligenceApr-4-2022

One of the emerging research trends in natural language understanding is machine reading comprehension (MRC) which is the task to find answers to human questions based on textual data. Existing Vietnamese datasets for MRC research concentrate solely on answerable questions. However, in reality, questions can be unanswerable for which the correct answer is not stated in the given textual data. To address the weakness, we provide the research community with a benchmark dataset named UIT-ViQuAD 2.0 for evaluating the MRC task and question answering systems for the Vietnamese language. We use UIT-ViQuAD 2.0 as a benchmark dataset for the challenge on Vietnamese MRC at the Eighth Workshop on Vietnamese Language and Speech Processing (VLSP 2021). This task attracted 77 participant teams from 34 universities and other organizations. In this article, we present details of the organization of the challenge, an overview of the methods employed by shared-task participants, and the results. The highest performances are 77.24% in F1-score and 67.43% in Exact Match on the private test set. The Vietnamese MRC systems proposed by the top 3 teams use XLM-RoBERTa, a powerful pre-trained language model based on the transformer architecture. The UIT-ViQuAD 2.0 dataset motivates researchers to further explore the Vietnamese machine reading comprehension task and related tasks such as question answering, question generation, and natural language inference.

machine learning, natural language, question answering, (20 more...)

arXiv.org Artificial Intelligence

doi: 10.25073/2588-1086/vnucsce.340

2203.114

Country:

Europe (1.00)
Asia > Vietnam (0.68)
North America (0.68)

Genre: Research Report (0.82)

Industry: Education > Assessment & Standards > Student Performance (0.95)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Question Answering (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

Add feedback

Constructive and Toxic Speech Detection for Open-domain Social Media Comments in Vietnamese

Nguyen, Luan Thanh, Van Nguyen, Kiet, Nguyen, Ngan Luu-Thuy

arXiv.org Artificial IntelligenceMar-24-2021

The rise of social media has led to the increasing of comments on online forums. However, there still exists some invalid comments which were not informative for users. Moreover, those comments are also quite toxic and harmful to people. In this paper, we create a dataset for classifying constructive and toxic speech detection, named UIT-ViCTSD (Vietnamese Constructive and Toxic Speech Detection dataset) with 10,000 human-annotated comments. For these tasks, we proposed a system for constructive and toxic speech detection with the state-of-the-art transfer learning model in Vietnamese NLP as PhoBERT. With this system, we achieved 78.59% and 59.40% F1-score for identifying constructive and toxic comments separately. Besides, to have an objective assessment for the dataset, we implement a variety of baseline models as traditional Machine Learning and Deep Neural Network-Based models. With the results, we can solve some problems on the online discussions and develop the framework for identifying constructiveness and toxicity Vietnamese social media comments automatically.

dataset, deep learning, neural network, (15 more...)

arXiv.org Artificial Intelligence

2103.10069

Country:

Asia (0.95)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)

Genre: Research Report (0.51)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback