AITopics | Nguyen, Linh The

Collaborating Authors

Nguyen, Linh The

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

PhoWhisper: Automatic Speech Recognition for Vietnamese

Le, Thanh-Thien, Nguyen, Linh The, Nguyen, Dat Quoc

arXiv.org Artificial IntelligenceMar-27-2024

We introduce PhoWhisper in five versions for Vietnamese automatic speech recognition. PhoWhisper's robustness is achieved through fine-tuning the Whisper model on an 844-hour dataset that encompasses diverse Vietnamese accents. Our experimental study demonstrates state-of-the-art performances of PhoWhisper on benchmark Vietnamese ASR datasets. Automatic speech recognition (ASR) technology, also referred to as speech-to-text, has experienced significant advancements (Baevski et al., 2020; Barrault et al., 2023; Pratap et al., 2023), expanding its applicability across a wide range of applications. The state-of-the-art ASR model, Whisper (Radford et al., 2023), has become extremely popular, being widely used in both academia and industry.

artificial intelligence, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

2406.02555

Country: Asia > Vietnam (0.16)

Genre: Research Report > New Finding (0.35)

Technology: Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)

Add feedback

PhoGPT: Generative Pre-training for Vietnamese

Nguyen, Dat Quoc, Nguyen, Linh The, Tran, Chi, Nguyen, Dung Ngoc, Phung, Dinh, Bui, Hung

arXiv.org Artificial IntelligenceFeb-4-2024

We open-source a state-of-the-art 4B-parameter generative model series for Vietnamese, which includes the base pre-trained monolingual model PhoGPT-4B and its chat variant, PhoGPT-4B-Chat. The base model, PhoGPT-4B, with exactly 3.7B parameters, is pre-trained from scratch on a Vietnamese corpus of 102B tokens, with an 8192 context length, employing a vocabulary of 20480 token types. The chat variant, PhoGPT-4B-Chat, is the modeling output obtained by fine-tuning PhoGPT-4B on a dataset of 70K instructional prompts and their responses, along with an additional 290K conversations. We demonstrate its strong performance compared to previous closed-source and open-source 7B-parameter models. Our PhoGPT models are available at: https://github.com/VinAIResearch/PhoGPT

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2311.02945

Country: Asia > Vietnam (0.29)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.32)

Add feedback

XPhoneBERT: A Pre-trained Multilingual Model for Phoneme Representations for Text-to-Speech

Nguyen, Linh The, Pham, Thinh, Nguyen, Dat Quoc

arXiv.org Artificial IntelligenceMay-31-2023

We present XPhoneBERT, the first multilingual model pre-trained to learn phoneme representations for the downstream text-to-speech (TTS) task. Our XPhoneBERT has the same model architecture as BERT-base, trained using the RoBERTa pre-training approach on 330M phoneme-level sentences from nearly 100 languages and locales. Experimental results show that employing XPhoneBERT as an input phoneme encoder significantly boosts the performance of a strong neural TTS model in terms of naturalness and prosody and also helps produce fairly high-quality speech with limited training data. We publicly release our pre-trained XPhoneBERT with the hope that it would facilitate future research and downstream TTS applications for multiple languages. Our XPhoneBERT model is available at https://github.com/VinAIResearch/XPhoneBERT

artificial intelligence, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2305.19709

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Speech > Speech Synthesis (0.74)
Information Technology > Artificial Intelligence > Vision > Optical Character Recognition (0.63)

Add feedback