Enriching Biomedical Knowledge for Low-resource Language Through Large-Scale Translation

Long Phan, Tai Dang, Hieu Tran, Trieu H. Trinh, Vy Phan, Lam D. Chau, Minh-Thang Luong

Jan-29-2023–arXiv.org Artificial Intelligence

Biomedical data and benchmarks are highly valuable yet very scarce in low-resource languages such as Vietnamese. In this paper, we use a state-of-the-art English-Vietnamese translation model to translate and produce both pretraining and supervised data in the biomedical domain. Building on this large-scale translation, we introduce ViPubmedT5, a pretrained Encoder-Decoder Transformer model trained on 20 million translated abstracts from the high-quality public PubMed corpus. ViPubmedT5 achieves state-of-the-art results on two different biomedical benchmarks, in summarization and acronym disambiguation. Furthermore, we release ViMedNLI, a new NLP task in Vietnamese translated from MedNLI using the recently released public En-Vi translation model and carefully refined by human experts, along with evaluations of existing methods against ViPubmedT5.

artificial intelligence, machine learning, natural language

Country:
- Asia (0.93)
- Europe (1.00)
- North America > United States (0.93)

Genre:
- Research Report (1.00)

Industry:
- Health & Medicine
  - Diagnostic Medicine (0.68)
  - Therapeutic Area
    - Immunology (0.68)
    - Infections and Infectious Diseases (0.46)

Technology:
- Information Technology > Artificial Intelligence
  - Machine Learning (1.00)
  - Natural Language > Machine Translation (1.00)