VNJPTranslate: A comprehensive pipeline for Vietnamese-Japanese translation
Phan, Hoang Hai; Vu, Nguyen Duc Minh; Phuong, Nam Dang
arXiv.org Artificial Intelligence
Neural Machine Translation (NMT) driven by Transformer architectures has advanced significantly, yet it still struggles with low-resource language pairs such as Vietnamese-Japanese (Vi-Ja). The main obstacles are sparse parallel data and the difficulty of handling linguistic and cultural nuances. Recent progress in Large Language Models (LLMs) with strong reasoning, often refined via Reinforcement Learning (RL), enables high-quality synthetic data generation. We introduce VNJPTranslate, a pipeline designed to systematically address the Vi-Ja translation task. It features a targeted data augmentation strategy that applies advanced LLMs with Chain-of-Thought prompting to challenging segments identified via corpus analysis. We then apply efficient fine-tuning techniques (Unsloth with QLoRA) to a capable, low-parameter autoregressive model (specifically, a fine-tuned version of the 1.8B-parameter Sailor model, which is based on the Qwen architecture) to create a practical and high-performing translation system. This integrated approach aims to improve Vi-Ja translation quality significantly over existing baselines.
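The targeted augmentation step described in the abstract can be illustrated with a minimal sketch. The abstract does not say how challenging segments are identified or how the prompts are worded, so everything here is an assumption made for illustration: the rare-token heuristic, the whitespace tokenization, the function names (`token_counts`, `rarity_score`, `select_challenging`, `build_cot_prompt`), the `min_score` threshold, and the prompt text are all hypothetical and not the authors' method.

```python
from collections import Counter


def token_counts(corpus):
    """Count whitespace-separated tokens across the corpus (assumed tokenization)."""
    counts = Counter()
    for sentence in corpus:
        counts.update(sentence.lower().split())
    return counts


def rarity_score(sentence, counts, rare_threshold=1):
    """Fraction of tokens seen at most `rare_threshold` times in the corpus."""
    tokens = sentence.lower().split()
    if not tokens:
        return 0.0
    rare = sum(1 for t in tokens if counts[t] <= rare_threshold)
    return rare / len(tokens)


def select_challenging(corpus, min_score=0.5):
    """Keep sentences whose rare-token fraction reaches `min_score` (hypothetical criterion)."""
    counts = token_counts(corpus)
    return [s for s in corpus if rarity_score(s, counts) >= min_score]


def build_cot_prompt(vi_sentence):
    """Build an illustrative Chain-of-Thought prompt for an LLM translator."""
    return (
        "You are translating Vietnamese into Japanese.\n"
        "First, reason step by step about idioms, honorifics, and cultural references.\n"
        "Then give the final translation on a line starting with 'Translation:'.\n\n"
        f"Vietnamese: {vi_sentence}"
    )


if __name__ == "__main__":
    corpus = ["tôi đi học", "tôi đi làm", "nước chảy đá mòn"]
    for sentence in select_challenging(corpus):
        print(build_cot_prompt(sentence))
```

In this sketch only the selected segments would be sent to the LLM, and the resulting synthetic pairs would then feed the QLoRA fine-tuning stage. A real pipeline would likely replace the rare-token heuristic with whatever corpus analysis the paper specifies.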
Mar-31-2025
- Country:
  - North America > United States
    - Florida > Miami-Dade County > Miami (0.04)
  - Europe > Finland
  - Asia
    - East Asia (0.04)
    - China > Hong Kong (0.04)
    - Vietnam > Thái Nguyên Province
      - Thái Nguyên (0.05)
    - Middle East > UAE
      - Abu Dhabi Emirate > Abu Dhabi (0.04)
- Genre:
  - Overview (0.48)
  - Research Report (0.42)
- Technology: