Efficient Training of Robust Traditional Chinese LLaMA-1B on a Single Consumer GPU: Continual Pre-training, SFT, and DPO
Chih, Yu-Cheng, Duan, Ming-Tao, Hou, Yong-Hao
arXiv.org Artificial Intelligence
Small Language Models (SLMs) enable cost-effective, on-device, and latency-sensitive AI applications, yet their deployment in Traditional Chinese (TC) remains hindered by token-level instability: models unpredictably emit non-TC characters or code-switch into other languages. We address this practical reliability gap by creating PureTC-1B, a three-stage stabilization pipeline for Llama-3.2-1B-Instruct (an open-weight, instruction-tuned model released by Meta) [1] using parameter-efficient LoRA adapters [2]. Our method combines Continual Pre-Training (CPT) on TC-centric corpora, Supervised Fine-Tuning (SFT) with instruction data, and Direct Preference Optimization (DPO) [3] using TC-adherence preferences to improve monolingual robustness without full-model retraining. On a benchmark designed to simulate real-world usage, PureTC-1B achieves a 51.3% relative reduction (micro-average) in non-TC output tokens versus the base model. On a Named Entity Translation (NET) task, PureTC-1B further reduces incorrect-language tokens by 77.2% relative to Llama-3B and 57.2% relative to Qwen-1.5B, indicating that robust TC adherence is attainable even at the 1B scale. The pipeline is reproducible, adapter-only, and hardware-friendly, offering practitioners a practical recipe to enhance language stability for TC and potentially other non-English languages.
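To make the headline metric concrete, the following is a minimal sketch of how a micro-averaged relative reduction in non-TC tokens could be computed. The abstract does not give the exact metric definition, so the formula here is an assumption: token counts are pooled across tasks before the rate is taken, and the reduction is measured relative to the base model's rate. The function names and all token counts are hypothetical and are not the paper's data.

```python
from typing import Iterable, Tuple


def micro_non_tc_rate(counts: Iterable[Tuple[int, int]]) -> float:
    """Pool (non_tc_tokens, total_tokens) pairs across tasks, then divide.

    Assumed definition of a micro-average: every token counts equally,
    so tasks with longer outputs contribute more to the final rate.
    """
    pairs = list(counts)
    non_tc = sum(n for n, _ in pairs)
    total = sum(t for _, t in pairs)
    if total == 0:
        raise ValueError("no output tokens to evaluate")
    return non_tc / total


def relative_reduction(base_rate: float, new_rate: float) -> float:
    """Fraction by which new_rate is lower than base_rate."""
    if base_rate == 0:
        raise ValueError("base rate is zero; relative reduction is undefined")
    return (base_rate - new_rate) / base_rate


# Hypothetical per-task counts: (non-TC tokens, total output tokens).
base_counts = [(120, 1000), (80, 3000)]   # pooled: 200 / 4000
tuned_counts = [(60, 1000), (40, 3000)]   # pooled: 100 / 4000

base_rate = micro_non_tc_rate(base_counts)
tuned_rate = micro_non_tc_rate(tuned_counts)
reduction = relative_reduction(base_rate, tuned_rate)
print(f"relative reduction: {reduction:.1%}")
```

Under this assumed definition, a figure such as 51.3% would mean the tuned model's pooled non-TC token rate is a little under half of the base model's rate. A macro-average would instead average the per-task rates, giving each task equal weight regardless of output length.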
Oct-3-2025