LinguaLIFT: An Effective Two-stage Instruction Tuning Framework for Low-Resource Language Tasks
Zhang, Hongbin, Chen, Kehai, Bai, Xuefeng, Xiang, Yang, Zhang, Min
–arXiv.org Artificial Intelligence
Large language models (LLMs) have demonstrated impressive multilingual understanding and reasoning capabilities, driven by extensive pre-training multilingual corpora and fine-tuning instruction data. However, a performance gap persists between high-resource and low-resource language tasks due to language imbalance in the pre-training corpus, even using more low-resource data during fine-tuning. To alleviate this issue, we propose LinguaLIFT, a two-stage instruction tuning framework for advancing low-resource language tasks. An additional language alignment layer is first integrated into the LLM to adapt a pre-trained multilingual encoder, thereby enhancing multilingual alignment through code-switched fine-tuning. The second stage fine-tunes LLM with English-only instruction data while freezing the language alignment layer, allowing LLM to transfer task-specific capabilities from English to low-resource language tasks. Additionally, we introduce the Multilingual Math World Problem (MMWP) benchmark, which spans 21 low-resource, 17 medium-resource, and 10 high-resource languages, enabling comprehensive evaluation of multilingual reasoning. Experimental results show that LinguaLIFT outperforms several competitive baselines across MMWP and other widely used benchmarks.
arXiv.org Artificial Intelligence
Dec-16-2024
- Country:
- Africa > South Sudan
- Asia
- China
- Guangdong Province > Shenzhen (0.04)
- Heilongjiang Province > Harbin (0.04)
- Japan > Honshū
- Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
- Middle East > UAE
- Abu Dhabi Emirate > Abu Dhabi (0.04)
- Singapore (0.04)
- Thailand > Bangkok
- Bangkok (0.04)
- China
- Europe
- Belgium > Brussels-Capital Region
- Brussels (0.04)
- Ireland > Leinster
- County Dublin > Dublin (0.04)
- Italy > Tuscany
- Florence (0.04)
- Middle East > Malta (0.04)
- Portugal > Lisbon
- Lisbon (0.04)
- Belgium > Brussels-Capital Region
- North America
- Canada
- British Columbia > Metro Vancouver Regional District
- Vancouver (0.04)
- Ontario > Toronto (0.04)
- British Columbia > Metro Vancouver Regional District
- Mexico > Mexico City
- Mexico City (0.04)
- United States
- California > San Diego County
- San Diego (0.04)
- Florida > Miami-Dade County
- Miami (0.04)
- Louisiana > Orleans Parish
- New Orleans (0.04)
- Pennsylvania > Philadelphia County
- Philadelphia (0.04)
- California > San Diego County
- Canada
- Genre:
- Research Report > New Finding (1.00)
- Technology: