LuxInstruct: A Cross-Lingual Instruction Tuning Dataset For Luxembourgish
Philippy, Fred; Bernardy, Laura; Guo, Siwen; Klein, Jacques; Bissyandé, Tegawendé F.
arXiv.org Artificial Intelligence
Instruction tuning has become a key technique for enhancing the performance of large language models, enabling them to better follow human prompts. However, low-resource languages such as Luxembourgish are severely limited by the lack of high-quality instruction datasets. The traditional reliance on machine translation often introduces semantic misalignment and cultural inaccuracies. In this work, we address these challenges by creating a cross-lingual instruction tuning dataset for Luxembourgish without resorting to machine-generated translations into Luxembourgish. Instead, by leveraging aligned data from English, French, and German, we build a high-quality dataset that preserves linguistic and cultural nuances. We provide evidence that cross-lingual instruction tuning improves not only representational alignment across languages but also the model's generative capabilities in Luxembourgish. This highlights how cross-lingual data curation can avoid the common pitfalls of machine-translated data and directly benefit low-resource language development.
Oct-9-2025
- Country:
  - Asia
    - Middle East
      - Israel (0.04)
      - Jordan (0.04)
      - UAE > Abu Dhabi Emirate
        - Abu Dhabi (0.14)
    - Singapore (0.04)
    - Thailand > Bangkok
      - Bangkok (0.04)
  - Europe
    - Austria > Vienna (0.14)
    - Estonia > Tartu County
      - Tartu (0.04)
    - Faroe Islands > Streymoy
      - Tórshavn (0.04)
    - France > Provence-Alpes-Côte d'Azur
      - Bouches-du-Rhône > Marseille (0.04)
    - Germany (0.04)
    - Greece (0.04)
    - Ireland > Leinster
      - County Dublin > Dublin (0.04)
    - Spain (0.04)
  - North America
    - Canada > Ontario
      - Toronto (0.04)
    - United States
      - Florida > Miami-Dade County
        - Miami (0.04)
      - Maryland (0.04)
      - North Dakota (0.04)
- Genre:
  - Research Report (0.82)
- Technology: