MiLiC-Eval: Benchmarking Multilingual LLMs for China's Minority Languages
Chen Zhang, Mingxu Tao, Zhiyuan Liao, Yansong Feng
arXiv.org Artificial Intelligence
Large language models (LLMs) excel in high-resource languages but struggle with low-resource languages (LRLs), particularly those spoken by minority communities in China, such as Tibetan, Uyghur, Kazakh, and Mongolian. To systematically track the progress in these languages, we introduce MiLiC-Eval, a benchmark designed for minority languages in China, featuring 24K instances across 9 tasks. MiLiC-Eval focuses on underrepresented writing systems and provides a fine-grained assessment of linguistic and problem-solving skills. Our evaluation reveals that LLMs perform poorly on syntax-intensive tasks and multi-script languages. We further demonstrate how MiLiC-Eval can help advance LRL research in handling diverse writing systems and understanding the process of language adaptation.
March 2, 2025
- Country:
- Asia
- China (1.00)
- Genre:
- Research Report > New Finding (0.46)
- Industry:
- Education (1.00)