ViLLM-Eval: A Comprehensive Evaluation Suite for Vietnamese Large Language Models
Nguyen, Trong-Hieu; Le, Anh-Cuong; Nguyen, Viet-Cuong
arXiv.org Artificial Intelligence
Evaluation benchmarks play a pivotal role in the development of artificial intelligence (AI) systems. Traditionally, natural language processing (NLP) benchmarks have focused on assessing specific and relatively straightforward abilities. However, the advent of large language models (LLMs), also known as foundation models, has brought about a paradigm shift. These models have demonstrated a wide array of novel capabilities, shifting the focus of evaluation toward more general and intricate skills, such as comprehensive world knowledge and complex reasoning. To keep pace with these advances, new benchmarks have emerged to probe the diverse capabilities of LLMs. For instance, MMLU [8], HellaSwag [25], ARC [4], and TruthfulQA [10] are widely recognized benchmark datasets that are frequently used on leaderboards to evaluate language models. However, these benchmarks are primarily tailored to English, which limits our understanding of LLMs' capabilities in other languages, including Vietnamese. Although powerful Vietnamese LLMs such as Vistral-7B-Chat [12], PhoGPT-4B-Chat [13], and VinaLLaMA-7B-Chat [16] have recently emerged, benchmarking them on datasets translated from English to Vietnamese, even with perfect translations, cannot adequately assess what they know about topics of core interest to Vietnamese users.
Apr-18-2024
- Country:
  - Africa (0.04)
  - North America > United States
  - Europe
    - Russia (0.04)
    - Portugal (0.04)
    - United Kingdom > Scotland
      - City of Edinburgh > Edinburgh (0.04)
    - Belgium > Brussels-Capital Region
      - Brussels (0.04)
  - Asia
    - Southeast Asia (0.04)
    - Russia (0.04)
    - Middle East > Oman (0.04)
    - India (0.04)
    - China (0.04)
    - Vietnam
      - Nghệ An Province (0.04)
      - Hanoi > Hanoi (0.04)
      - Hồ Chí Minh City > Hồ Chí Minh City (0.04)
    - Japan > Honshū
      - Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
- Genre:
  - Research Report (0.82)
  - Overview (0.68)
- Industry:
  - Education > Educational Setting (0.68)
- Technology: