ViLLM-Eval: A Comprehensive Evaluation Suite for Vietnamese Large Language Models
Trong-Hieu Nguyen, Anh-Cuong Le, Viet-Cuong Nguyen
arXiv.org Artificial Intelligence
Evaluation benchmarks play a pivotal role in the development of artificial intelligence (AI) systems. Traditionally, natural language processing (NLP) benchmarks have focused on assessing specific and relatively simple abilities. The advent of large language models (LLMs), also known as foundation models, has brought about a paradigm shift. These models have demonstrated a wide array of novel capabilities, redirecting the focus of evaluation toward more general and intricate skills such as broad world knowledge and complex reasoning. To keep pace with these advances, new benchmarks have emerged to probe the diverse capabilities of LLMs. For instance, MMLU [8], HellaSwag [25], ARC [4], and TruthfulQA [10] are widely recognized by researchers and frequently used on leaderboards to evaluate language models. However, these benchmarks are tailored primarily to English, leaving LLMs' capabilities in other languages, including Vietnamese, poorly understood. Although powerful Vietnamese LLMs such as Vistral-7B-Chat [12], PhoGPT-4B-Chat [13], and VinaLLaMA-7B-Chat [16] have recently emerged, benchmarking them on datasets translated from English into Vietnamese, even with perfect translations, cannot adequately assess how well they know the topics of core interest to Vietnamese users.
Apr-18-2024