ViLLM-Eval: A Comprehensive Evaluation Suite for Vietnamese Large Language Models

Nguyen, Trong-Hieu, Le, Anh-Cuong, Nguyen, Viet-Cuong

Apr-18-2024–arXiv.org Artificial Intelligence

Evaluation benchmarks play a pivotal role in the development of artificial intelligence (AI) systems. Traditionally, natural language processing (NLP) benchmarks have focused on assessing specific and relatively straightforward abilities. However, the advent of large language models (LLMs), also known as foundation models, has brought about a paradigm shift. These models have demonstrated a wide array of novel capabilities, redirecting the focus of evaluation towards more general and intricate skills, such as comprehensive world knowledge and complex reasoning. To keep pace with these advances, new benchmarks have emerged to probe the diverse capabilities of LLMs. For instance, MMLU [8], HellaSwag [25], ARC [4], and TruthfulQA [10] are widely recognized benchmark datasets that are frequently used on leaderboards to evaluate language models. However, these benchmarks are primarily tailored to English, which limits our understanding of LLMs' capabilities in other languages, including Vietnamese. Despite the recent surge in powerful Vietnamese LLMs, such as Vistral-7B-Chat [12], PhoGPT-4B-Chat [13], and VinaLLaMA-7B-Chat [16], benchmarking these models on datasets translated from English into Vietnamese, even with perfect translations, cannot adequately assess how well they know the topics that matter most to Vietnamese users.


