M-IFEval: Multilingual Instruction-Following Evaluation

Antoine Dussolle, Andrea Cardeña Díaz, Shota Sato, Peter Devine

arXiv.org Artificial Intelligence 

Instruction following is a core capability of modern large language models (LLMs), making the evaluation of this capability essential to understanding these models. The Instruction Following Evaluation (IFEval) benchmark from the literature does this using objective criteria, offering a measure of LLM performance without subjective AI or human judgement. However, it only includes English instructions, limiting its ability to assess LLMs in other languages. We propose the Multilingual Instruction Following Evaluation (M-IFEval) benchmark, expanding the evaluation to French, Japanese, and Spanish, with both general and language-specific instructions. Applying this benchmark to 8 state-of-the-art LLMs, we find that benchmark performance across languages and instruction types can vary widely, underscoring the importance of a multilingual benchmark for evaluating LLMs in a diverse cultural context.
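
To illustrate what "objective criteria" means in an IFEval-style benchmark, the following is a minimal Python sketch of programmatic, verifiable instruction checks and an instruction-level accuracy score. The function names and example checks are hypothetical illustrations, not the authors' actual evaluation code.

    import re

    def check_keyword_present(response: str, keyword: str) -> bool:
        """Objective check: the response must contain a required keyword."""
        return keyword.lower() in response.lower()

    def check_max_words(response: str, max_words: int) -> bool:
        """Objective check: the response must not exceed a word limit."""
        return len(response.split()) <= max_words

    def check_min_bullet_points(response: str, min_bullets: int) -> bool:
        """Objective check: the response must contain at least min_bullets Markdown bullets."""
        bullets = re.findall(r"^\s*[-*] ", response, flags=re.MULTILINE)
        return len(bullets) >= min_bullets

    def score_response(response: str, checks) -> float:
        """Fraction of objective checks the response passes (instruction-level accuracy)."""
        results = [check(response) for check in checks]
        return sum(results) / len(results)

    if __name__ == "__main__":
        response = (
            "- M-IFEval covers French, Japanese, and Spanish.\n"
            "- It uses objective, programmatic checks."
        )
        checks = [
            lambda r: check_keyword_present(r, "Japanese"),
            lambda r: check_max_words(r, 50),
            lambda r: check_min_bullet_points(r, 2),
        ]
        print(f"Instruction-level accuracy: {score_response(response, checks):.2f}")

Because each check is a deterministic program rather than a judge model or human rater, scores are reproducible across languages; language-specific instructions would simply add checks written for the conventions of that language.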
