WildIFEval: Instruction Following in the Wild

Gili Lior, Asaf Yehudai, Ariel Gera, Liat Ein-Dor

Mar-9-2025–arXiv.org Artificial Intelligence

Recent LLMs have shown remarkable success in following user instructions, yet handling instructions with multiple constraints remains a significant challenge. In this work, we introduce WildIFEval, a large-scale dataset of 12K real user instructions with diverse, multi-constraint conditions. Unlike prior datasets, our collection spans a broad lexical and topical spectrum of constraints that appear in natural user prompts. We categorize these constraints into eight high-level classes to capture their distribution and dynamics in real-world scenarios. Leveraging WildIFEval, we conduct extensive experiments to benchmark the instruction-following capabilities of leading LLMs. Our findings reveal that the performance of all evaluated models degrades as the number of constraints increases, showing that all models have substantial room for improvement on such tasks. Moreover, we observe that the specific type of constraint plays a critical role in model performance. We release our dataset to promote further research on instruction-following under complex, realistic conditions.
