WildIFEval: Instruction Following in the Wild
Lior, Gili, Yehudai, Asaf, Gera, Ariel, Ein-Dor, Liat
–arXiv.org Artificial Intelligence
Recent LLMs have shown remarkable success in following user instructions, yet handling instructions with multiple constraints remains a significant challenge. In this work, we introduce WildIFEval - a large-scale dataset of 12K real user instructions with diverse, multi-constraint conditions. Unlike prior datasets, our collection spans a broad lexical and topical spectrum of constraints, in natural user prompts. We categorize these constraints into eight high-level classes to capture their distribution and dynamics in real-world scenarios. Leveraging WildIFEval, we conduct extensive experiments to benchmark the instruction-following capabilities of leading LLMs. Our findings reveal that all evaluated models experience performance degradation with an increasing number of constraints. Thus, we show that all models have a large room for improvement on such tasks. Moreover, we observe that the specific type of constraint plays a critical role in model performance. We release our dataset to promote further research on instruction-following under complex, realistic conditions.
arXiv.org Artificial Intelligence
Mar-9-2025
- Country:
- Asia
- China (0.04)
- Japan > Honshū
- Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
- Middle East
- Israel > Jerusalem District
- Jerusalem (0.04)
- UAE > Abu Dhabi Emirate
- Abu Dhabi (0.04)
- Israel > Jerusalem District
- Singapore (0.04)
- Taiwan > Taiwan Province
- Taipei (0.04)
- Thailand > Bangkok
- Bangkok (0.04)
- North America
- Canada > Ontario
- Toronto (0.04)
- United States
- Florida > Miami-Dade County
- Miami (0.04)
- Hawaii > Honolulu County
- Honolulu (0.04)
- New York > New York County
- New York City (0.04)
- Florida > Miami-Dade County
- Canada > Ontario
- South America > Argentina
- Pampas > Buenos Aires F.D. > Buenos Aires (0.04)
- Asia
- Genre:
- Research Report > New Finding (0.88)
- Industry:
- Education > Educational Setting (0.47)
- Leisure & Entertainment (0.93)
- Technology: