Revisiting Out-of-distribution Robustness in NLP: Benchmarks, Analysis, and LLMs Evaluations

Dec-26-2025, 14:51:54 GMT – Neural Information Processing Systems

We find that the distribution shift settings in previous studies are commonly not challenging enough, which hinders the accurate evaluation of OOD robustness. To address this issue, we propose a benchmark construction protocol that ensures clear differentiation and challenging distribution shifts.

llm evaluation, name change, revisiting out-of-distribution robustness, (6 more...)

Technology:
- Information Technology > Artificial Intelligence (0.46)