Measuring the Robustness of Natural Language Processing Models to Domain Shifts
Calderon, Nitay, Porat, Naveh, Ben-David, Eyal, Gekhman, Zorik, Oved, Nadav, Reichart, Roi
arXiv.org Artificial Intelligence
Existing research on Domain Robustness (DR) suffers from disparate setups, a lack of evaluation task variety, and reliance on challenge sets. In this paper, we pose a fundamental question: What is the state of affairs of the DR challenge in the era of Large Language Models (LLMs)? To this end, we construct a DR benchmark comprising diverse NLP tasks, including sentence-level and token-level classification, QA, and generation, with each task consisting of several domains. We explore the DR challenge of fine-tuned and few-shot learning models in natural domain shift settings and devise two diagnostic metrics of Out-of-Distribution (OOD) performance degradation: the commonly used Source Drop (SD) and the overlooked Target Drop (TD). Our findings reveal important insights: first, despite their capabilities, zero-to-few-shot LLMs and fine-tuning approaches still fail to meet satisfactory performance in the OOD context; second, TD approximates the average OOD degradation better than SD; third, in a significant proportion of domain shifts, either SD or TD is positive, but not both, and therefore disregarding one can lead to incorrect DR conclusions.
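The abstract names SD and TD without defining them. The sketch below is one plausible reading, stated here as an assumption rather than as the paper's exact formulation: SD compares cross-domain performance against the in-domain performance of the source domain, and TD compares it against the in-domain performance of the target domain. The function names, the `perf` table layout, and all scores are hypothetical and invented for illustration; they are not results from the paper.

```python
# Assumed definitions (not confirmed by the abstract):
#   SD(S -> T) = perf(train S, test S) - perf(train S, test T)
#   TD(S -> T) = perf(train T, test T) - perf(train S, test T)

def source_drop(perf, source, target):
    """Drop relative to the source domain's in-domain score."""
    return perf[(source, source)] - perf[(source, target)]

def target_drop(perf, source, target):
    """Drop relative to the target domain's in-domain score."""
    return perf[(target, target)] - perf[(source, target)]

# Hypothetical scores keyed by (train_domain, test_domain); made-up numbers.
perf = {
    ("A", "A"): 90.0,
    ("A", "B"): 80.0,
    ("B", "B"): 75.0,
    ("B", "A"): 70.0,
}

for src, tgt in [("A", "B"), ("B", "A")]:
    sd = source_drop(perf, src, tgt)
    td = target_drop(perf, src, tgt)
    print(f"{src} -> {tgt}: SD = {sd:+.1f}, TD = {td:+.1f}")
```

With these made-up scores, the shift A -> B gives a positive SD but a negative TD: the model loses ground relative to its own source domain, yet still beats a model trained directly on B. Under the assumed definitions, this is the kind of case the third finding refers to, where looking at only one of the two metrics would suggest a different conclusion.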
Jul-1-2023
- Country:
- Africa > Ethiopia
- Addis Ababa > Addis Ababa (0.04)
- Asia
- China > Hong Kong (0.04)
- Middle East
- Israel (0.04)
- Jordan (0.04)
- UAE > Abu Dhabi Emirate
- Abu Dhabi (0.04)
- Europe
- Czechia > Prague (0.04)
- Belgium > Brussels-Capital Region
- Brussels (0.04)
- Ireland > Leinster
- County Dublin > Dublin (0.04)
- Ukraine > Kyiv Oblast
- Kyiv (0.04)
- Spain > Catalonia
- Barcelona Province > Barcelona (0.04)
- Sweden > Uppsala County
- Uppsala (0.04)
- Denmark > Capital Region
- Copenhagen (0.04)
- Italy > Tuscany
- Florence (0.04)
- Austria (0.04)
- North America
- Canada > British Columbia
- Dominican Republic (0.04)
- United States
- California > San Diego County
- San Diego (0.04)
- Colorado > Denver County
- Denver (0.04)
- Louisiana > Orleans Parish
- New Orleans (0.04)
- Maryland (0.04)
- New York > New York County
- New York City (0.04)
- Oregon > Multnomah County
- Portland (0.04)
- Texas > Travis County
- Austin (0.04)
- Washington > King County
- Seattle (0.04)
- Oceania > Australia
- New South Wales > Sydney (0.04)
- Genre:
- Research Report > New Finding (1.00)
- Technology: