Rolling the DICE on Idiomaticity: How LLMs Fail to Grasp Context
Mi, Maggie; Villavicencio, Aline; Moosavi, Nafise Sadat
arXiv.org Artificial Intelligence
Human processing of idioms relies on understanding the contextual sentences in which idioms occur, as well as on language-intrinsic features such as frequency and speaker-intrinsic factors such as familiarity. While LLMs have shown high performance on idiomaticity detection tasks, this success may be attributable to reasoning shortcuts in existing datasets. To this end, we construct a novel, controlled contrastive dataset designed to test whether LLMs can effectively use context to disambiguate idiomatic meaning. Additionally, we explore how collocational frequency and sentence probability influence model performance. Our findings reveal that LLMs often fail to resolve idiomaticity when doing so requires attending to the surrounding context, and that models perform better on sentences with higher likelihood. The collocational frequency of expressions also affects performance. We make our code and dataset publicly available.
Oct-21-2024
- Country:
- Asia
- British Indian Ocean Territory > Diego Garcia (0.04)
- China > Hong Kong (0.04)
- Japan > Kyūshū & Okinawa
- Kyūshū > Miyazaki Prefecture > Miyazaki (0.04)
- Middle East
- Jordan (0.04)
- Qatar > Ad-Dawhah
- Doha (0.04)
- Saudi Arabia > Asir Province
- Abha (0.04)
- UAE > Abu Dhabi Emirate
- Abu Dhabi (0.04)
- Thailand > Bangkok
- Bangkok (0.04)
- Europe
- Estonia > Tartu County
- Tartu (0.04)
- Belgium > Brussels-Capital Region
- Brussels (0.04)
- Ireland > Leinster
- County Dublin > Dublin (0.04)
- Sweden > Vaestra Goetaland
- Gothenburg (0.04)
- Slovenia (0.04)
- United Kingdom > England
- Oxfordshire > Oxford (0.04)
- Monaco (0.04)
- Denmark > Capital Region
- Copenhagen (0.04)
- France > Provence-Alpes-Côte d'Azur
- Alpes-Maritimes > Nice (0.04)
- Bouches-du-Rhône > Marseille (0.04)
- Middle East > Malta
- Port Region > Southern Harbour District > Valletta (0.04)
- Spain > Valencian Community
- Valencia Province > Valencia (0.04)
- Germany > Berlin (0.04)
- North America
- Canada > Ontario
- Toronto (0.04)
- Dominican Republic (0.04)
- Mexico > Mexico City
- Mexico City (0.04)
- United States
- California (0.04)
- Georgia > Fulton County
- Atlanta (0.04)
- South America > Colombia
- Meta Department > Villavicencio (0.05)
- Genre:
- Research Report > New Finding (0.48)
- Technology: