Not quite Sherlock Holmes: Language model predictions do not reliably differentiate impossible from improbable events
Michaelov, James A., Estacio, Reeka, Zhang, Zhien, Bergen, Benjamin K.
–arXiv.org Artificial Intelligence
Can language models reliably predict that possible events are more likely than merely improbable ones? By teasing apart possibility, typicality, and contextual relatedness, we show that despite the results of previous work, language models' ability to do this is far from robust. In fact, under certain conditions, all models tested - including Llama 3, Gemma 2, and Mistral NeMo - perform at worse-than-chance level, assigning higher probabilities to impossible sentences such as 'the car was given a parking ticket by the brake' than to merely unlikely sentences such as 'the car was given a parking ticket by the explorer'.
arXiv.org Artificial Intelligence
Jun-10-2025
- Country:
- Asia
- Europe
- Belgium > Brussels-Capital Region
- Brussels (0.04)
- Denmark > Capital Region
- Copenhagen (0.04)
- Italy > Tuscany
- Florence (0.04)
- Belgium > Brussels-Capital Region
- North America
- Dominican Republic (0.04)
- United States
- District of Columbia > Washington (0.04)
- Louisiana > Orleans Parish
- New Orleans (0.04)
- Massachusetts > Middlesex County
- Cambridge (0.28)
- Minnesota > Hennepin County
- Minneapolis (0.14)
- Washington > King County
- Seattle (0.04)
- Oceania > Australia
- Australian Capital Territory > Canberra (0.04)
- Genre:
- Research Report > New Finding (1.00)
- Industry:
- Health & Medicine > Therapeutic Area
- Neurology (0.46)
- Transportation (1.00)
- Health & Medicine > Therapeutic Area
- Technology: