Not quite Sherlock Holmes: Language model predictions do not reliably differentiate impossible from improbable events

Michaelov, James A., Estacio, Reeka, Zhang, Zhien, Bergen, Benjamin K.

Jun-10-2025–arXiv.org Artificial Intelligence

Can language models reliably predict that merely improbable events are more likely than impossible ones? By teasing apart possibility, typicality, and contextual relatedness, we show that, contrary to what previous work suggests, language models' ability to do this is far from robust. In fact, under certain conditions, all models tested, including Llama 3, Gemma 2, and Mistral NeMo, perform worse than chance, assigning higher probabilities to impossible sentences such as 'the car was given a parking ticket by the brake' than to merely unlikely sentences such as 'the car was given a parking ticket by the explorer'.
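The pairwise comparison described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' evaluation code. It assumes that each sentence is scored by summing per-token log-probabilities from an autoregressive model (extracting those numbers from Llama 3, Gemma 2, or Mistral NeMo is outside the sketch), and that each test item pairs a merely improbable sentence with an impossible one. The one-sided exact binomial test is a choice made here for illustration, not a statistic the abstract names. All function names are hypothetical.

```python
from math import comb
from typing import Sequence, Tuple


def sentence_logprob(token_logprobs: Sequence[float]) -> float:
    """Log-probability of a sentence under an autoregressive LM (assumed
    scoring rule): the sum of log P(token_i | tokens_<i) over its tokens."""
    return sum(token_logprobs)


def pairwise_accuracy(pairs: Sequence[Tuple[float, float]]) -> float:
    """Fraction of (improbable, impossible) log-probability pairs in which
    the merely improbable sentence scores strictly higher.

    Chance level for this two-way choice is 0.5; ties count as failures.
    """
    if not pairs:
        raise ValueError("need at least one pair")
    wins = sum(1 for improbable, impossible in pairs if improbable > impossible)
    return wins / len(pairs)


def binomial_p_below_chance(wins: int, n: int) -> float:
    """One-sided exact binomial p-value P(X <= wins) for X ~ Binomial(n, 0.5),
    i.e. how surprising this few wins would be if the model were at chance."""
    if not 0 <= wins <= n:
        raise ValueError("wins must lie in [0, n]")
    return sum(comb(n, k) for k in range(wins + 1)) / 2 ** n


if __name__ == "__main__":
    # Hypothetical counts for illustration only, not results from the paper:
    # a model preferring the improbable sentence in 3 of 10 pairs.
    print(binomial_p_below_chance(3, 10))  # 0.171875
```

Because both sentences in a pair share the prefix 'the car was given a parking ticket by the', the prefix terms are identical under a causal model, so the comparison effectively depends only on the tokens of the final noun ('explorer' vs. 'brake').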

large language model, machine learning, natural language
