Dual Traits in Probabilistic Reasoning of Large Language Models

Shenxiong Li, Huaxia Rui

arXiv.org Artificial Intelligence 

We conducted three experiments to investigate how large language models (LLMs) evaluate posterior probabilities. Our results reveal the coexistence of two modes of posterior judgment in state-of-the-art models: a normative mode, which adheres to Bayes' rule, and a representativeness-based mode, which relies on similarity -- paralleling human System 2 and System 1 thinking, respectively. We also observed that LLMs struggle to recall base-rate information from memory, and that developing prompt-engineering strategies to mitigate representativeness-based judgment may be difficult. We further conjecture that the dual modes of judgment may result from the contrastive loss function employed in reinforcement learning from human feedback. Our findings point to a potential direction for reducing cognitive biases in LLMs and underscore the need for cautious deployment of LLMs in critical areas.

The remarkable advancements in large language models (LLMs) have ushered in a new era in which these models rival human expertise across domains such as academia, law, medicine, and finance [4, 12, 13, 22-24]. In this study, we explore how LLMs judge posterior probability. Under representativeness-based judgment, a higher similarity between the evidence and the hypothesized class corresponds to a higher assessed posterior probability. This study comprises three experiments with progressively stricter conditions, each reducing the information available for posterior assessment: the structured test provides all the information needed for normative judgment, the semi-structured test omits the diagnosticity of the evidence, and the unstructured test requires LLMs to recall every component of Bayes' rule from memory. Results reveal that LLMs' judgments shift from the normative mode toward the representativeness-based mode as less information is provided.

Representativeness can be constructed through typicality or prototypicality: typicality describes the common or average case of a class, whereas prototypicality embodies its most idealized and iconic version. For instance, a typical example of a physicist is a smart man who likes math and physics, while a prototypical example is Stephen Hawking.

This study moves beyond bias detection to investigate the basis upon which LLMs assess probabilities, which has important practical implications for the integration of LLMs into critical fields.
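To make the contrast between the two modes concrete, the following sketch computes a posterior the normative way (Bayes' rule, combining base rate and diagnosticity) and compares it with a similarity-only proxy that ignores the base rate. The specific numbers are hypothetical, and the similarity-only formula is one simple way to model base-rate neglect, not a quantity defined in the paper.

```python
def bayes_posterior(prior, p_e_given_h, p_e_given_not_h):
    """Normative posterior P(H|E) via Bayes' rule:
    P(E|H)P(H) / [P(E|H)P(H) + P(E|~H)P(~H)]."""
    numerator = p_e_given_h * prior
    return numerator / (numerator + p_e_given_not_h * (1 - prior))

# Hypothetical numbers: base rate of the hypothesis and the
# diagnosticity of the evidence (likelihoods under H and not-H).
prior = 0.30            # P(H): base rate
p_e_given_h = 0.90      # P(E|H)
p_e_given_not_h = 0.20  # P(E|~H)

normative = bayes_posterior(prior, p_e_given_h, p_e_given_not_h)

# Similarity-only proxy: normalize the likelihoods and drop the
# base rate entirely (one simple model of representativeness).
representative = p_e_given_h / (p_e_given_h + p_e_given_not_h)

print(round(normative, 3))       # → 0.659
print(round(representative, 3))  # → 0.818
```

Because the base rate here is below 0.5, the similarity-only judgment overstates the posterior, which is exactly the direction of error one expects when base-rate information is neglected.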
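The conjectured link to reinforcement learning from human feedback concerns the pairwise contrastive loss commonly used to train reward models. A minimal sketch of that loss (the Bradley-Terry form, -log sigmoid(r_chosen - r_rejected), shown here for illustration rather than as the paper's implementation):

```python
import math

def pairwise_preference_loss(r_chosen, r_rejected):
    """Contrastive loss on a preference pair: -log sigmoid(margin).
    The loss falls as the reward margin between the preferred and
    rejected response grows."""
    margin = r_chosen - r_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

print(round(pairwise_preference_loss(1.0, 1.0), 3))  # → 0.693 (no margin)
print(round(pairwise_preference_loss(2.0, 0.0), 3))  # → 0.127 (large margin)
```

Because this objective only rewards making preferred responses score higher than rejected ones, it can plausibly favor outputs that resemble preferred examples, which is the mechanism the paper conjectures behind the representativeness-based mode.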