GPT's Judgements Under Uncertainty

Payam Saeedi, Mahsa Goodarzi

arXiv.org Artificial Intelligence 

We investigate the presence of cognitive biases in three large language models (LLMs): GPT-4o, Gemma 2, and Llama 3.1. The study uses 1,500 experiments across nine established cognitive biases to evaluate the models' responses and consistency. GPT-4o demonstrated the strongest overall performance. Gemma 2 showed strengths in addressing the sunk cost fallacy and prospect theory; however, its performance varied across biases. Llama 3.1 consistently underperformed, relying on heuristics and exhibiting frequent inconsistencies and contradictions. The findings highlight the challenges of achieving robust and generalizable reasoning in LLMs and underscore the need for further development to mitigate biases on the path toward artificial general intelligence (AGI). The study emphasizes the importance of integrating statistical reasoning and ethical considerations into future AI development.

Cognitive biases and heuristics are well-established phenomena of the human mind, shaping how individuals process information, form judgments, and make decisions. These biases emerge from heuristics -- mental shortcuts that simplify complex tasks by substituting them with cognitively easier alternatives [1]. While heuristics enable quick and efficient reasoning, they also introduce systematic errors that affect judgment and decision-making [2]-[4]. Understanding whether such biases, embedded in the data and interactions that shape large language models (LLMs), are reflected in their outputs is critical not only for evaluating their alignment with human cognition but also for the development of artificial general intelligence (AGI). AGI, envisioned as a class of systems capable of performing any intellectual task a human can, must navigate the intricacies of human-like reasoning while avoiding harmful or irresponsible biases.