kahneman
- North America > United States > California > Alameda County > Berkeley (0.04)
- Asia > Middle East > Jordan (0.04)
- Asia > Japan > Honshū > Chūbu > Toyama Prefecture > Toyama (0.04)
- Information Technology > Artificial Intelligence > Cognitive Science > Simulation of Human Behavior (0.50)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.35)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.35)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.35)
The Bias is in the Details: An Assessment of Cognitive Bias in LLMs
Knipper, R. Alexander, Knipper, Charles S., Zhang, Kaiqi, Sims, Valerie, Bowers, Clint, Karmaker, Santu
As Large Language Models (LLMs) are increasingly embedded in real-world decision-making processes, it becomes crucial to examine the extent to which they exhibit cognitive biases. Extensively studied in the field of psychology, cognitive biases appear as systematic distortions commonly observed in human judgments. This paper presents a large-scale evaluation of eight well-established cognitive biases across 45 LLMs, analyzing over 2.8 million LLM responses generated through controlled prompt variations. To achieve this, we introduce a novel evaluation framework based on multiple-choice tasks, hand-curate a dataset of 220 decision scenarios targeting fundamental cognitive biases in collaboration with psychologists, and propose a scalable approach for generating diverse prompts from human-authored scenario templates. Our analysis shows that LLMs exhibit bias-consistent behavior in 17.8-57.3% of instances across a range of judgment and decision-making contexts targeting anchoring, availability, confirmation, framing, interpretation, overattribution, prospect theory, and representativeness biases. We find that both model size and prompt specificity play a significant role on bias susceptibility as follows: larger size (>32B parameters) can reduce bias in 39.5% of cases, while higher prompt detail reduces most biases by up to 14.9%, except in one case (Overattribution), which is exacerbated by up to 8.8%.
- Asia > Middle East > Jordan (0.04)
- North America > United States > New Mexico > Bernalillo County > Albuquerque (0.04)
- Asia > Singapore (0.04)
- (16 more...)
- Health & Medicine > Therapeutic Area (0.68)
- Transportation > Air (0.46)
Cognitive Decision Routing in Large Language Models: When to Think Fast, When to Think Slow
Du, Y., Guo, C., Wang, W., Tang, G.
Large Language Models (LLMs) face a fundamental challenge in deciding when to rely on rapid, intuitive responses versus engaging in slower, more deliberate reasoning. Inspired by Daniel Kahneman's dual-process theory and his insights on human cognitive biases, we propose a novel Cognitive Decision Routing (CDR) framework that dynamically determines the appropriate reasoning strategy based on query characteristics. Our approach addresses the current limitations where models either apply uniform reasoning depth or rely on computationally expensive methods for all queries. We introduce a meta-cognitive layer that analyzes query complexity through multiple dimensions: correlation strength between given information and required conclusions, domain boundary crossings, stakeholder multiplicity, and uncertainty levels. Through extensive experiments on diverse reasoning tasks, we demonstrate that CDR achieves superior performance while reducing computational costs by 34\% compared to uniform deep reasoning approaches. Our framework shows particular strength in professional judgment tasks, achieving 23\% improvement in consistency and 18\% better accuracy on expert-level evaluations. This work bridges cognitive science principles with practical AI system design, offering a principled approach to adaptive reasoning in LLMs.
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (0.69)
EICAP: Deep Dive in Assessment and Enhancement of Large Language Models in Emotional Intelligence through Multi-Turn Conversations
Nazar, Nizi, Asgari, Ehsaneddin
Emotional Intelligence (EI) is a critical yet underexplored dimension in the development of human-aligned LLMs. To address this gap, we introduce a unified, psychologically grounded four-layer taxonomy of EI tailored for large language models (LLMs), encompassing emotional tracking, cause inference, appraisal, and emotionally appropriate response generation. Building on this framework, we present EICAP-Bench, a novel MCQ style multi-turn benchmark designed to evaluate EI capabilities in open-source LLMs across diverse linguistic and cultural contexts. We evaluate six LLMs: LLaMA3 (8B), LLaMA3-Instruct, Gemma (9B), Gemma-Instruct, Qwen2.5 (7B), and Qwen2.5-Instruct on EmoCap-Bench, identifying Qwen2.5-Instruct as the strongest baseline. To assess the potential for enhancing EI capabilities, we fine-tune both Qwen2.5-Base and Qwen2.5-Instruct using LoRA adapters on UltraChat (UC), a large-scale, instruction-tuned dialogue dataset, in both English and Arabic. Our statistical analysis reveals that among the five EI layers, only the Appraisal layer shows significant improvement through UC-based fine-tuning. These findings highlight the limitations of existing pretraining and instruction-tuning paradigms in equipping LLMs with deeper emotional reasoning and underscore the need for targeted data and modeling strategies for comprehensive EI alignment.
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Asia > Middle East > Qatar > Ad-Dawhah > Doha (0.04)
- Research Report > New Finding (0.93)
- Research Report > Experimental Study (0.93)
Learning, Reasoning, Refinement: A Framework for Kahneman's Dual-System Intelligence in GUI Agents
Wei, Jinjie, Liu, Jiyao, Liu, Lihao, Hu, Ming, Ning, Junzhi, Li, Mingcheng, Yin, Weijie, He, Junjun, Liang, Xiao, Feng, Chao, Yang, Dingkang
Graphical User Interface (GUI) agents have made significant progress in automating digital tasks through the utilization of computer vision and language models. Nevertheless, existing agent systems encounter notable limitations. Firstly, they predominantly depend on trial and error decision making rather than progressive reasoning, thereby lacking the capability to learn and adapt from interactive encounters. Secondly, these systems are assessed using overly simplistic single step accuracy metrics, which do not adequately reflect the intricate nature of real world GUI interactions. In this paper, we present CogniGUI, a cognitive framework developed to overcome these limitations by enabling adaptive learning for GUI automation resembling human-like behavior. Inspired by Kahneman's Dual Process Theory, our approach combines two main components: (1) an omni parser engine that conducts immediate hierarchical parsing of GUI elements through quick visual semantic analysis to identify actionable components, and (2) a Group based Relative Policy Optimization (GRPO) grounding agent that assesses multiple interaction paths using a unique relative reward system, promoting minimal and efficient operational routes. This dual-system design facilitates iterative ''exploration learning mastery'' cycles, enabling the agent to enhance its strategies over time based on accumulated experience. Moreover, to assess the generalization and adaptability of agent systems, we introduce ScreenSeek, a comprehensive benchmark that includes multi application navigation, dynamic state transitions, and cross interface coherence, which are often overlooked challenges in current benchmarks. Experimental results demonstrate that CogniGUI surpasses state-of-the-art methods in both the current GUI grounding benchmarks and our newly proposed benchmark.
- Asia > China > Shanghai > Shanghai (0.04)
- North America > United States > New York > New York County > New York City (0.04)
An Empirical Study of the Anchoring Effect in LLMs: Existence, Mechanism, and Potential Mitigations
Huang, Yiming, Bie, Biquan, Na, Zuqiu, Ruan, Weilin, Lei, Songxin, Yue, Yutao, He, Xinlei
The rise of Large Language Models (LLMs) like ChatGPT has advanced natural language processing, yet concerns about cognitive biases are growing. In this paper, we investigate the anchoring effect, a cognitive bias where the mind relies heavily on the first information as anchors to make affected judgments. We explore whether LLMs are affected by anchoring, the underlying mechanisms, and potential mitigation strategies. To facilitate studies at scale on the anchoring effect, we introduce a new dataset, SynAnchors. Combining refined evaluation metrics, we benchmark current widely used LLMs. Our findings show that LLMs' anchoring bias exists commonly with shallow-layer acting and is not eliminated by conventional strategies, while reasoning can offer some mitigation. This recontextualization via cognitive psychology urges that LLM evaluations focus not on standard benchmarks or over-optimized robustness tests, but on cognitive-bias-aware trustworthy evaluation.
- South America > Brazil (0.04)
- Europe > United Kingdom (0.04)
- Europe > Norway (0.04)
- (4 more...)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (0.93)
Identifying Non-Replicable Social Science Studies with Language Models
Saynova, Denitsa, Hansson, Kajsa, Bruinsma, Bastiaan, Fredén, Annika, Johansson, Moa
In this study, we investigate whether LLMs can be used to indicate if a study in the behavioural social sciences is replicable. Using a dataset of 14 previously replicated studies (9 successful, 5 unsuccessful), we evaluate the ability of both open-source (Llama 3 8B, Qwen 2 7B, Mistral 7B) and proprietary (GPT-4o) instruction-tuned LLMs to discriminate between replicable and non-replicable findings. We use LLMs to generate synthetic samples of responses from behavioural studies and estimate whether the measured effects support the original findings. When compared with human replication results for these studies, we achieve F1 values of up to $77\%$ with Mistral 7B, $67\%$ with GPT-4o and Llama 3 8B, and $55\%$ with Qwen 2 7B, suggesting their potential for this task. We also analyse how effect size calculations are affected by sampling temperature and find that low variance (due to temperature) leads to biased effect estimates.
- Asia > Middle East > Jordan (0.14)
- Europe > Sweden (0.14)
A Peek into Token Bias: Large Language Models Are Not Yet Genuine Reasoners
Jiang, Bowen, Xie, Yangxinyu, Hao, Zhuoqun, Wang, Xiaomeng, Mallick, Tanwi, Su, Weijie J., Taylor, Camillo J., Roth, Dan
This study introduces a hypothesis-testing framework to assess whether large language models (LLMs) possess genuine reasoning abilities or primarily depend on token bias. We go beyond evaluating LLMs on accuracy; rather, we aim to investigate their token bias in solving logical reasoning tasks. Specifically, we develop carefully controlled synthetic datasets, featuring conjunction fallacy and syllogistic problems. Our framework outlines a list of hypotheses where token biases are readily identifiable, with all null hypotheses assuming genuine reasoning capabilities of LLMs. The findings in this study suggest, with statistical guarantee, that most LLMs still struggle with logical reasoning. While they may perform well on classic problems, their success largely depends on recognizing superficial patterns with strong token bias, thereby raising concerns about their actual reasoning and generalization abilities.
- Europe > United Kingdom > England > Greater London > London > Wimbledon (0.04)
- North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.04)
- North America > United States > New York (0.04)
- (2 more...)
- Media > News (0.68)
- Law Enforcement & Public Safety > Crime Prevention & Enforcement (0.67)
- Government > Voting & Elections (0.67)
ABI Approach: Automatic Bias Identification in Decision-Making Under Risk based in an Ontology of Behavioral Economics
Ramos, Eduardo da C., Campos, Maria Luiza M., Baião, Fernanda
Organizational decision-making is crucial for success, yet cognitive biases can significantly affect risk preferences, leading to suboptimal outcomes. Risk seeking preferences for losses, driven by biases such as loss aversion, pose challenges and can result in severe negative consequences, including financial losses. This research introduces the ABI approach, a novel solution designed to support organizational decision-makers by automatically identifying and explaining risk seeking preferences during decision-making. This research makes a novel contribution by automating the identification and explanation of risk seeking preferences using Cumulative Prospect theory (CPT) from Behavioral Economics. The ABI approach transforms theoretical insights into actionable, real-time guidance, making them accessible to a broader range of organizations and decision-makers without requiring specialized personnel. By contextualizing CPT concepts into business language, the approach facilitates widespread adoption and enhances decision-making processes with deep behavioral insights. Our systematic literature review identified significant gaps in existing methods, especially the lack of automated solutions with a concrete mechanism for automatically identifying risk seeking preferences, and the absence of formal knowledge representation, such as ontologies, for identifying and explaining the risk preferences. The ABI Approach addresses these gaps, offering a significant contribution to decision-making research and practice. Furthermore, it enables automatic collection of historical decision data with risk preferences, providing valuable insights for enhancing strategic management and long-term organizational performance. An experiment provided preliminary evidence on its effectiveness in helping decision-makers recognize their risk seeking preferences during decision-making in the loss domain.
- North America > United States > California > San Francisco County > San Francisco (0.14)
- North America > United States > New York (0.04)
- South America > Brazil > Rio de Janeiro > Rio de Janeiro (0.04)
- (8 more...)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Health & Medicine (1.00)
- Banking & Finance (1.00)
- Government (0.92)
- Education (0.67)
On the Causal Nature of Sentiment Analysis
Lyu, Zhiheng, Jin, Zhijing, Gonzalez, Fernando, Mihalcea, Rada, Schoelkopf, Bernhard, Sachan, Mrinmaya
Sentiment analysis (SA) aims to identify the sentiment expressed in a text, such as a product review. Given a review and the sentiment associated with it, this paper formulates SA as a combination of two tasks: (1) a causal discovery task that distinguishes whether a review "primes" the sentiment (Causal Hypothesis C1), or the sentiment "primes" the review (Causal Hypothesis C2); and (2) the traditional prediction task to model the sentiment using the review as input. Using the peak-end rule in psychology, we classify a sample as C1 if its overall sentiment score approximates an average of all the sentence-level sentiments in the review, and C2 if the overall sentiment score approximates an average of the peak and end sentiments. For the prediction task, we use the discovered causal mechanisms behind the samples to improve the performance of LLMs by proposing causal prompts that give the models an inductive bias of the underlying causal graph, leading to substantial improvements by up to 32.13 F1 points on zero-shot five-class SA. Our code is at https://github.com/cogito233/causal-sa
- North America > United States > Washington > King County > Seattle (0.14)
- Europe > Switzerland > Zürich > Zürich (0.04)
- North America > United States > Michigan (0.04)
- (19 more...)
- Health & Medicine (0.67)
- Consumer Products & Services > Restaurants (0.46)