A Comprehensive Evaluation of Cognitive Biases in LLMs

Malberg, Simon, Poletukhin, Roman, Schuster, Carolin M., Groh, Georg

Oct-20-2024–arXiv.org Artificial Intelligence

We present a large-scale evaluation of 30 cognitive biases in 20 state-of-the-art large language models (LLMs) under various decision-making scenarios. Our contributions include a novel general-purpose test framework for reliable and large-scale generation of tests for LLMs, a benchmark dataset with 30,000 tests for detecting cognitive biases in LLMs, and a comprehensive assessment of the biases found in the 20 evaluated LLMs. Our work confirms and broadens previous findings suggesting the presence of cognitive biases in LLMs by reporting evidence of all 30 tested biases in at least some of the 20 LLMs.

Figure 1: An LLM changes its answer as the framing of the decision changes, indicating the susceptibility of the LLM to the Framing Effect.

large language model, machine learning, natural language

Country:
- Oceania > New Zealand (0.04)
- North America
  - United States
    - New York > New York County
      - New York City (0.04)
    - Illinois > Cook County
      - Chicago (0.04)
  - Canada > Ontario
    - Toronto (0.04)
- Europe
  - United Kingdom > England
    - Cambridgeshire > Cambridge (0.04)
  - Germany > Bavaria
    - Upper Bavaria > Munich (0.04)
  - Croatia > Dubrovnik-Neretva County
    - Dubrovnik (0.04)
- Asia
  - Singapore (0.04)
  - Thailand > Bangkok
    - Bangkok (0.04)
  - Middle East > UAE
    - Abu Dhabi Emirate > Abu Dhabi (0.04)

Genre:
- Research Report > New Finding (0.87)

Industry:
- Health & Medicine > Therapeutic Area (1.00)
- Banking & Finance (0.92)
- Education (0.67)
- Telecommunications (0.67)
- Government (0.67)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language > Large Language Model (1.00)
  - Machine Learning > Neural Networks
    - Deep Learning (1.00)
