Privileged Self-Access Matters for Introspection in AI

Song, Siyuan; Lederman, Harvey; Hu, Jennifer; Mahowald, Kyle

Aug-21-2025–arXiv.org Artificial Intelligence

Whether AI models can introspect is an increasingly important practical question. But there is no consensus on how introspection is to be defined. Beginning from a recently proposed "lightweight" definition, we argue instead for a thicker one. According to our proposal, introspection in AI is any process which yields information about internal states through a process more reliable than one with equal or lower computational cost available to a third party. Using experiments where LLMs reason about their internal temperature parameters, we show they can appear to have lightweight introspection while failing to meaningfully introspect per our proposed definition.
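The proposed definition turns on a comparison: a model's access to its own state counts as introspection only if it is more reliable than what a third party could achieve at equal or lower computational cost. The abstract does not say how reliability is scored, so the following is only a minimal sketch under stated assumptions: reliability is taken to be mean absolute error when predicting the sampling temperature, the third-party baseline is simply assumed to satisfy the compute constraint, and the function names (`mean_abs_error`, `passes_privileged_access_test`) are hypothetical, not taken from the paper.

```python
from statistics import mean
from typing import Sequence


def mean_abs_error(truth: Sequence[float], guesses: Sequence[float]) -> float:
    """Average absolute gap between true and predicted temperatures."""
    if len(truth) != len(guesses) or not truth:
        raise ValueError("truth and guesses must be non-empty and of equal length")
    return mean(abs(t - g) for t, g in zip(truth, guesses))


def passes_privileged_access_test(
    true_temps: Sequence[float],
    self_reports: Sequence[float],
    third_party_guesses: Sequence[float],
) -> bool:
    """Hypothetical operationalization of the thicker definition.

    Assumption: the third-party guesses come from a predictor whose
    computational cost is equal to or lower than the model's own. The
    self-report counts as introspective only if it is strictly more
    reliable (lower error) than that baseline.
    """
    self_err = mean_abs_error(true_temps, self_reports)
    third_err = mean_abs_error(true_temps, third_party_guesses)
    return self_err < third_err


if __name__ == "__main__":
    # Illustrative numbers only, not data from the paper.
    true_temps = [0.0, 0.5, 1.0, 1.5]
    self_reports = [0.2, 0.6, 0.9, 1.2]      # mean error about 0.175
    third_party = [0.1, 0.5, 1.1, 1.4]       # mean error about 0.075
    print(passes_privileged_access_test(true_temps, self_reports, third_party))  # False
```

In this toy case the self-reports track temperature reasonably well, which might look like "lightweight" introspection, but an outside predictor does better, so under the thicker definition the self-reports would not count as privileged self-access.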

large language model, machine learning, natural language


