Beyond Self-Reports: Multi-Observer Agents for Personality Assessment in Large Language Models

May-21-2025–arXiv.org Artificial Intelligence

Self-report questionnaires have long been used to assess LLM personality traits, yet they fail to capture behavioral nuances due to biases and meta-knowledge contamination. This paper proposes a novel multi-observer framework for personality trait assessments in LLM agents that draws on informant-report methods in psychology. Instead of relying on self-assessments, we employ multiple observer agents. Each observer is configured with a specific relational context (e.g., family member, friend, or coworker) and engages the subject LLM in dialogue before evaluating its behavior across the Big Five dimensions. We show that these observer-report ratings align more closely with human judgments than traditional self-reports and reveal systematic biases in LLM self-assessments. We also found that aggregating responses from 5 to 7 observers reduces systematic biases and achieves optimal reliability. Our results highlight the role of relationship context in perceiving personality and demonstrate that a multi-observer paradigm offers a more reliable, context-sensitive approach to evaluating LLM personality traits.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

May-21-2025

arXiv.org PDF

Add feedback

Country:
- Europe > United Kingdom (0.04)
- Oceania
  - New Zealand (0.04)
  - Australia (0.04)
- North America
  - Canada (0.04)
  - United States > Florida
    - Miami-Dade County > Miami (0.04)
- Asia > Japan
  - Honshū > Kansai > Kyoto Prefecture > Kyoto (0.04)

Genre:
- Research Report > New Finding (1.00)

Industry:
- Health & Medicine (0.68)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language > Large Language Model (1.00)
  - Machine Learning > Neural Networks
    - Deep Learning (0.69)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found