Quantifying AI Psychology: A Psychometrics Benchmark for Large Language Models
Yuan Li, Yue Huang, Hongyi Wang, Xiangliang Zhang, James Zou, Lichao Sun
Large Language Models (LLMs) have demonstrated exceptional task-solving capabilities and increasingly take on roles akin to human-like assistants. The broader integration of LLMs into society has sparked interest in whether they manifest psychological attributes, and whether these attributes are stable; such inquiries could deepen the understanding of their behaviors. Inspired by psychometrics, this paper presents a framework for investigating psychology in LLMs, comprising psychological dimension identification, assessment dataset curation, and assessment with result validation. Following this framework, we introduce a comprehensive psychometrics benchmark for LLMs that covers six psychological dimensions: personality, values, emotion, theory of mind, motivation, and intelligence. The benchmark comprises thirteen datasets featuring diverse scenarios and item types. Our findings indicate that LLMs manifest a broad spectrum of psychological attributes. We also uncover discrepancies between LLMs' self-reported traits and their behaviors in real-world scenarios. This paper demonstrates a thorough psychometric assessment of LLMs, offering insights into reliable evaluation and potential applications in AI and the social sciences.
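The assessment protocol the abstract describes (administer standardized items, score the responses, then compare self-reported traits against scenario behavior) can be illustrated with a short sketch. This is a hypothetical illustration under assumed details, not the authors' released code: the `ask_llm` stub stands in for any chat-completion API, and the example items, Likert mapping, and reverse-keying are assumptions for demonstration.

```python
# Minimal sketch of administering a self-report psychometric item to an LLM
# and mapping the answer onto a 1-5 Likert scale. All names here are
# illustrative; ask_llm is a placeholder, not a real API.

LIKERT = {"strongly disagree": 1, "disagree": 2, "neutral": 3,
          "agree": 4, "strongly agree": 5}

def ask_llm(prompt: str) -> str:
    """Hypothetical stand-in for a chat-completion call to the model under test."""
    return "agree"  # replace with a real model response

def score_item(statement: str, reverse: bool = False) -> int:
    """Present one item, constrain the answer format, and score it 1-5."""
    prompt = (f'Statement: "{statement}"\n'
              "Respond with exactly one of: strongly disagree, disagree, "
              "neutral, agree, strongly agree.")
    raw = ask_llm(prompt).strip().lower()
    value = LIKERT.get(raw, 3)        # fall back to neutral on unparsable output
    return 6 - value if reverse else value  # reverse-keyed items flip the scale

if __name__ == "__main__":
    # Two assumed openness-style items, one reverse-keyed.
    scores = [
        score_item("I enjoy trying new activities."),
        score_item("I prefer routines over novelty.", reverse=True),
    ]
    print("Mean trait score:", sum(scores) / len(scores))
```

In a full benchmark run, the same trait would also be probed through behavioral scenarios (e.g., choices in a described situation), and the two score sets compared to surface the self-report versus behavior discrepancies the paper reports.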
arXiv.org Artificial Intelligence
Jun-25-2024
- Country:
  - Asia > Middle East
    - UAE > Abu Dhabi Emirate > Abu Dhabi (0.14)
  - Europe > United Kingdom
    - England > Cambridgeshire > Cambridge (0.14)
  - North America > United States
    - Minnesota > Hennepin County > Minneapolis (0.14)
- Genre:
  - Research Report > New Finding (1.00)
- Industry:
  - Education (1.00)
  - Health & Medicine > Therapeutic Area
    - Neurology (0.67)
    - Psychiatry/Psychology (1.00)
  - Information Technology (1.00)
- Technology: