UniToMBench: Integrating Perspective-Taking to Improve Theory of Mind in LLMs
Thiyagarajan, Prameshwar; Parimi, Vaishnavi; Sai, Shamant; Garg, Soumil; Meirbek, Zhangir; Yarlagadda, Nitin; Zhu, Kevin; Kim, Chris
arXiv.org Artificial Intelligence
Theory of Mind (ToM), the ability to understand the mental states of oneself and others, remains a challenging area for large language models (LLMs), which often fail to predict human mental states accurately. In this paper, we introduce UniToMBench, a unified benchmark that integrates the strengths of SimToM and TOMBENCH to systematically improve and assess ToM capabilities in LLMs through multi-interaction task designs and evolving story scenarios. Supported by a custom dataset of over 1,000 hand-written scenarios, UniToMBench combines perspective-taking techniques with diverse evaluation metrics to better stimulate social cognition in LLMs. Through evaluation, we observe that while models like GPT-4o and GPT-4o Mini show consistently high accuracy in tasks involving emotional and belief-related scenarios, with results usually above 80%, their performance varies significantly across knowledge-based tasks. These results highlight both the strengths and limitations of current LLMs in ToM-related tasks, underscoring the value of UniToMBench as a comprehensive tool for future development. Our code is publicly available here: https://github.com/Shamant/unifiedtombenchmark.
Jun-12-2025
- Country:
- Africa > Mali (0.04)
- Asia > Middle East > Jordan (0.04)
- Europe > Monaco (0.04)
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
- Genre:
- Research Report (0.40)
- Technology: