Bayesian Social Deduction with Graph-Informed Language Models

Rahimirad, Shahab, Gergerli, Guven, Romero, Lucia, Qian, Angela, Olson, Matthew Lyle, Stepputtis, Simon, Campbell, Joseph

Jun-24-2025–arXiv.org Artificial Intelligence

Social reasoning - inferring unobservable beliefs and intentions from partial observations of other agents - remains a challenging task for large language models (LLMs). We evaluate the limits of current reasoning language models in the social deduction game Avalon and find that while the largest models demonstrate strong performance, they require extensive test-time inference and degrade sharply when distilled to smaller, real-time-capable variants. To address this, we introduce a hybrid reasoning framework that externalizes belief inference to a structured probabilistic model, while using an LLM for language understanding and interaction. Our approach achieves competitive performance with much larger models in Agent-Agent play and, notably, is the first language agent to defeat human players in a controlled study - achieving a 67% win rate and receiving higher qualitative ratings than both reasoning baselines and human teammates. We release code, models, and a dataset to support future work on social reasoning in LLM agents, which can be found at https://camp-lab-purdue.github.io/bayesian-social-deduction/

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

Jun-24-2025

arXiv.org PDF

Add feedback

Country:
- North America > United States (1.00)
- Asia (0.68)

Genre:
- Research Report
  - New Finding (1.00)
  - Experimental Study (1.00)

Industry:
- Leisure & Entertainment > Games (1.00)

Technology:
- Information Technology > Artificial Intelligence
  - Representation & Reasoning > Uncertainty (1.00)
  - Natural Language > Large Language Model (1.00)
  - Machine Learning
    - Neural Networks > Deep Learning (0.95)
    - Learning Graphical Models > Directed Networks
      - Bayesian Learning (0.46)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found