Views Are My Own, but Also Yours: Benchmarking Theory of Mind Using Common Ground

Adil Soubki, John Murzaku, Arash Yousefi Jordehi, Peter Zeng, Magdalena Markowska, Seyed Abolghasem Mirroshandel, Owen Rambow

Jun-5-2024–arXiv.org Artificial Intelligence

Evaluating the theory of mind (ToM) capabilities of language models (LMs) has recently received a great deal of attention. However, many existing benchmarks rely on synthetic data, which risks misaligning the resulting experiments with human behavior. We introduce the first ToM dataset based on naturally occurring spoken dialogs, Common-ToM, and show that LMs struggle to demonstrate ToM. We then show that integrating a simple, explicit representation of beliefs improves LM performance on Common-ToM.

computational linguistic, corpus, experiment

Country:
- North America
  - Dominican Republic (0.04)
  - United States > New York
    - Suffolk County > Stony Brook (0.04)
    - New York County > New York City (0.04)
  - Canada > Ontario
    - Toronto (0.04)
- Europe
  - Belgium (0.04)
  - United Kingdom > England
    - Cambridgeshire > Cambridge (0.04)
- Asia
  - Singapore (0.04)
  - China > Hong Kong (0.04)

Genre:
- Research Report (1.00)

Technology:
- Information Technology > Artificial Intelligence
  - Representation & Reasoning (1.00)
  - Natural Language > Large Language Model (1.00)
  - Machine Learning > Neural Networks
    - Deep Learning (0.49)
