NegotiationToM: A Benchmark for Stress-testing Machine Theory of Mind on Negotiation Surrounding
Chunkit Chan, Cheng Jiayang, Yauwai Yim, Zheye Deng, Wei Fan, Haoran Li, Xin Liu, Hongming Zhang, Weiqi Wang, Yangqiu Song
Large Language Models (LLMs) have sparked substantial interest and debate over whether they exhibit an emergent Theory of Mind (ToM) ability. Current ToM evaluations test models on machine-generated data or game settings prone to shortcuts and spurious correlations, and thus fail to assess machine ToM in real-world human interaction scenarios. This creates a pressing need for new real-world benchmarks. We introduce NegotiationToM, a new benchmark designed to stress-test machine ToM in real-world negotiation, covering multi-dimensional mental states (i.e., desires, beliefs, and intentions). Our benchmark builds upon the Belief-Desire-Intention (BDI) agent modeling theory, and we conduct the necessary empirical experiments to evaluate large language models. Our findings demonstrate that NegotiationToM is challenging for state-of-the-art LLMs, which consistently perform significantly worse than humans, even when employing the chain-of-thought (CoT) method.
Multi-level Contrastive Learning for Script-based Character Understanding
Dawei Li, Hengyuan Zhang, Yanran Li, Shiping Yang
In this work, we tackle the scenario of understanding characters in scripts, which aims to learn characters' personalities and identities from their utterances. We begin by analyzing several challenges in this scenario and then propose a multi-level contrastive learning framework to capture characters' global information in a fine-grained manner. To validate the proposed framework, we conduct extensive experiments on three character understanding sub-tasks, comparing against strong pre-trained language models, including SpanBERT, Longformer, BigBird, and ChatGPT-3.5. Experimental results demonstrate that our method improves performance by a considerable margin. Through further in-depth analysis, we show the effectiveness of our method in addressing the challenges and provide more hints on the scenario of character understanding. We will open-source our work on GitHub at https://github.com/David-Li0406/Script-based-Character-Understanding.