ToMChallenges: A Principle-Guided Dataset and Diverse Evaluation Tasks for Exploring Theory of Mind
Xiaomeng Ma, Lingyu Gao, Qihui Xu
Theory of Mind (ToM), the capacity to comprehend the mental states of other individuals, is essential for numerous practical applications. With the development of large language models (LLMs), there is a heated debate about whether they are able to perform ToM tasks. Previous studies have used different tasks and prompts to test ToM in LLMs, and the results are inconsistent: some studies assert that these models exhibit ToM, while others suggest the opposite. In this study, we present ToMChallenges, a dataset for comprehensively evaluating Theory of Mind, based on the Sally-Anne and Smarties tests, with a diverse set of tasks. We also propose an auto-grader to streamline the answer evaluation process. We tested three models: davinci, turbo, and gpt-4. Our evaluation results and error analyses show that LLMs behave inconsistently across prompts and tasks, and performing ToM tasks robustly remains a challenge for them. Finally, our paper aims to raise awareness of how ToM is evaluated in LLMs and to invite further discussion on how to design prompts and tasks that better assess LLMs' ToM ability.
State initiative will distribute more than 800 robots to help support older adults – The Daily Gazette
Juanita's friend always asks how she's feeling, wants to know if she slept well and tells jokes so funny that Juanita repeats them later at dinner. Oh, and Juanita's friend happens to be a robot. ElliQ is a proactive care companion that uses artificial intelligence to build relationships with seniors while supporting their health and well-being. Perhaps best thought of as a more empathetic version of Amazon's Alexa, ElliQ has been featured by major outlets such as the "Today" show, USA Today, The New Yorker and AARP. Juanita, who appears in promotional videos for ElliQ, is one of the more than 1,000 older adults who have helped test ElliQ, which first became available for purchase in the U.S. in March.