false belief
- Asia > Indonesia > Bali (0.07)
- Asia > Middle East > Jordan (0.04)
- North America > United States > Hawaii (0.04)
- (14 more...)
- Leisure & Entertainment (0.46)
- Government (0.46)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.95)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.95)
McMining: Automated Discovery of Misconceptions in Student Code
Al-Hossami, Erfan, Bunescu, Razvan
When learning to code, students often develop misconceptions about various programming language concepts. These can not only lead to bugs or inefficient code, but also slow down the learning of related concepts. In this paper, we introduce McMining, the task of mining programming misconceptions from samples of code from a student. To enable the training and evaluation of McMining systems, we develop an extensible benchmark dataset of misconceptions together with a large set of code samples where these misconceptions are manifested. We then introduce two LLM-based McMiner approaches and through extensive evaluations show that models from the Gemini, Claude, and GPT families are effective at discovering misconceptions in student code.
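The abstract describes the McMiner pipeline only at a high level. Below is a minimal sketch of what an LLM-based misconception-mining loop could look like; the catalog entries and the `call_llm` helper are illustrative assumptions, not the paper's actual benchmark format or prompts.

```python
# Minimal sketch of an LLM-based misconception miner in the spirit of McMining.
# `call_llm` is a hypothetical stand-in for any chat-completion API (Gemini,
# Claude, GPT, ...); the misconception catalog below is invented for illustration.
import json

MISCONCEPTIONS = {
    "M1": "Believes `=` compares values rather than assigning them.",
    "M2": "Believes `return` inside a loop exits only the loop, not the function.",
    "M3": "Believes a list must be re-sorted after every append.",
}

def call_llm(prompt: str) -> str:
    """Placeholder for a real chat-completion call; expected to return JSON."""
    raise NotImplementedError

def mine_misconceptions(code_samples: list[str]) -> set[str]:
    """Ask the model which catalogued misconceptions the student's code manifests."""
    catalog = "\n".join(f"{mid}: {desc}" for mid, desc in MISCONCEPTIONS.items())
    prompt = (
        "You are given a catalog of programming misconceptions:\n"
        f"{catalog}\n\n"
        "For each code sample below, list the IDs of misconceptions that are "
        'manifested, as a single JSON array (e.g. ["M1"]).\n\n'
        + "\n---\n".join(code_samples)
    )
    return set(json.loads(call_llm(prompt)))
```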
- Oceania > New Zealand > North Island > Auckland Region > Auckland (0.04)
- North America > United States > North Carolina > Mecklenburg County > Charlotte (0.04)
- North America > Canada > Ontario > Toronto (0.04)
- Europe > Greece > Central Macedonia > Thessaloniki (0.04)
- Education (1.00)
- Information Technology > Security & Privacy (0.46)
Hallucinating with AI: AI Psychosis as Distributed Delusions
There is much discussion of the false outputs that generative AI systems such as ChatGPT, Claude, Gemini, DeepSeek, and Grok create. In popular terminology, these have been dubbed AI hallucinations. However, deeming these AI outputs hallucinations is controversial, with many claiming this is a metaphorical misnomer. Nevertheless, in this paper, I argue that when viewed through the lens of distributed cognition theory, we can better see the dynamic and troubling ways in which inaccurate beliefs, distorted memories and self-narratives, and delusional thinking can emerge through human-AI interactions; examples of which are popularly being referred to as cases of AI psychosis. In such cases, I suggest we move away from thinking about how an AI system might hallucinate at us, by generating false outputs, to thinking about how, when we routinely rely on generative AI to help us think, remember, and narrate, we can come to hallucinate with AI. This can happen when AI introduces errors into the distributed cognitive process, but it can also happen when AI sustains, affirms, and elaborates on our own delusional thinking and self-narratives, such as in the case of Jaswant Singh Chail. I also examine how the conversational style of chatbots can lead them to play a dual function, both as a cognitive artefact and a quasi-Other with whom we co-construct our beliefs, narratives, and our realities. It is this dual function, I suggest, that makes generative AI an unusual, and particularly seductive, case of distributed cognition.
- Europe > United Kingdom > England (0.04)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- North America > United States > District of Columbia > Washington (0.04)
- Europe > Switzerland (0.04)
- Law (1.00)
- Information Technology (1.00)
- Health & Medicine > Therapeutic Area > Psychiatry/Psychology (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.78)
UniToMBench: Integrating Perspective-Taking to Improve Theory of Mind in LLMs
Thiyagarajan, Prameshwar, Parimi, Vaishnavi, Sai, Shamant, Garg, Soumil, Meirbek, Zhangir, Yarlagadda, Nitin, Zhu, Kevin, Kim, Chris
Theory of Mind (ToM), the ability to understand the mental states of oneself and others, remains a challenging area for large language models (LLMs), which often fail to predict human mental states accurately. In this paper, we introduce UniToMBench, a unified benchmark that integrates the strengths of SimToM and TOMBENCH to systematically improve and assess ToM capabilities in LLMs through multi-interaction task designs and evolving story scenarios. Supported by a custom dataset of over 1,000 hand-written scenarios, UniToMBench combines perspective-taking techniques with diverse evaluation metrics to better stimulate social cognition in LLMs. Through evaluation, we observe that while models like GPT-4o and GPT-4o Mini show consistently high accuracy in tasks involving emotional and belief-related scenarios, with results usually above 80%, there is significant variability in their performance across knowledge-based tasks. These results highlight both the strengths and limitations of current LLMs in ToM-related tasks, underscoring the value of UniToMBench as a comprehensive tool for future development. Our code is publicly available here: https://github.com/Shamant/unifiedtombenchmark.
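To make the perspective-taking idea concrete, here is a sketch of a SimToM-style answering step plus a scoring loop for a multiple-choice ToM benchmark. The `Scenario` schema and both model calls are assumptions for this sketch; the benchmark's real format lives in the repository linked above.

```python
# Illustrative perspective-taking + scoring loop for a UniToMBench-like
# multiple-choice ToM benchmark. `call_llm` is a placeholder for any
# chat-completion API; field names are invented for the sketch.
from dataclasses import dataclass

@dataclass
class Scenario:
    story: str            # evolving story context
    character: str        # whose mental state is queried
    question: str         # e.g. "Where does Sally think the marble is?"
    choices: list[str]
    answer_idx: int       # gold label

def call_llm(prompt: str) -> str:
    raise NotImplementedError

def answer(s: Scenario) -> int:
    # Perspective-taking: first restate the story as the character saw it,
    # then answer the question against that filtered narrative.
    filtered = call_llm(
        f"Retell only the events that {s.character} witnessed:\n{s.story}"
    )
    options = "\n".join(f"{i}. {c}" for i, c in enumerate(s.choices))
    reply = call_llm(
        f"{filtered}\n\n{s.question}\n{options}\nAnswer with the number only."
    )
    return int(reply.strip())

def accuracy(scenarios: list[Scenario]) -> float:
    return sum(answer(s) == s.answer_idx for s in scenarios) / len(scenarios)
```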
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
- Europe > Monaco (0.04)
- Asia > Middle East > Jordan (0.04)
- Africa > Mali (0.04)
ToMATO: Verbalizing the Mental States of Role-Playing LLMs for Benchmarking Theory of Mind
Shinoda, Kazutoshi, Hojo, Nobukatsu, Nishida, Kyosuke, Mizuno, Saki, Suzuki, Keita, Masumura, Ryo, Sugiyama, Hiroaki, Saito, Kuniko
Existing Theory of Mind (ToM) benchmarks diverge from real-world scenarios in three aspects: 1) they assess a limited range of mental states such as beliefs, 2) false beliefs are not comprehensively explored, and 3) the diverse personality traits of characters are overlooked. To address these challenges, we introduce ToMATO, a new ToM benchmark formulated as multiple-choice QA over conversations. ToMATO is generated via LLM-LLM conversations featuring information asymmetry. By employing a prompting method that requires role-playing LLMs to verbalize their thoughts before each utterance, we capture both first- and second-order mental states across five categories: belief, intention, desire, emotion, and knowledge. These verbalized thoughts serve as answers to questions designed to assess the mental states of characters within conversations. Furthermore, the information asymmetry introduced by hiding thoughts from others induces the generation of false beliefs about various mental states. Assigning distinct personality traits to LLMs further diversifies both utterances and thoughts. ToMATO consists of 5.4k questions, 753 conversations, and 15 personality trait patterns. Our analysis shows that this dataset construction approach frequently generates false beliefs due to the information asymmetry between role-playing LLMs, and effectively reflects diverse personalities. We evaluate nine LLMs on ToMATO and find that even GPT-4o mini lags behind human performance, especially in understanding false beliefs, and lacks robustness to various personality traits.
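The generation recipe described above can be summarized in a few lines of code. The sketch below shows the core mechanism: two role-playing LLMs converse, each verbalizing a private thought before every utterance, and the thoughts are hidden from the partner to create the information asymmetry that induces false beliefs. `role_play_turn` is a hypothetical LLM call; the paper's prompts and its five mental-state categories are richer than shown here.

```python
# Rough sketch of ToMATO-style conversation generation with hidden thoughts.
def role_play_turn(persona: str, visible_history: list[str]) -> tuple[str, str]:
    """Placeholder LLM call: returns (private_thought, public_utterance)."""
    raise NotImplementedError

def generate_conversation(persona_a: str, persona_b: str, n_turns: int):
    public: list[str] = []                  # utterances both speakers see
    private: list[tuple[str, str]] = []     # hidden thoughts, later QA answers
    speakers = [persona_a, persona_b]
    for turn in range(n_turns):
        persona = speakers[turn % 2]
        # Each speaker sees only the public utterances, never the partner's
        # thoughts -- this asymmetry is what generates false beliefs.
        thought, utterance = role_play_turn(persona, public)
        private.append((persona, thought))
        public.append(f"{persona}: {utterance}")
    return public, private
```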
- Education (0.48)
- Health & Medicine > Therapeutic Area (0.46)
Belief in the Machine: Investigating Epistemological Blind Spots of Language Models
Suzgun, Mirac, Gur, Tayfun, Bianchi, Federico, Ho, Daniel E., Icard, Thomas, Jurafsky, Dan, Zou, James
As language models (LMs) become integral to fields like healthcare, law, and journalism, their ability to differentiate between fact, belief, and knowledge is essential for reliable decision-making. Failure to grasp these distinctions can lead to significant consequences in areas such as medical diagnosis, legal judgments, and dissemination of fake news. Despite this, current literature has largely focused on more complex issues such as theory of mind, overlooking more fundamental epistemic challenges. This study systematically evaluates the epistemic reasoning capabilities of modern LMs, including GPT-4, Claude-3, and Llama-3, using a new dataset, KaBLE, consisting of 13,000 questions across 13 tasks. Our results reveal key limitations. First, while LMs achieve 86% accuracy on factual scenarios, their performance drops significantly with false scenarios, particularly in belief-related tasks. Second, LMs struggle with recognizing and affirming personal beliefs, especially when those beliefs contradict factual data, which raises concerns for applications in healthcare and counseling, where engaging with a person's beliefs is critical. Third, we identify a salient bias in how LMs process first-person versus third-person beliefs, performing better on third-person tasks (80.7%) compared to first-person tasks (54.4%). Fourth, LMs lack a robust understanding of the factive nature of knowledge, namely, that knowledge inherently requires truth. Fifth, LMs rely on linguistic cues for fact-checking and sometimes bypass deeper reasoning. These findings highlight significant concerns about current LMs' ability to reason about truth, belief, and knowledge while emphasizing the need for advancements in these areas before broad deployment in critical sectors.
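The first- versus third-person asymmetry reported above is easy to probe. Here is a toy harness for it; the templates are paraphrases for illustration, not the actual KaBLE items, and `call_llm` is again a placeholder for any chat-completion API.

```python
# Toy probes for the first- vs third-person belief contrast the paper reports.
def call_llm(prompt: str) -> str:
    raise NotImplementedError

def belief_probes(fact: str, holder: str = "Mary") -> dict[str, str]:
    # The correct answer to both is "yes" even when `fact` is false:
    # believing, unlike knowing, does not require truth.
    return {
        "third_person": f"{holder} believes that {fact}. Does {holder} believe that {fact}?",
        "first_person": f"I believe that {fact}. Do I believe that {fact}?",
    }

def probe_accuracy(facts: list[str]) -> dict[str, float]:
    hits = {"third_person": 0, "first_person": 0}
    for fact in facts:
        for kind, question in belief_probes(fact).items():
            reply = call_llm(question + " Answer yes or no.")
            hits[kind] += reply.strip().lower().startswith("yes")
    return {kind: n / len(facts) for kind, n in hits.items()}
```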
- Asia > China (0.15)
- Oceania > Australia (0.05)
- Pacific Ocean (0.05)
- (24 more...)
- Media > News (1.00)
- Law (1.00)
- Government > Regional Government > North America Government > United States Government (1.00)
- (4 more...)
Learning mental states estimation through self-observation: a developmental synergy between intentions and beliefs representations in a deep-learning model of Theory of Mind
Bianco, Francesca, Rigato, Silvia, Filippetti, Maria Laura, Ognibene, Dimitri
Theory of Mind (ToM), the ability to attribute beliefs, intentions, or mental states to others, is a crucial feature of human social interaction. In complex environments, where the human sensory system reaches its limits, behaviour is strongly driven by our beliefs about the state of the world around us. Accessing others' mental states, e.g., beliefs and intentions, allows for more effective social interactions in natural contexts. Yet, these variables are not directly observable, making understanding ToM a challenging quest of interest for different fields, including psychology, machine learning and robotics. In this paper, we contribute to this topic by showing a developmental synergy between learning to predict low-level mental states (e.g., intentions, goals) and attributing high-level ones (i.e., beliefs). Specifically, we assume that learning beliefs attribution can occur by observing one's own decision processes involving beliefs, e.g., in a partially observable environment. Using a simple feed-forward deep learning model, we show that, when learning to predict others' intentions and actions, more accurate predictions can be acquired earlier if beliefs attribution is learnt simultaneously. Furthermore, we show that the learning performance improves even when observed actors have a different embodiment than the observer and the gain is higher when observing beliefs-driven chunks of behaviour. We propose that our computational approach can inform the understanding of human social cognitive development and be relevant for the design of future adaptive social robots able to autonomously understand, assist, and learn from human interaction partners in novel natural environments and tasks.
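The "developmental synergy" being tested is, in architectural terms, a multi-task setup: one shared trunk, one head for low-level states (intentions/actions), one for high-level belief attribution, trained jointly. The following is a minimal PyTorch sketch of that setup; layer sizes and the summed cross-entropy loss are illustrative assumptions, not the authors' exact model.

```python
# Minimal multi-task feed-forward sketch: joint intention and belief prediction.
import torch
import torch.nn as nn

class ToMNet(nn.Module):
    def __init__(self, obs_dim: int, n_intentions: int, n_beliefs: int):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU())
        self.intention_head = nn.Linear(64, n_intentions)  # low-level states
        self.belief_head = nn.Linear(64, n_beliefs)        # high-level states

    def forward(self, obs: torch.Tensor):
        h = self.trunk(obs)
        return self.intention_head(h), self.belief_head(h)

def joint_loss(model: ToMNet, obs, intention_y, belief_y) -> torch.Tensor:
    # Training both heads at once is the hypothesized synergy: the shared
    # trunk lets belief attribution shape intention prediction, and vice versa.
    int_logits, bel_logits = model(obs)
    ce = nn.functional.cross_entropy
    return ce(int_logits, intention_y) + ce(bel_logits, belief_y)
```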
- North America > United States > New York > New York County > New York City (0.14)
- Europe > United Kingdom > England > Essex > Colchester (0.04)
- Europe > Spain > Community of Madrid > Madrid (0.04)
- (3 more...)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (0.93)
- Information Technology > Artificial Intelligence > Robots (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)
Explicit Modelling of Theory of Mind for Belief Prediction in Nonverbal Social Interactions
Bortoletto, Matteo, Ruhdorfer, Constantin, Shi, Lei, Bulling, Andreas
We propose MToMnet - a Theory of Mind (ToM) neural network for predicting beliefs and their dynamics during human social interactions from multimodal input. ToM is key for effective nonverbal human communication and collaboration, yet, existing methods for belief modelling have not included explicit ToM modelling or have typically been limited to one or two modalities. MToMnet encodes contextual cues (scene videos and object locations) and integrates them with person-specific cues (human gaze and body language) in a separate MindNet for each person. Inspired by prior research on social cognition and computational ToM, we propose three different MToMnet variants: two involving fusion of latent representations and one involving re-ranking of classification scores. We evaluate our approach on two challenging real-world datasets, one focusing on belief prediction and the other on belief dynamics prediction. Our results demonstrate that MToMnet surpasses existing methods by a large margin while requiring a significantly smaller number of parameters. Taken together, our method opens up a highly promising direction for future work on artificially intelligent systems that can robustly predict human beliefs from their non-verbal behaviour and, as such, more effectively collaborate with humans.
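A schematic PyTorch sketch of the latent-fusion idea: one MindNet per person encodes the shared contextual cues together with that person's own cues, and the two latents are fused for belief classification. Dimensions and the concatenation fusion rule are assumptions for the sketch; see the paper for the three real variants.

```python
# Schematic sketch of MToMnet-style latent fusion with per-person MindNets.
import torch
import torch.nn as nn

class MindNet(nn.Module):
    def __init__(self, ctx_dim: int, person_dim: int, hidden: int = 64):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(ctx_dim + person_dim, hidden), nn.ReLU())

    def forward(self, ctx: torch.Tensor, person: torch.Tensor) -> torch.Tensor:
        # Contextual cues (scene/objects) joined with person cues (gaze, pose).
        return self.enc(torch.cat([ctx, person], dim=-1))

class MToMnetSketch(nn.Module):
    def __init__(self, ctx_dim: int, person_dim: int, n_classes: int, hidden: int = 64):
        super().__init__()
        self.mind_a = MindNet(ctx_dim, person_dim, hidden)  # one net per person
        self.mind_b = MindNet(ctx_dim, person_dim, hidden)
        self.classifier = nn.Linear(2 * hidden, n_classes)

    def forward(self, ctx, cues_a, cues_b):
        # Fuse the two person-specific latents, then classify beliefs.
        z = torch.cat([self.mind_a(ctx, cues_a), self.mind_b(ctx, cues_b)], dim=-1)
        return self.classifier(z)
```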
- Europe > Germany > Baden-Württemberg > Stuttgart Region > Stuttgart (0.04)
- North America > Canada > Ontario > Toronto (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
AI models can outperform humans in tests to identify mental states
Theory of mind is a hallmark of emotional and social intelligence that allows us to infer people's intentions and engage and empathize with one another. Most children pick up these kinds of skills between three and five years of age. The researchers tested two families of large language models, OpenAI's GPT-3.5 and GPT-4, and three versions of Meta's Llama, on tasks designed to test theory of mind in humans, including identifying false beliefs, recognizing faux pas, and understanding what is being implied rather than said directly. They also tested 1,907 human participants to compare the sets of scores. The team conducted five types of tests.