safety and control
Do Large Language Models Have a Planning Theory of Mind? Evidence from MindGames: a Multi-Step Persuasion Task
Moore, Jared, Cooper, Ned, Overmark, Rasmus, Cibralic, Beba, Haber, Nick, Jones, Cameron R.
Recent evidence suggests Large Language Models (LLMs) display Theory of Mind (ToM) abilities. Most ToM experiments place participants in a spectatorial role, wherein they predict and interpret other agents' behavior. However, human ToM also contributes to dynamically planning action and strategically intervening on others' mental states. We present MindGames: a novel 'planning theory of mind' (PToM) task which requires agents to infer an interlocutor's beliefs and desires to persuade them to alter their behavior. Unlike previous evaluations, we explicitly evaluate use cases of ToM. We find that humans significantly outperform o1-preview (an LLM) at our PToM task (11% higher; $p=0.006$). We hypothesize this is because humans have an implicit causal model of other agents (e.g., they know, as our task requires, to ask about people's preferences). In contrast, o1-preview outperforms humans in a baseline condition which requires a similar amount of planning but minimal mental state inferences (e.g., o1-preview is better than humans at planning when already given someone's preferences). These results suggest a significant gap between human-like social reasoning and LLM abilities.
Workshop on Safety and Control for AI by White House OSTP/Carnegie Mellon Univ • /r/artificial
We here at Carnegie Mellon University wanted to let you know about a great event on artificial intelligence that we're hosting in conjunction with the White House Office of Science and Technology Policy in late June. You may have seen this recent article on these workshops featured in Wired. While we are but one of the four workshops going on in the coming months, we are the ONLY workshop in the series with a clear focus on the technical aspects of safe and controlled AI. We want to dive deep into how we can bring together machine learning, math-based systems reasoning, and software architecture to build AI systems with a high level of assurance. And we'd love for you to be a part of that conversation here in Pittsburgh.
SafArtInt 2016
The computer science community has been exploring the role of artificial intelligence (AI) in systems for more than a half-century. In the last few years, AI development has reached a threshold of practicability, and AI capability is now emerging in sectors ranging from vehicles, logistics, and military systems to health care, financial services, and smart cities. The economic and societal impacts could be dramatic, and investment in the development of AI applications is now a worldwide phenomenon. Many technical leaders now believe that the principal limits on exploiting AI derive from our confidence in the safety of these smart systems – that they will operate in a safe and controlled manner. Some AI experts have asserted that the ability to assure safety and control is even more important to the future of AI than improvements in the AI algorithms themselves.