McIlraith, Sheila
Managing AI Risks in an Era of Rapid Progress
Bengio, Yoshua, Hinton, Geoffrey, Yao, Andrew, Song, Dawn, Abbeel, Pieter, Harari, Yuval Noah, Zhang, Ya-Qin, Xue, Lan, Shalev-Shwartz, Shai, Hadfield, Gillian, Clune, Jeff, Maharaj, Tegan, Hutter, Frank, Baydin, Atılım Güneş, McIlraith, Sheila, Gao, Qiqi, Acharya, Ashwin, Krueger, David, Dragan, Anca, Torr, Philip, Russell, Stuart, Kahneman, Daniel, Brauner, Jan, Mindermann, Sören
In this short consensus paper, we outline risks from upcoming, advanced AI systems. We examine large-scale social harms and malicious uses, as well as an irreversible loss of human control over autonomous AI systems. In light of rapid and continuing AI progress, we propose urgent priorities for AI R&D and governance.
STEVE-1: A Generative Model for Text-to-Behavior in Minecraft
Lifshitz, Shalev, Paster, Keiran, Chan, Harris, Ba, Jimmy, McIlraith, Sheila
Constructing AI models that respond to text instructions is challenging, especially for sequential decision-making tasks. This work introduces an instruction-tuned Video Pretraining (VPT) model for Minecraft called STEVE-1, demonstrating that the unCLIP approach, utilized in DALL-E 2, is also effective for creating instruction-following sequential decision-making agents. STEVE-1 is trained in two steps: adapting the pretrained VPT model to follow commands in MineCLIP's latent space, then training a prior to predict latent codes from text. This allows us to finetune VPT through self-supervised behavioral cloning and hindsight relabeling, bypassing the need for costly human text annotations. By leveraging pretrained models like VPT and MineCLIP and employing best practices from text-conditioned image generation, STEVE-1 costs just $60 to train and can follow a wide range of short-horizon open-ended text and visual instructions in Minecraft. STEVE-1 sets a new bar for open-ended instruction following in Minecraft with low-level controls (mouse and keyboard) and raw pixel inputs, far outperforming previous baselines. We provide experimental evidence highlighting key factors for downstream performance, including pretraining, classifier-free guidance, and data scaling. All resources, including our model weights, training scripts, and evaluation tools, are made available for further research.
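The hindsight-relabeling step described in the abstract can be sketched as follows: each timestep's (observation, action) pair is paired with the embedding of a state the agent actually reached later, so training requires no human labels. This is a minimal illustration; `embed` stands in for MineCLIP's visual encoder, and the horizon bounds are hypothetical, not the paper's exact values.

```python
import random

def hindsight_relabel(trajectory, embed, min_horizon=1, max_horizon=50):
    """Pair each (obs, action) with the embedding of a state reached
    later in the same trajectory, used as the "goal" for that step.

    trajectory: list of (obs, action) pairs.
    embed: maps an observation to a goal embedding (stand-in for MineCLIP).
    """
    examples = []
    for t in range(len(trajectory) - 1):
        # sample a future timestep whose visual state serves as the goal
        h = random.randint(min_horizon, max_horizon)
        goal_t = min(t + h, len(trajectory) - 1)
        obs, action = trajectory[t]
        goal_obs, _ = trajectory[goal_t]
        examples.append((obs, action, embed(goal_obs)))
    return examples
```

A behavioral-cloning loss over these (obs, action, goal-embedding) triples then teaches the policy to follow latent-space commands.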
Optimal Decision Trees For Interpretable Clustering with Constraints (Extended Version)
Shati, Pouya, Cohen, Eldan, McIlraith, Sheila
Constrained clustering is a semi-supervised task that employs a limited amount of labelled data, formulated as constraints, to incorporate domain-specific knowledge and to significantly improve clustering accuracy. Previous work has considered exact optimization formulations that can guarantee optimal clustering while satisfying all constraints; however, these approaches lack interpretability. Recently, decision trees have been used to produce inherently interpretable clustering solutions; however, existing approaches do not support clustering constraints and do not provide strong theoretical guarantees on solution quality. In this work, we present a novel SAT-based framework for interpretable clustering that supports clustering constraints and that also provides strong theoretical guarantees on solution quality. We also present new insight into the trade-off between interpretability and satisfaction of such user-provided constraints. Our framework is the first approach for interpretable and constrained clustering. Experiments with a range of real-world and synthetic datasets demonstrate that our approach can produce high-quality and interpretable constrained clustering solutions.
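The constraints in constrained clustering are conventionally of two kinds: must-link pairs (points that must share a cluster) and cannot-link pairs (points that must not). A minimal checker for these two constraint types, for illustration only — this is not the paper's SAT encoding:

```python
def satisfies_constraints(assignment, must_link, cannot_link):
    """Check a clustering against must-link and cannot-link constraints.

    assignment: maps each point to a cluster id.
    must_link, cannot_link: iterables of point pairs.
    """
    ok_must = all(assignment[a] == assignment[b] for a, b in must_link)
    ok_cannot = all(assignment[a] != assignment[b] for a, b in cannot_link)
    return ok_must and ok_cannot
```

An exact method like the paper's searches for a decision tree whose induced clustering both satisfies such a checker and optimizes clustering quality.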
You Can't Count on Luck: Why Decision Transformers and RvS Fail in Stochastic Environments
Paster, Keiran, McIlraith, Sheila, Ba, Jimmy
Recently, methods such as Decision Transformer that reduce reinforcement learning to a prediction task and solve it via supervised learning (RvS) have become popular due to their simplicity, robustness to hyperparameters, and strong overall performance on offline RL tasks. However, simply conditioning a probabilistic model on a desired return and taking the predicted action can fail dramatically in stochastic environments, since trajectories that achieve a given return may have done so only through luck. In this work, we describe the limitations of RvS approaches in stochastic environments and propose a solution. Rather than simply conditioning on the return of a single trajectory as is standard practice, our proposed method, ESPER, learns to cluster trajectories and conditions on average cluster returns, which are independent of environment stochasticity. Doing so allows ESPER to achieve strong alignment between target return and expected performance in real environments. We demonstrate this in several challenging stochastic offline-RL tasks, including the puzzle game 2048 and Connect Four against a stochastic opponent. In all tested domains, ESPER achieves significantly better alignment between the target return and achieved return than simply conditioning on returns. ESPER also achieves higher maximum performance than even the value-based baselines.
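The relabeling idea at ESPER's core can be sketched in a few lines: group trajectories into clusters, then replace each trajectory's raw return with its cluster's average, so the conditioning target no longer rewards lucky outcomes. This is a simplified stand-in — ESPER learns the clustering so that it is independent of environment stochasticity; here `cluster_fn` is a hypothetical placeholder for that learned clustering.

```python
from collections import defaultdict
from statistics import mean

def relabel_with_cluster_returns(trajectories, cluster_fn):
    """Replace each trajectory's observed return with the average return
    of its cluster.

    trajectories: list of (features, ret) pairs.
    cluster_fn: maps trajectory features to a cluster id (stand-in for
    ESPER's learned, stochasticity-independent clustering).
    """
    clusters = defaultdict(list)
    for feats, ret in trajectories:
        clusters[cluster_fn(feats)].append(ret)
    avg = {cid: mean(rets) for cid, rets in clusters.items()}
    return [(feats, avg[cluster_fn(feats)]) for feats, ret in trajectories]
```

For example, if two "risky" trajectories earned returns 10 and 0 purely by chance, both are relabeled with 5 — the expected return of acting that way — so conditioning on 10 no longer asks the policy to reproduce luck.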
Efficient Multi-agent Epistemic Planning: Teaching Planners About Nested Belief
Muise, Christian, Belle, Vaishak, Felli, Paolo, McIlraith, Sheila, Miller, Tim, Pearce, Adrian R., Sonenberg, Liz
In the absence of prescribed coordination, it is often necessary for individual agents to synthesize their own plans, taking into account not only their own capabilities and beliefs about the world but also their beliefs about other agents, including what each of the agents will come to believe as the consequence of the actions of others. To illustrate, consider the scenario where Larry and Moe meet on a regular basis at the local diner to swap the latest gossip. Larry has come to know that Nancy (Larry's daughter) has just received a major promotion in her job, but unbeknownst to him, Moe has already learned this bit of information through the grapevine. Before they speak, both believe Nancy is getting a promotion, Larry believes Moe is unaware of this (and consequently wishes to share the news), and Moe assumes Larry must already be aware of the promotion but is unaware of Moe's own knowledge of the situation. Very quickly we can see how the nesting of (potentially incorrect) belief can be a complicated and interesting setting to model. In this paper, we examine the problem of synthesizing plans in such settings. In particular, given a finite set of agents, each with: (1) (possibly incomplete and incorrect) beliefs about the world and about the beliefs of other agents; and (2) differing capabilities including the ability to perform actions whose outcomes are unknown to other agents; we are interested in synthesizing a plan to achieve a goal condition. Planning is at the belief level and as such, while we consider the execution of actions that can change the state of the world (ontic actions) as well as an agent's state of knowledge or belief (epistemic or more accurately doxastic actions, including communication actions), all outcomes are with respect to belief.
LTL2Action: Generalizing LTL Instructions for Multi-Task RL
Vaezipoor, Pashootan, Li, Andrew, Icarte, Rodrigo Toro, McIlraith, Sheila
We address the problem of teaching a deep reinforcement learning (RL) agent to follow instructions in multi-task environments. We employ a well-known formal language -- linear temporal logic (LTL) -- to specify instructions, using a domain-specific vocabulary. We propose a novel approach to learning that exploits the compositional syntax and the semantics of LTL, enabling our RL agent to learn task-conditioned policies that generalize to new instructions, not observed during training. The expressive power of LTL supports the specification of a diversity of complex temporally extended behaviours that include conditionals and alternative realizations. Experiments on discrete and continuous domains demonstrate the strength of our approach in learning to solve (unseen) tasks, given LTL instructions.
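A standard operation when tracking an LTL instruction during execution is progression: after each environment step, the formula is rewritten against the propositions currently true, leaving the obligation that remains. A minimal sketch for a small fragment — the tuple encoding here is illustrative, not the paper's representation:

```python
def progress(phi, obs):
    """One step of LTL progression: rewrite phi given the set `obs` of
    propositions true at the current timestep. True/False mean the
    formula is already satisfied/violated."""
    if phi in (True, False):
        return phi
    if isinstance(phi, str):                     # atomic proposition
        return phi in obs
    op = phi[0]
    if op == "and":
        l, r = progress(phi[1], obs), progress(phi[2], obs)
        if l is False or r is False:
            return False
        return r if l is True else (l if r is True else ("and", l, r))
    if op == "or":
        l, r = progress(phi[1], obs), progress(phi[2], obs)
        if l is True or r is True:
            return True
        return r if l is False else (l if r is False else ("or", l, r))
    if op == "eventually":                       # F p  ==  p or X F p
        now = progress(phi[1], obs)
        if now is True:
            return True
        return phi if now is False else ("or", now, phi)
    if op == "until":                            # p U q  ==  q or (p and X(p U q))
        q = progress(phi[2], obs)
        if q is True:
            return True
        p = progress(phi[1], obs)
        if p is False:
            return q
        rest = phi if p is True else ("and", p, phi)
        return rest if q is False else ("or", q, rest)
    raise ValueError(f"unknown operator {op!r}")
```

Because the progressed formula compactly summarizes what the agent still owes, it is a natural task-conditioning input for a policy that must generalize across instructions.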
A Recap of the AAAI and IAAI 2018 Conferences and the EAAI Symposium
McIlraith, Sheila (University of Toronto) | Weinberger, Kilian (Cornell University) | Youngblood, G. Michael (PARC) | Myers, Karen (SRI International) | Eaton, Eric (University of Pennsylvania) | Wollowski, Michael (Rose-Hulman Institute of Technology)
The 2018 AAAI Conference on Artificial Intelligence, the 2018 Innovative Applications of Artificial Intelligence, and the 2018 Symposium on Educational Advances in Artificial Intelligence were held February 2–7, 2018 at the Hilton New Orleans Riverside, New Orleans, Louisiana, USA. This report, based on the prefaces contained in the AAAI-18 proceedings and program, summarizes the events of the conference.
Reports of the AAAI 2014 Conference Workshops
Albrecht, Stefano V. (University of Edinburgh) | Barreto, André M. S. (Brazilian National Laboratory for Scientific Computing) | Braziunas, Darius (Kobo Inc.) | Buckeridge, David L. (McGill University) | Cuayáhuitl, Heriberto (Heriot-Watt University) | Dethlefs, Nina (Heriot-Watt University) | Endres, Markus (University of Augsburg) | Farahmand, Amir-massoud (Carnegie Mellon University) | Fox, Mark (University of Toronto) | Frommberger, Lutz (University of Bremen) | Ganzfried, Sam (Carnegie Mellon University) | Gil, Yolanda (University of Southern California) | Guillet, Sébastien (Université du Québec à Chicoutimi) | Hunter, Lawrence E. (University of Colorado School of Medicine) | Jhala, Arnav (University of California Santa Cruz) | Kersting, Kristian (Technical University of Dortmund) | Konidaris, George (Massachusetts Institute of Technology) | Lecue, Freddy (IBM Research) | McIlraith, Sheila (University of Toronto) | Natarajan, Sriraam (Indiana University) | Noorian, Zeinab (University of Saskatchewan) | Poole, David (University of British Columbia) | Ronfard, Rémi (University of Grenoble) | Saffiotti, Alessandro (Orebro University) | Shaban-Nejad, Arash (McGill University) | Srivastava, Biplav (IBM Research) | Tesauro, Gerald (IBM Research) | Uceda-Sosa, Rosario (IBM Research) | Broeck, Guy Van den (Katholieke Universiteit Leuven) | Otterlo, Martijn van (Radboud University Nijmegen) | Wallace, Byron C. (University of Texas) | Weng, Paul (Pierre and Marie Curie University) | Wiens, Jenna (University of Michigan) | Zhang, Jie (Nanyang Technological University)
The AAAI-14 Workshop program was held Sunday and Monday, July 27–28, 2014, at the Québec City Convention Centre in Québec, Canada. The AAAI-14 workshop program included fifteen workshops covering a wide range of topics in artificial intelligence. The titles of the workshops were AI and Robotics; Artificial Intelligence Applied to Assistive Technologies and Smart Environments; Cognitive Computing for Augmented Human Intelligence; Computer Poker and Imperfect Information; Discovery Informatics; Incentives and Trust in Electronic Communities; Intelligent Cinematography and Editing; Machine Learning for Interactive Systems: Bridging the Gap between Perception, Action and Communication; Modern Artificial Intelligence for Health Analytics; Multiagent Interaction without Prior Coordination; Multidisciplinary Workshop on Advances in Preference Handling; Semantic Cities -- Beyond Open Data to Models, Standards and Reasoning; Sequential Decision Making with Big Data; Statistical Relational AI; and The World Wide Web and Public Health Intelligence. This article presents short summaries of those events.
Planning Over Multi-Agent Epistemic States: A Classical Planning Approach
Muise, Christian (University of Melbourne) | Belle, Vaishak (University of Toronto) | Felli, Paolo (University of Melbourne) | McIlraith, Sheila (University of Toronto) | Miller, Tim (University of Melbourne) | Pearce, Adrian R. (University of Melbourne) | Sonenberg, Liz (University of Melbourne)
Many AI applications involve the interaction of multiple autonomous agents, requiring those agents to reason about their own beliefs, as well as those of other agents. However, planning involving nested beliefs is known to be computationally challenging. In this work, we address the task of synthesizing plans that necessitate reasoning about the beliefs of other agents. We plan from the perspective of a single agent with the potential for goals and actions that involve nested beliefs, non-homogeneous agents, co-present observations, and the ability for one agent to reason as if it were another. We formally characterize our notion of planning with nested belief, and subsequently demonstrate how to automatically convert such problems into problems that appeal to classical planning technology. Our approach represents an important first step towards applying the well-established field of automated planning to the challenging task of planning involving nested beliefs of multiple agents.
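The key compilation idea — reducing nested belief to classical planning — can be illustrated by flattening a nested belief literal into a single classical fluent name, so that an off-the-shelf planner can manipulate it as an ordinary proposition. The naming scheme below is hypothetical, for illustration only:

```python
def belief_fluent(agent_chain, prop):
    """Flatten a nested belief such as "Larry believes that Moe believes
    the promotion happened" into one classical-planning fluent name.

    agent_chain: agents from outermost to innermost believer.
    prop: the base proposition.
    """
    name = prop
    for agent in reversed(agent_chain):
        name = f"B_{agent}({name})"
    return name
```

With beliefs reified as flat fluents, ontic and epistemic action effects become ordinary add/delete effects over these atoms, which is what lets mature classical planners be applied.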