Building artificial intelligence (AI) that aligns with human values is an unsolved problem. Here we developed a human-in-the-loop research pipeline called Democratic AI, in which reinforcement learning is used to design a social mechanism that humans prefer by majority. A large group of humans played an online investment game that involved deciding whether to keep a monetary endowment or to share it with others for collective benefit. Shared revenue was returned to players under two different redistribution mechanisms, one designed by the AI and the other by humans. The AI discovered a mechanism that redressed initial wealth imbalance, sanctioned free riders and successfully won the majority vote. By optimising for human preferences, Democratic AI offers a proof of concept for value-aligned policy innovation.
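The redistribution setup can be illustrated with a toy public-goods game. This is a minimal sketch with made-up endowments and two simplified stand-in mechanisms (equal split versus contribution-proportional payout); it is not the paper's actual AI-designed mechanism:

```python
def play_round(endowments, contributions, multiplier, redistribute):
    # Contributions go into a common pot, which grows and is then redistributed.
    pot = sum(contributions) * multiplier
    kept = [e - c for e, c in zip(endowments, contributions)]
    shares = redistribute(contributions, pot)
    return [k + s for k, s in zip(kept, shares)]

def strict_egalitarian(contributions, pot):
    # Everyone receives an equal share, regardless of contribution.
    n = len(contributions)
    return [pot / n] * n

def proportional(contributions, pot):
    # Shares are proportional to each player's contribution.
    total = sum(contributions) or 1
    return [pot * c / total for c in contributions]

endowments = [10, 2, 2]        # unequal starting wealth; the majority are poorer
contributions = [10, 2, 2]     # here everyone contributes their full endowment
payoff_a = play_round(endowments, contributions, 1.6, strict_egalitarian)
payoff_b = play_round(endowments, contributions, 1.6, proportional)

# Each player votes for the mechanism that pays them more; majority wins.
votes_a = sum(pa > pb for pa, pb in zip(payoff_a, payoff_b))
winner = "egalitarian" if votes_a > len(endowments) / 2 else "proportional"
```

With this endowment profile the equal-split rule wins the vote, echoing the finding that a mechanism redressing initial wealth imbalance can attract majority support.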
We develop reinforcement learning techniques that enable interaction across multiple agents, including AIs and humans, with potential applications ranging from AI-assisted design to autonomous driving. The methodological context of this research spans deep reinforcement learning, inverse reinforcement learning, hierarchical reinforcement learning, and multi-agent and multi-objective reinforcement learning. FCAI is working on a new paradigm of AI-assisted design that aims to cooperate with designers by supporting and leveraging their creativity and problem-solving. The challenge for such AI is to infer designers' goals and then help them without being needlessly disruptive. We use generative user models to infer designers' goals, reasoning, and capabilities. In this call, FCAI is looking for a postdoctoral scholar or research fellow to join our effort to develop AI-assisted design. Suitable backgrounds include deep reinforcement learning, Bayesian inference, cooperative AI, computational cognitive modelling, and user modelling. Computational rationality is an emerging integrative theory of intelligence in humans and machines (1), with applications in human-computer interaction, cooperative AI, and robotics. The theory assumes that observable human behavior is generated by cognitive mechanisms adapted to the structure not only of the environment but also of the mind and brain itself (2).
The Workshop Program of the Association for the Advancement of Artificial Intelligence's Thirty-Sixth Conference on Artificial Intelligence was held virtually from February 22 – March 1, 2022. There were thirty-nine workshops in the program: Adversarial Machine Learning and Beyond, AI for Agriculture and Food Systems, AI for Behavior Change, AI for Decision Optimization, AI for Transportation, AI in Financial Services: Adaptiveness, Resilience & Governance, AI to Accelerate Science and Engineering, AI-Based Design and Manufacturing, Artificial Intelligence for Cyber Security, Artificial Intelligence for Education, Artificial Intelligence Safety, Artificial Intelligence with Biased or Scarce Data, Combining Learning and Reasoning: Programming Languages, Formalisms, and Representations, Deep Learning on Graphs: Methods and Applications, DE-FACTIFY: Multi-Modal Fake News and Hate-Speech Detection, Dialog System Technology Challenge, Engineering Dependable and Secure Machine Learning Systems, Explainable Agency in Artificial Intelligence, Graphs and More Complex Structures for Learning and Reasoning, Health Intelligence, Human-Centric Self-Supervised Learning, Information-Theoretic Methods for Causal Inference and Discovery, Information Theory for Deep Learning, Interactive Machine Learning, Knowledge Discovery from Unstructured Data in Financial Services, Learning Network Architecture during Training, Machine Learning for Operations Research, Optimal Transports and Structured Data Modeling, Practical Deep Learning in the Wild, Privacy-Preserving Artificial Intelligence, Reinforcement Learning for Education: Opportunities and Challenges, Reinforcement Learning in Games, Robust Artificial Intelligence System Assurance, Scientific Document Understanding, Self-Supervised Learning for Audio and Speech Processing, Trustable, Verifiable and Auditable Federated Learning, Trustworthy AI for Healthcare, Trustworthy Autonomous Systems Engineering, and Video Transcript Understanding. This report contains summaries of the workshops, which were submitted by most, but not all, of the workshop chairs.
You may have come across reinforcement learning success stories: algorithms achieving superhuman performance at Atari 2600 games, beating professional players at Go, Dota 2, and StarCraft II, and even controlling nuclear-fusion reactors. These are the successes of reinforcement learning combined with deep learning (deep reinforcement learning, or deep RL). DeepMind and OpenAI conduct extensive research in this area and regard deep RL as central to the future of AI. Some researchers even think that RL might be the key to artificial general intelligence (AGI). Reinforcement learning (RL) is one of the paradigms of machine learning, alongside supervised and unsupervised learning.
The conventional approach to improving the decision-making of deep reinforcement learning (RL) agents is to gradually amortize the useful information they gain from their experiences via gradient descent on training losses. This method, however, requires building increasingly large models to deal with increasingly complex environments, and it is difficult to adapt to novel situations. Although adding information sources can benefit agent performance, there is currently no end-to-end solution for enabling agents to attend to information outside their working memory to inform their actions. In the paper Large-Scale Retrieval for Reinforcement Learning, a DeepMind research team introduces a novel approach that dramatically expands the information accessible to RL agents, enabling them to attend to tens of millions of information pieces, incorporate new information without retraining, and learn end-to-end how to use this information in their decision-making. In the work, the team trains a semiparametric model-based agent to predict future policies and values conditioned on future actions in a given state, and adds a retrieval mechanism that enables the model to draw on information in a large-scale dataset to inform its predictions.
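The retrieval idea can be sketched in miniature. The following is an illustrative toy, assuming a plain k-nearest-neighbour lookup over a fixed vector store; the actual agent uses a learned semiparametric model and a far larger dataset:

```python
import numpy as np

rng = np.random.default_rng(0)
dataset = rng.normal(size=(10_000, 8))   # stand-in for a large experience store

def retrieve(state, k=4):
    # Nearest neighbours by L2 distance. New rows can be appended to `dataset`
    # at any time without retraining the policy weights.
    dists = np.linalg.norm(dataset - state, axis=1)
    return dataset[np.argsort(dists)[:k]]

def policy(state, weights):
    # The policy "attends" to retrieved information by concatenating a summary
    # of the neighbours with the current state before scoring actions.
    context = retrieve(state).mean(axis=0)
    features = np.concatenate([state, context])
    return int(np.argmax(features @ weights))

weights = rng.normal(size=(16, 4))       # 4 discrete actions, for illustration
action = policy(rng.normal(size=8), weights)
```

The key design point mirrored here is that the memory (`dataset`) is decoupled from the parameters (`weights`), so information can grow without retraining.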
The combination of reinforcement learning (RL) with deep learning has led to a series of impressive feats, with many believing that (deep) RL provides a path towards generally capable agents. However, the success of RL agents is often highly sensitive to design choices in the training process, which may require tedious and error-prone manual tuning. This makes it challenging to use RL for new problems and also limits its full potential. In many other areas of machine learning, AutoML has shown that it is possible to automate such design choices, and it has also yielded promising initial results when applied to RL. However, automated reinforcement learning (AutoRL) involves not only standard applications of AutoML but also additional challenges unique to RL that naturally produce a different set of methods. As such, AutoRL has been emerging as an important area of research in RL, showing promise in applications ranging from RNA design to playing games such as Go. Given the diversity of methods and environments considered in RL, much of the research has been conducted in distinct subfields, ranging from meta-learning to evolution. In this survey, we seek to unify the field of AutoRL, provide a common taxonomy, discuss each area in detail, and pose open problems of interest to researchers going forward.
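At its simplest, the AutoML component is an outer loop that searches over RL design choices. A minimal sketch, assuming random search over just the learning rate and exploration rate of an epsilon-greedy bandit learner (real AutoRL methods also tune architectures, algorithms, and even rewards):

```python
import random

def evaluate(lr, eps, seed=0, steps=500):
    # Epsilon-greedy Q-learning on a two-armed bandit (arm means 0.2 and 0.8);
    # returns the average reward achieved with the given hyperparameters.
    env = random.Random(seed)
    q = [0.0, 0.0]
    total = 0.0
    for _ in range(steps):
        if env.random() < eps:
            arm = env.randrange(2)                      # explore
        else:
            arm = max(range(2), key=lambda a: q[a])     # exploit
        reward = env.gauss([0.2, 0.8][arm], 0.1)
        q[arm] += lr * (reward - q[arm])
        total += reward
    return total / steps

# Outer AutoML loop: sample candidate configurations, keep the best performer.
search = random.Random(42)
trials = [(search.uniform(0.01, 0.5), search.uniform(0.01, 0.3)) for _ in range(20)]
best_lr, best_eps = max(trials, key=lambda cfg: evaluate(*cfg))
```

The RL-specific challenges mentioned above arise because `evaluate` is noisy, expensive, and non-stationary in real settings, which is what pushes AutoRL beyond vanilla AutoML.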
Deep reinforcement learning (DRL) is transitioning from a research field focused on game playing to a technology with real-world applications. Notable examples include DeepMind's work on controlling a nuclear-fusion reactor and on improving YouTube video compression, or Tesla attempting to use a method inspired by MuZero for autonomous-vehicle behavior planning. But the exciting potential of real-world applications of RL should come with a healthy dose of caution: for example, RL policies are well known to be vulnerable to exploitation, and methods for safe and robust policy development are an active area of research. As powerful RL systems emerge in the real world, the public and researchers alike are expressing an increased appetite for fair, aligned, and safe machine learning systems. To date, these research efforts have focused on accounting for shortcomings of datasets or supervised learning practices that can harm individuals.
Francis, Jonathan (Carnegie Mellon University) | Kitamura, Nariaki (Carnegie Mellon University) | Labelle, Felix (Carnegie Mellon University) | Lu, Xiaopeng (Carnegie Mellon University) | Navarro, Ingrid (Carnegie Mellon University) | Oh, Jean
Recent advances in multimodal machine learning and artificial intelligence (AI) have led to the development of challenging tasks at the intersection of computer vision, natural language processing, and embodied AI. Whereas many approaches and previous surveys have characterised one or two of these dimensions, there has not been a holistic analysis at the center of all three. Moreover, even when combinations of these topics are considered, more focus is placed on describing, e.g., current architectural methods, as opposed to also illustrating the high-level challenges and opportunities for the field. In this survey paper, we discuss Embodied Vision-Language Planning (EVLP) tasks, a family of prominent embodied navigation and manipulation problems that jointly use computer vision and natural language. We propose a taxonomy to unify these tasks and provide an in-depth analysis and comparison of new and current algorithmic approaches, metrics, simulated environments, and the datasets used for EVLP tasks. Finally, we present the core challenges that we believe new EVLP works should seek to address, and we advocate for task construction that enables model generalizability and furthers real-world deployment.
Advances in artificial intelligence often stem from the development of new environments that abstract real-world situations into a form where research can be done conveniently. This paper contributes such an environment based on ideas inspired by elementary Microeconomics. Agents learn to produce resources in a spatially complex world, trade them with one another, and consume those that they prefer. We show that the emergent production, consumption, and pricing behaviours respond to environmental conditions in the directions predicted by supply and demand shifts in Microeconomics. We also demonstrate settings where the agents' emergent prices for goods vary over space, reflecting the local abundance of goods.
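The predicted direction of such price responses can be checked against a textbook model. A minimal sketch, assuming a linear market cleared by iterative (tatonnement-style) price adjustment rather than the paper's learned agents:

```python
def clearing_price(demand_intercept, steps=1000, lr=0.01):
    # Buyers demand less at higher prices; producers supply more.
    # Excess demand nudges the price up; excess supply nudges it down.
    price = 1.0
    for _ in range(steps):
        demand = max(demand_intercept - 2.0 * price, 0.0)
        supply = 3.0 * price
        price += lr * (demand - supply)
    return price

low = clearing_price(10.0)    # analytic equilibrium: 10 - 2p = 3p  =>  p = 2
high = clearing_price(15.0)   # outward demand shift:  15 - 2p = 3p  =>  p = 3
```

An outward demand shift raises the clearing price, which is the qualitative direction the emergent agent behaviour is reported to reproduce.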
Sample efficiency for policy gradient methods is poor: each batch of data is thrown out immediately after a single gradient step. This is the most complete reinforcement learning course series on Udemy. In it, you will learn to implement some of the most powerful deep reinforcement learning algorithms in Python using PyTorch and PyTorch Lightning. You will implement from scratch adaptive algorithms that solve control tasks based on experience.
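The on-policy data usage described above can be made concrete with a minimal REINFORCE loop, here on a hypothetical two-armed bandit: each freshly collected batch is used for exactly one gradient step and then discarded, because the gradient estimate is only valid for the policy that generated the data.

```python
import numpy as np

rng = np.random.default_rng(0)
theta = np.zeros(2)                    # softmax policy over two bandit arms
arm_means = np.array([0.1, 0.9])       # arm 1 pays more on average

for _ in range(200):
    probs = np.exp(theta) / np.exp(theta).sum()
    # Collect a fresh on-policy batch under the current policy...
    actions = rng.choice(2, size=32, p=probs)
    rewards = rng.normal(arm_means[actions], 0.1)
    # ...take exactly ONE gradient step on it...
    grad = np.zeros(2)
    for a, r in zip(actions, rewards):
        grad += r * (np.eye(2)[a] - probs)   # reward * grad log pi(a)
    theta += 0.05 * grad / len(actions)
    # ...and discard the batch: the updated policy needs new data.

probs = np.exp(theta) / np.exp(theta).sum()
```

Off-policy methods, and tricks such as PPO's clipped surrogate objective, exist precisely to squeeze more than one update out of each batch.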