AITopics | Dragan, Anca

Dragan, Anca

Towards Flexible Inference in Sequential Decision Problems via Bidirectional Transformers

Carroll, Micah, Lin, Jessy, Paradise, Orr, Georgescu, Raluca, Sun, Mingfei, Bignell, David, Milani, Stephanie, Hofmann, Katja, Hausknecht, Matthew, Dragan, Anca, Devlin, Sam

arXiv.org Artificial Intelligence, Dec-9-2022

Note: This paper is superseded by the full version (Carroll et al., 2022). Randomly masking and predicting word tokens has been a successful approach in pre-training language models for a variety of downstream tasks. In this work, we observe that the same idea also applies naturally to sequential decision making, where many well-studied tasks like behavior cloning, offline RL, inverse dynamics, and waypoint conditioning correspond to different sequence maskings over a sequence of states, actions, and returns. We introduce the FlexiBiT framework, which provides a unified way to specify models which can be trained on many different sequential decision making tasks. We show that a single FlexiBiT model is simultaneously capable of carrying out many tasks with performance similar to or better than specialized models. Additionally, we show that performance can be further improved by fine-tuning our general model on specific tasks of interest. Masked language modeling (Devlin et al., 2018) is a key technique in natural language processing (NLP). Under this paradigm, models are trained to predict randomly-masked subsets of tokens in a sequence.

machine learning, natural language, reinforcement learning, (14 more...)

arXiv.org Artificial Intelligence

2204.13326

Country: Asia (0.28)

Genre: Research Report > New Finding (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.87)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)

Time-Efficient Reward Learning via Visually Assisted Cluster Ranking

Zhang, David, Carroll, Micah, Bobu, Andreea, Dragan, Anca

arXiv.org Artificial Intelligence, Nov-30-2022

One of the most successful paradigms for reward learning uses human feedback in the form of comparisons. Although these methods hold promise, human comparison labeling is expensive and time-consuming, constituting a major bottleneck to their broader applicability. Our insight is that we can greatly improve how effectively human time is used in these approaches by batching comparisons together, rather than having the human label each comparison individually. To do so, we leverage data dimensionality-reduction and visualization techniques to provide the human with an interactive GUI displaying the state space, in which the user can label subportions of the state space. Across some simple MuJoCo tasks, we show that this high-level approach holds promise and is able to greatly increase the performance of the resulting agents, provided the same amount of human labeling time.

data mining, machine learning, reinforcement learning, (16 more...)

arXiv.org Artificial Intelligence

2212.00169

Country: North America > United States > New York (0.28)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Robots (0.96)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.66)

UniMASK: Unified Inference in Sequential Decision Problems

Carroll, Micah, Paradise, Orr, Lin, Jessy, Georgescu, Raluca, Sun, Mingfei, Bignell, David, Milani, Stephanie, Hofmann, Katja, Hausknecht, Matthew, Dragan, Anca, Devlin, Sam

arXiv.org Artificial Intelligence, Nov-19-2022

Randomly masking and predicting word tokens has been a successful approach in pre-training language models for a variety of downstream tasks. In this work, we observe that the same idea also applies naturally to sequential decision making, where many well-studied tasks like behavior cloning, offline reinforcement learning, inverse dynamics, and waypoint conditioning correspond to different sequence maskings over a sequence of states, actions, and returns. We introduce the Uni[MASK] framework, which provides a unified way to specify models which can be trained on many different sequential decision making tasks. We show that a single Uni[MASK] model is often capable of carrying out many tasks with performance similar to or better than single-task models. Additionally, after fine-tuning, our Uni[MASK] models consistently outperform comparable single-task models. Our code is publicly available here.
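The correspondence between tasks and maskings can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `make_masks`, its parameters, and the exact choice of which tokens are visible for each task are assumptions made for this example.

```python
from typing import Dict, List

VISIBLE, MASKED = 1, 0  # 1 = token is given as input, 0 = token must be predicted


def make_masks(task: str, horizon: int, t: int) -> Dict[str, List[int]]:
    """Build visibility masks over states, actions, and returns-to-go.

    `task`, `horizon`, and `t` (the current timestep) are hypothetical
    parameter names chosen for this sketch.
    """
    if not 0 <= t < horizon:
        raise ValueError("t must index a timestep inside the trajectory")
    states = [MASKED] * horizon
    actions = [MASKED] * horizon
    returns = [MASKED] * horizon

    if task in ("behavior_cloning", "offline_rl", "waypoint"):
        # History up to now is visible: states 0..t and actions 0..t-1.
        for i in range(t + 1):
            states[i] = VISIBLE
        for i in range(t):
            actions[i] = VISIBLE
        if task == "offline_rl":
            # Assumption: condition on the return-to-go at the first timestep.
            returns[0] = VISIBLE
        if task == "waypoint":
            # Assumption: the waypoint is the final state of the trajectory.
            states[horizon - 1] = VISIBLE
    elif task == "inverse_dynamics":
        if t + 1 >= horizon:
            raise ValueError("inverse dynamics needs a next state")
        # Only the current and next state are visible; action t is predicted.
        states[t] = VISIBLE
        states[t + 1] = VISIBLE
    else:
        raise ValueError(f"unknown task: {task}")
    return {"states": states, "actions": actions, "returns": returns}
```

Under these assumptions, one model trained to fill in masked tokens could be queried for any of the four tasks by changing only the mask passed at inference time.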

machine learning, natural language, reinforcement learning, (19 more...)

arXiv.org Artificial Intelligence

2211.10869

Country: North America > United States (1.00)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Optimal Behavior Prior: Data-Efficient Human Models for Improved Human-AI Collaboration

Yang, Mesut, Carroll, Micah, Dragan, Anca

arXiv.org Artificial Intelligence, Nov-19-2022

AI agents designed to collaborate with people benefit from models that enable them to anticipate human behavior. However, realistic models tend to require vast amounts of human data, which is often hard to collect. A good prior or initialization could make for more data-efficient training, but what makes for a good prior on human behavior? Our work leverages a very simple assumption: people generally act closer to optimal than to random chance. We show that using optimal behavior as a prior for human models makes these models vastly more data-efficient and able to generalize to new environments. Our intuition is that such a prior enables the training to focus one's precious real-world data on capturing the subtle nuances of human suboptimality, instead of on the basics of how to do the task in the first place. We also show that using these improved human models often leads to better human-AI collaboration performance compared to using models based on real human data alone.

artificial intelligence, human model, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2211.01602

Genre: Research Report (1.00)

Industry: Leisure & Entertainment > Games (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Cognitive Science (1.00)
Information Technology > Artificial Intelligence > Robots (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

B-Pref: Benchmarking Preference-Based Reinforcement Learning

Lee, Kimin, Smith, Laura, Dragan, Anca, Abbeel, Pieter

arXiv.org Artificial Intelligence, Nov-4-2021

Reinforcement learning (RL) requires access to a reward function that incentivizes the right behavior, but these are notoriously hard to specify for complex tasks. Preference-based RL provides an alternative: learning policies using a teacher's preferences without pre-defined rewards, thus overcoming concerns associated with reward engineering. However, it is difficult to quantify the progress in preference-based RL due to the lack of a commonly adopted benchmark. In this paper, we introduce B-Pref: a benchmark specially designed for preference-based RL. A key challenge with such a benchmark is providing the ability to evaluate candidate algorithms quickly, which makes relying on real human input for evaluation prohibitive. At the same time, simulating human input as giving perfect preferences for the ground truth reward function is unrealistic. B-Pref alleviates this by simulating teachers with a wide array of irrationalities, and proposes metrics not solely for performance but also for robustness to these potential irrationalities. We showcase the utility of B-Pref by using it to analyze algorithmic design choices, such as selecting informative queries, for state-of-the-art preference-based RL algorithms. We hope that B-Pref can serve as a common starting point to study preference-based RL more systematically. Source code is available at https://github.com/rll-research/B-Pref.
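A simulated teacher with configurable irrationalities might look like the sketch below. The abstract does not list the irrationalities, so the ones shown here (stochastic choice, myopic weighting, random mistakes, skipping, and declaring ties) and all parameter names are assumptions for illustration, not the B-Pref API.

```python
import math
import random
from typing import Optional, Sequence


def _sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def simulated_teacher(
    rewards_a: Sequence[float],
    rewards_b: Sequence[float],
    beta: float = 1.0,            # rationality: larger means closer to deterministic
    gamma: float = 1.0,           # myopia: values < 1 down-weight early steps
    eps: float = 0.0,             # probability of flipping the label by mistake
    skip_threshold: float = float("-inf"),  # skip if both segments score below this
    equal_threshold: float = 0.0,  # call it a tie if scores differ by less than this
    rng: Optional[random.Random] = None,
) -> Optional[float]:
    """Return 1.0 if segment A is preferred, 0.0 if B, 0.5 for a tie, None to skip."""
    rng = rng or random.Random()

    def score(rewards: Sequence[float]) -> float:
        # Recent steps get weight close to 1, earlier steps are discounted.
        n = len(rewards)
        return sum(gamma ** (n - 1 - i) * r for i, r in enumerate(rewards))

    score_a, score_b = score(rewards_a), score(rewards_b)
    if max(score_a, score_b) < skip_threshold:
        return None
    if abs(score_a - score_b) < equal_threshold:
        return 0.5
    # Stochastic (Bradley-Terry style) preference over the two scores.
    label = 1.0 if rng.random() < _sigmoid(beta * (score_a - score_b)) else 0.0
    if rng.random() < eps:
        label = 1.0 - label
    return label
```

Sweeping parameters such as `beta` or `eps` while holding the learning algorithm fixed is one way to measure robustness to teacher irrationality rather than performance alone.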

environment step, machine learning, reinforcement learning, (16 more...)

arXiv.org Artificial Intelligence

2111.03026

Genre: Research Report (0.82)

Industry:

Education (0.93)
Leisure & Entertainment > Games (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

The MineRL BASALT Competition on Learning from Human Feedback

Shah, Rohin, Wild, Cody, Wang, Steven H., Alex, Neel, Houghton, Brandon, Guss, William, Mohanty, Sharada, Kanervisto, Anssi, Milani, Stephanie, Topin, Nicholay, Abbeel, Pieter, Russell, Stuart, Dragan, Anca

arXiv.org Artificial Intelligence, Jul-5-2021

The last decade has seen a significant increase of interest in deep learning research, with many public successes that have demonstrated its potential. As such, these systems are now being incorporated into commercial products. With this comes an additional challenge: how can we build AI systems that solve tasks where there is not a crisp, well-defined specification? While multiple solutions have been proposed, in this competition we focus on one in particular: learning from human feedback. Rather than training AI systems using a predefined reward function or using a labeled dataset with a predefined set of categories, we instead train the AI system using a learning signal derived from some form of human feedback, which can evolve over time as the understanding of the task changes, or as the capabilities of the AI system improve. The MineRL BASALT competition aims to spur forward research on this important class of techniques. We design a suite of four tasks in Minecraft for which we expect it will be hard to write down hardcoded reward functions. These tasks are defined by a paragraph of natural language: for example, "create a waterfall and take a scenic picture of it", with additional clarifying details. Participants must train a separate agent for each task, using any method they want. Agents are then evaluated by humans who have read the task description. To help participants get started, we provide a dataset of human demonstrations on each of the four tasks, as well as an imitation learning baseline that leverages these demonstrations. Our hope is that this competition will improve our ability to build AI systems that do what their designers intend them to do, even when the intent cannot be easily formalized. Besides allowing AI to solve more tasks, this can also enable more effective regulation of AI systems, as well as making progress on the value alignment problem.

competition, computer game, deep learning, (20 more...)

arXiv.org Artificial Intelligence

2107.01969

Country: North America > United States > Maryland (0.28)

Genre:

Research Report (0.64)
Personal > Honors (0.46)

Industry:

Leisure & Entertainment > Games > Computer Games (1.00)
Education (1.00)
Government > Military (0.93)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Issues > Social & Ethical Issues (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.95)

Learning What To Do by Simulating the Past

Lindner, David, Shah, Rohin, Abbeel, Pieter, Dragan, Anca

arXiv.org Artificial Intelligence, May-3-2021

Since reward functions are hard to specify, recent work has focused on learning policies from human feedback. However, such approaches are impeded by the expense of acquiring such feedback. Recent work proposed that agents have access to a source of information that is effectively free: in any environment that humans have acted in, the state will already be optimized for human preferences, and thus an agent can extract information about what humans want from the state. Such learning is possible in principle, but requires simulating all possible past trajectories that could have led to the observed state. This is feasible in gridworlds, but how do we scale it to complex tasks? In this work, we show that by combining a learned feature encoder with learned inverse models, we can enable agents to simulate human actions backwards in time to infer what they must have done. The resulting algorithm is able to reproduce a specific skill in MuJoCo environments given a single state sampled from the optimal policy for that skill.

artificial intelligence, deep rlsp, neural network, (17 more...)

arXiv.org Artificial Intelligence

2104.03946

Country:

North America > United States (0.14)
Asia (0.14)

Genre: Research Report (0.64)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.70)

Choice Set Misspecification in Reward Inference

Freedman, Rachel, Shah, Rohin, Dragan, Anca

arXiv.org Artificial Intelligence, Jan-19-2021

Specifying reward functions for robots that operate in environments without a natural reward signal can be challenging, and incorrectly specified rewards can incentivise degenerate or dangerous behavior. A promising alternative to manually specifying reward functions is to enable robots to infer them from human feedback, like demonstrations or corrections. To interpret this feedback, robots treat as approximately optimal a choice the person makes from a choice set, like the set of possible trajectories they could have demonstrated or possible corrections they could have made. In this work, we introduce the idea that the choice set itself might be difficult to specify, and analyze choice set misspecification: what happens as the robot makes incorrect assumptions about the set of choices from which the human selects their feedback. We propose a classification of different kinds of choice set misspecification, and show that these different classes lead to meaningful differences in the inferred reward and resulting performance. While we would normally expect misspecification to hurt, we find that certain kinds of misspecification are neither helpful nor harmful (in expectation). However, in other situations, misspecification can be extremely harmful, leading the robot to believe the opposite of what it should believe. We hope our results will allow for better prediction and response to the effects of misspecification in real-world reward inference.
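The effect of a wrong choice set can be illustrated with a toy Bayesian update. The sketch assumes a Boltzmann-rational human (a common model of approximately optimal choice, not necessarily the one used in the paper); the hypotheses, choices, and reward values are made up for this example.

```python
import math
from typing import Dict, Iterable


def boltzmann_likelihood(choice: str, choice_set: Iterable[str],
                         reward: Dict[str, float], beta: float = 1.0) -> float:
    """P(choice | reward, choice_set) for a Boltzmann-rational human."""
    normalizer = sum(math.exp(beta * reward[c]) for c in choice_set)
    return math.exp(beta * reward[choice]) / normalizer


def posterior(choice: str, choice_set: Iterable[str],
              hypotheses: Dict[str, Dict[str, float]],
              beta: float = 1.0) -> Dict[str, float]:
    """Posterior over reward hypotheses under a uniform prior."""
    options = list(choice_set)
    unnorm = {name: boltzmann_likelihood(choice, options, reward, beta)
              for name, reward in hypotheses.items()}
    total = sum(unnorm.values())
    return {name: value / total for name, value in unnorm.items()}


# Two hypothetical reward functions over three possible choices.
hypotheses = {
    "theta_1": {"A": 1.0, "B": 0.0, "C": 3.0},
    "theta_2": {"A": 0.0, "B": 1.0, "C": 0.0},
}

# The human could only pick from {A, B} and picked A.
correct = posterior("A", ["A", "B"], hypotheses)
# The robot wrongly assumes C was also available.
misspecified = posterior("A", ["A", "B", "C"], hypotheses)
```

With the correct choice set, picking A favors `theta_1`. When the robot wrongly believes C was available, not picking C counts as evidence against `theta_1`, and the posterior shifts toward `theta_2`: a small instance of misspecification leading the robot toward the opposite belief.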

artificial intelligence, misspecification, neural network, (18 more...)

arXiv.org Artificial Intelligence

2101.07691

Country: North America > United States > California (0.14)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.48)

Inverse Reward Design

Hadfield-Menell, Dylan, Milli, Smitha, Abbeel, Pieter, Russell, Stuart, Dragan, Anca

arXiv.org Artificial Intelligence, Oct-7-2020

Autonomous agents optimize the reward function we give them. What they don't know is how hard it is for us to design a reward function that actually captures what we want. When designing the reward, we might think of some specific training scenarios, and make sure that the reward will lead to the right behavior in those scenarios. Inevitably, agents encounter new scenarios (e.g., new types of terrain) where optimizing that same reward may lead to undesired behavior. Our insight is that reward functions are merely observations about what the designer actually wants, and that they should be interpreted in the context in which they were designed. We introduce inverse reward design (IRD) as the problem of inferring the true objective based on the designed reward and the training MDP. We introduce approximate methods for solving IRD problems, and use their solution to plan risk-averse behavior in test MDPs. Empirical results suggest that this approach can help alleviate negative side effects of misspecified reward functions and mitigate reward hacking.
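The idea of treating the designed reward as an observation can be sketched on a tiny discrete problem. Everything below is an assumption made for illustration: rewards are linear in features, the designer is modeled as picking proxies in proportion to how well they perform in training under the true reward, candidate sets are finite, and risk aversion is a max-min rule over plausible rewards. The paper's approximate methods are not reproduced here.

```python
import math
from typing import Dict, Sequence

Vector = Sequence[float]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def ird_posterior(observed_proxy: str,
                  proxy_features: Dict[str, Vector],
                  true_candidates: Dict[str, Vector],
                  beta: float = 1.0) -> Dict[str, float]:
    """Posterior over candidate true rewards given the proxy the designer chose.

    proxy_features[p] holds the training-MDP feature counts obtained by
    optimizing proxy p (assumed precomputed). Uniform prior over candidates.
    """
    unnorm = {}
    for name, w in true_candidates.items():
        scores = {p: beta * dot(w, phi) for p, phi in proxy_features.items()}
        top = max(scores.values())  # subtract the max for numerical stability
        normalizer = sum(math.exp(s - top) for s in scores.values())
        unnorm[name] = math.exp(scores[observed_proxy] - top) / normalizer
    total = sum(unnorm.values())
    return {name: value / total for name, value in unnorm.items()}


def risk_averse_choice(trajectory_features: Dict[str, Vector],
                       true_candidates: Dict[str, Vector],
                       post: Dict[str, float],
                       min_prob: float = 0.05) -> str:
    """Pick the trajectory with the best worst-case reward over plausible candidates."""
    plausible = [w for name, w in true_candidates.items() if post[name] >= min_prob]
    if not plausible:  # fall back to all candidates if the cutoff removes everything
        plausible = list(true_candidates.values())
    return max(trajectory_features,
               key=lambda t: min(dot(w, trajectory_features[t]) for w in plausible))


# Features: (dirt, grass, lava). Lava never appears in the training MDP.
proxy_features = {"prefer_dirt": [1.0, 0.0, 0.0], "prefer_grass": [0.0, 1.0, 0.0]}
true_candidates = {
    "lava_ok": [1.0, 0.0, 1.0],
    "lava_bad": [1.0, 0.0, -10.0],
    "grass_lover": [0.0, 1.0, 0.0],
}
post = ird_posterior("prefer_dirt", proxy_features, true_candidates)

test_trajectories = {"through_lava": [0.5, 0.0, 0.5], "around": [0.8, 0.2, 0.0]}
choice = risk_averse_choice(test_trajectories, true_candidates, post)
```

Because training never involved lava, the observed proxy says nothing about it: `lava_ok` and `lava_bad` receive equal posterior mass, and the max-min rule selects the trajectory that avoids lava.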

artificial intelligence, machine learning, reward function, (15 more...)

arXiv.org Artificial Intelligence

1711.02827

Country: North America > United States > California > Alameda County > Berkeley (0.14)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.47)

AvE: Assistance via Empowerment

Du, Yuqing, Tiomkin, Stas, Kiciman, Emre, Polani, Daniel, Abbeel, Pieter, Dragan, Anca

arXiv.org Artificial Intelligence, Aug-2-2020

One difficulty in using artificial agents for human-assistive applications lies in the challenge of accurately assisting with a person's goal(s). Existing methods tend to rely on inferring the human's goal, which is challenging when there are many potential goals or when the set of candidate goals is difficult to identify. We propose a new paradigm for assistance by instead increasing the human's ability to control their environment, and formalize this approach by augmenting reinforcement learning with human empowerment. This task-agnostic objective preserves the person's autonomy and ability to achieve any eventual state. We test our approach against assistance based on goal inference, highlighting scenarios where our method overcomes failure modes stemming from goal ambiguity or misspecification. As existing methods for estimating empowerment in continuous domains are computationally hard, precluding their use in real-time learned assistance, we also propose an efficient empowerment-inspired proxy metric. Using this, we are able to successfully demonstrate our method in a shared autonomy user study for a challenging simulated teleoperation task with human-in-the-loop training.

artificial intelligence, empowerment, neural network, (19 more...)

arXiv.org Artificial Intelligence

2006.14796

Genre:

Questionnaire & Opinion Survey (1.00)
Research Report > Experimental Study (0.94)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.91)
