Core Challenges in Embodied Vision-Language Planning

Journal of Artificial Intelligence Research

Recent advances in the areas of multimodal machine learning and artificial intelligence (AI) have led to the development of challenging tasks at the intersection of Computer Vision, Natural Language Processing, and Embodied AI. Whereas many approaches and previous survey pursuits have characterised one or two of these dimensions, there has not been a holistic analysis at the center of all three. Moreover, even when combinations of these topics are considered, more focus is placed on describing, e.g., current architectural methods, as opposed to also illustrating high-level challenges and opportunities for the field. In this survey paper, we discuss Embodied Vision-Language Planning (EVLP) tasks, a family of prominent embodied navigation and manipulation problems that jointly use computer vision and natural language. We propose a taxonomy to unify these tasks and provide an in-depth analysis and comparison of the new and current algorithmic approaches, metrics, simulated environments, as well as the datasets used for EVLP tasks. Finally, we present the core challenges that we believe new EVLP works should seek to address, and we advocate for task construction that enables model generalizability and furthers real-world deployment.

Emergent bartering behaviour in multi-agent reinforcement learning


Advances in artificial intelligence often stem from the development of new environments that abstract real-world situations into a form where research can be done conveniently. This paper contributes such an environment based on ideas inspired by elementary Microeconomics. Agents learn to produce resources in a spatially complex world, trade them with one another, and consume those that they prefer. We show that the emergent production, consumption, and pricing behaviours respond to environmental conditions in the directions predicted by supply and demand shifts in Microeconomics. We also demonstrate settings where the agents' emergent prices for goods vary over space, reflecting the local abundance of goods.

OpenAI's AutoDIME: Automating Multi-Agent Environment Design for RL Agents


Natural selection driven by interspecific and intraspecific competition is a fundamental evolutionary mechanism that has led to the wide diversity and complexity of species inhabiting Earth. The process is mirrored to a degree in contemporary AI research, where competitive multi-agent reinforcement learning (RL) environments have enabled machines to reach superhuman performance. Designing multi-agent RL environments with conditions conducive to the development of interesting and useful agent skills can, however, be a time-consuming and laborious process. A common approach in single-agent settings is domain randomization, where the agent is trained on a wide distribution of randomized environments. Recent works have improved this process via automatic environment curriculum techniques that adapt the environment distribution during training to maximize the number of environments that produce better and more robust skills.
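As a rough illustration of the domain randomization described above, the following sketch draws a fresh set of environment parameters for every training episode. The parameter names (`friction`, `num_obstacles`), their ranges, and the helper functions are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class EnvParams:
    """Hypothetical environment parameters (illustrative assumption)."""
    friction: float
    num_obstacles: int


def sample_env_params(rng: random.Random) -> EnvParams:
    # Draw each parameter independently from a wide, fixed range.
    return EnvParams(
        friction=rng.uniform(0.1, 1.0),
        num_obstacles=rng.randint(0, 5),  # inclusive on both ends
    )


def training_envs(num_episodes: int, seed: int = 0) -> List[EnvParams]:
    """One randomized environment configuration per training episode."""
    rng = random.Random(seed)
    return [sample_env_params(rng) for _ in range(num_episodes)]
```

An automatic curriculum would replace the fixed ranges in `sample_env_params` with a distribution that is adapted during training; that adaptation step is not shown here.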

Interpretable pipelines with evolutionarily optimized modules for RL tasks with visual inputs

Artificial Intelligence

The importance of explainability in AI has become a pressing concern, for which several explainable AI (XAI) approaches have recently been proposed. However, most of the available XAI techniques are post-hoc methods, which may be only partially reliable because they do not exactly reflect the state of the original models. Thus, a more direct way of achieving XAI is through interpretable (also called glass-box) models. These models have been shown to obtain comparable (and, in some cases, better) performance with respect to black-box models in various tasks such as classification and reinforcement learning. However, they struggle when working with raw data, especially when the input dimensionality increases and the raw inputs alone do not give valuable insights into the decision-making process. Here, we propose to use end-to-end pipelines composed of multiple interpretable models co-optimized by means of evolutionary algorithms, which allows us to decompose the decision-making process into two parts: computing high-level features from raw data, and reasoning on the extracted high-level features. We test our approach in reinforcement learning environments from the Atari benchmark, where we obtain comparable results (with respect to black-box approaches) in settings without stochastic frame-skipping, while performance degrades in frame-skipping settings.

skrl: Modular and Flexible Library for Reinforcement Learning

Artificial Intelligence

skrl is an open-source modular library for reinforcement learning written in Python and designed with a focus on readability, simplicity, and transparency of algorithm implementations. Apart from supporting environments that use the traditional OpenAI Gym interface, it allows loading, configuring, and operating NVIDIA Isaac Gym environments, enabling the parallel training of several agents with adjustable scopes, which may or may not share resources, in the same execution. The library's documentation is available online, and its source code is available on GitHub.

Reward-Respecting Subtasks for Model-Based Reinforcement Learning

Artificial Intelligence

To achieve the ambitious goals of artificial intelligence, reinforcement learning must include planning with a model of the world that is abstract in state and time. Deep learning has made progress in state abstraction, but, although the theory of time abstraction has been extensively developed based on the options framework, in practice options have rarely been used in planning. One reason for this is that the space of possible options is immense and the methods previously proposed for option discovery do not take into account how the option models will be used in planning. Options are typically discovered by posing subsidiary tasks such as reaching a bottleneck state, or maximizing a sensory signal other than the reward. Each subtask is solved to produce an option, and then a model of the option is learned and made available to the planning process. The subtasks proposed in most previous work ignore the reward on the original problem, whereas we propose subtasks that use the original reward plus a bonus based on a feature of the state at the time the option stops. We show that options and option models obtained from such reward-respecting subtasks are much more likely to be useful in planning and can be learned online and off-policy using existing learning algorithms. Reward-respecting subtasks strongly constrain the space of options and thereby also provide a partial solution to the problem of option discovery. Finally, we show how the algorithms for learning values, policies, options, and models can be unified using general value functions.
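The subtask objective described above (original reward plus a bonus based on a feature of the state where the option stops) can be sketched as a return computation for a single option execution. This is a minimal sketch under stated assumptions: the discounting scheme, the linear `bonus_weight`, and the function name are illustrative choices, not the paper's exact formulation.

```python
from typing import Sequence


def subtask_return(
    rewards: Sequence[float],
    stop_feature: float,
    bonus_weight: float = 1.0,
    gamma: float = 1.0,
) -> float:
    """Return of one option execution under a reward-respecting subtask.

    rewards:      original-task rewards collected while the option runs
    stop_feature: value of the chosen state feature where the option stops
    bonus_weight: hypothetical scale of the stopping bonus (assumption)
    gamma:        discount factor (assumption)
    """
    total = 0.0
    discount = 1.0
    for r in rewards:
        total += discount * r      # keep the original reward in the objective
        discount *= gamma
    # Add the stopping bonus, discounted to the time the option terminates.
    return total + discount * bonus_weight * stop_feature
```

For example, `subtask_return([1.0, 2.0], stop_feature=1.0, bonus_weight=3.0, gamma=0.5)` evaluates to 1 + 0.5·2 + 0.25·3 = 2.75. A subtask that ignores the original reward would correspond to dropping the loop and keeping only the bonus term.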

Backdoor Detection in Reinforcement Learning

Artificial Intelligence

While the real-world application of reinforcement learning (RL) is becoming popular, the safety and robustness of RL systems require more attention. Recent work reveals that, in a multi-agent RL environment, backdoor trigger actions can be injected into a victim agent (a.k.a. trojan agent), which can result in catastrophic failure as soon as it sees the backdoor trigger action. We propose the problem of RL Backdoor Detection, aiming to address this safety vulnerability. An interesting observation we drew from extensive empirical studies is a trigger smoothness property, where normal actions similar to the backdoor trigger actions can also trigger low performance of the trojan agent. Inspired by this observation, we propose a reinforcement learning solution, TrojanSeeker, to find approximate trigger actions for the trojan agents, and further propose an efficient approach to mitigate the trojan agents based on machine unlearning. Experiments show that our approach can correctly distinguish and mitigate all the trojan agents across various types of agents and environments.

Graph Convolution-Based Deep Reinforcement Learning for Multi-Agent Decision-Making in Mixed Traffic Environments

Artificial Intelligence

An efficient and reliable multi-agent decision-making system is in high demand for the safe and efficient operation of connected autonomous vehicles in intelligent transportation systems. Current research mainly focuses on Deep Reinforcement Learning (DRL) methods. However, when DRL methods are used in interactive traffic scenarios, it is hard to represent the mutual effects between vehicles and to model the dynamic traffic environment, because the environment representation lacks interaction information; this results in low accuracy when generating cooperative decisions. To tackle these difficulties, this research proposes a framework that supports different Graph Reinforcement Learning (GRL) methods for decision-making and compares their performance in interactive driving scenarios. GRL methods combine a Graph Neural Network (GNN) with DRL to generate better decisions in interactive scenarios of autonomous vehicles: the features of interactive scenarios are extracted by the GNN, and cooperative behaviors are generated by the DRL framework. Several GRL approaches are summarized and implemented in the proposed framework. To evaluate the proposed GRL methods, an interactive driving scenario on a highway with two ramps is constructed, and simulated experiments are carried out on the SUMO platform. Finally, the results are analyzed from multiple perspectives and dimensions to compare the characteristics of the different GRL approaches in intelligent transportation scenarios. Results show that the GNN represents the interaction between vehicles well, and that the combination of GNN and DRL improves the generation of lane-change behaviors. The source code of our work is publicly available.

Learning to Coordinate with Humans using Action Features

Artificial Intelligence

An unaddressed challenge in human-AI coordination is to enable AI agents to exploit the semantic relationships between the features of actions and the features of observations. Humans take advantage of these relationships in highly intuitive ways. For instance, in the absence of a shared language, we might point to the object we desire or hold up our fingers to indicate how many objects we want. To address this challenge, we investigate the effect of network architecture on the propensity of learning algorithms to exploit these semantic relationships. Across a procedurally generated coordination task, we find that attention-based architectures that jointly process a featurized representation of observations and actions have a better inductive bias for zero-shot coordination. Through fine-grained evaluation and scenario analysis, we show that the resulting policies are human-interpretable. Moreover, such agents coordinate with people without training on any human data.

Any-Play: An Intrinsic Augmentation for Zero-Shot Coordination

Artificial Intelligence

Cooperative artificial intelligence with human or superhuman proficiency in collaborative tasks stands at the frontier of machine learning research. Prior work has tended to evaluate cooperative AI performance under the restrictive paradigms of self-play (teams composed of agents trained together) and cross-play (teams of agents trained independently but using the same algorithm). Recent work has indicated that AI optimized for these narrow settings may make for undesirable collaborators in the real world. We formalize an alternative criterion for evaluating cooperative AI, referred to as inter-algorithm cross-play, where agents are evaluated on teaming performance with all other agents within an experiment pool with no assumption of algorithmic similarities between agents. We show that existing state-of-the-art cooperative AI algorithms, such as Other-Play and Off-Belief Learning, underperform in this paradigm. We propose the Any-Play learning augmentation -- a multi-agent extension of diversity-based intrinsic rewards for zero-shot coordination (ZSC) -- for generalizing self-play-based algorithms to the inter-algorithm cross-play setting. We apply the Any-Play learning augmentation to the Simplified Action Decoder (SAD) and demonstrate state-of-the-art performance in the collaborative card game Hanabi.
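The inter-algorithm cross-play criterion described above can be illustrated with a small scoring routine: each agent is teamed with every other agent in the pool, regardless of which algorithm trained it, and its scores are averaged. This is a sketch only; the `play` callback, the agent identifiers, and the simple unweighted mean are assumptions made for illustration rather than the paper's evaluation protocol.

```python
from itertools import permutations
from statistics import mean
from typing import Callable, Dict, Hashable, List, Sequence


def inter_algorithm_cross_play(
    agents: Sequence[Hashable],
    play: Callable[[Hashable, Hashable], float],
) -> Dict[Hashable, float]:
    """Mean teaming score of each agent with every *other* agent in the pool.

    agents: unique agent identifiers (at least two)
    play:   hypothetical function returning the team score of (agent, partner)
    """
    scores: Dict[Hashable, List[float]] = {a: [] for a in agents}
    # Ordered pairs without self-pairing, so self-play is excluded.
    for agent, partner in permutations(agents, 2):
        scores[agent].append(play(agent, partner))
    return {a: mean(s) for a, s in scores.items()}
```

As a toy usage, with agents `["A1", "A2", "B1"]` and a `play` function that returns 1.0 when the two identifiers share a first letter (a stand-in for a shared convention) and 0.0 otherwise, the result is `{"A1": 0.5, "A2": 0.5, "B1": 0.0}`: agents that only coordinate with partners from their own algorithm score poorly against the pool.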