Collaborating Authors

RODE: Learning Roles to Decompose Multi-Agent Tasks Machine Learning

Role-based learning holds the promise of achieving scalable multi-agent learning by decomposing complex tasks using roles. However, it is largely unclear how to efficiently discover such a set of roles. To solve this problem, we propose to first decompose joint action spaces into restricted role action spaces by clustering actions according to their effects on the environment and other agents. Learning a role selector based on action effects makes role discovery much easier because it forms a bi-level learning hierarchy -- the role selector searches in a smaller role space and at a lower temporal resolution, while role policies learn in significantly reduced primitive action-observation spaces. We further integrate information about action effects into the role policies to boost learning efficiency and policy generalization. By virtue of these advances, our method (1) outperforms the current state-of-the-art MARL algorithms on 10 of the 14 scenarios that comprise the challenging StarCraft II micromanagement benchmark and (2) achieves rapid transfer to new environments with three times the number of agents. Demonstration videos are available at .
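The clustering step at the heart of this decomposition can be sketched with a toy example. The 2-D "effect vectors" and the minimal k-means routine below are illustrative stand-ins for RODE's actual learned action effects, not the paper's implementation:

```python
import numpy as np

def kmeans(x, k, iters=50, seed=0):
    """Minimal k-means; returns a cluster index per row of x."""
    rng = np.random.default_rng(seed)
    centers = x[rng.choice(len(x), size=k, replace=False)]
    for _ in range(iters):
        # assign each point to its nearest center
        d = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d.argmin(1)
        # move each center to the mean of its assigned points
        for j in range(k):
            if (labels == j).any():
                centers[j] = x[labels == j].mean(0)
    return labels

# Hypothetical per-action "effect vectors": how each action changes the
# environment on average (e.g. deltas in position / enemy health).
effects = np.array([
    [ 1.0,  0.0],   # move east
    [ 0.9,  0.1],   # move northeast-ish
    [-0.1, -1.0],   # attack: reduces enemy health
    [ 0.0, -0.9],   # attack variant
])
roles = kmeans(effects, k=2)
# Movement actions land in one restricted role action space,
# attack actions in the other.
```

Each cluster then defines a restricted role action space, over which a separate role policy can be trained.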

The Natural Language of Actions Artificial Intelligence

We introduce Act2Vec, a general framework for learning context-based action representations for Reinforcement Learning. Representing actions in a vector space helps reinforcement learning algorithms achieve better performance by grouping similar actions and utilizing relations between different actions. We show how prior knowledge of an environment can be extracted from demonstrations and injected into action vector representations that encode natural, compatible behavior. We then use these representations to augment state representations as well as to improve function approximation of Q-values. We visualize and test action embeddings in three domains: a drawing task, a high-dimensional navigation task, and the large action space of StarCraft II.
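A minimal sketch of context-based action embeddings in this spirit: count how often actions co-occur within a context window in demonstration trajectories, then factorize the counts so that actions used in similar contexts get nearby vectors. The toy trajectories, window size, and SVD factorization below are illustrative assumptions, not the paper's exact method:

```python
import numpy as np

# Hypothetical demonstration trajectories over an 8-action alphabet.
# Actions 0 and 1 are used interchangeably between 4 and 5;
# actions 2 and 3 are used interchangeably between 6 and 7.
trajs = [
    [4, 0, 5, 4, 1, 5, 4, 0, 5],
    [4, 1, 5, 4, 0, 5],
    [6, 2, 7, 6, 3, 7],
    [6, 3, 7, 6, 2, 7],
]
n_actions, window = 8, 1

# Count co-occurrences of actions within the context window,
# in the spirit of word2vec-style context statistics.
co = np.zeros((n_actions, n_actions))
for t in trajs:
    for i, a in enumerate(t):
        for j in range(max(0, i - window), min(len(t), i + window + 1)):
            if j != i:
                co[a, t[j]] += 1

# A low-rank factorization of the (log-smoothed) counts gives embeddings.
u, s, _ = np.linalg.svd(np.log1p(co))
emb = u[:, :4] * s[:4]

def cos(a, b):
    return emb[a] @ emb[b] / (np.linalg.norm(emb[a]) * np.linalg.norm(emb[b]))
```

Actions that appear in the same contexts (here 0 and 1, or 2 and 3) end up with similar vectors, even though they never co-occur directly.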

Low-Dimensional State and Action Representation Learning with MDP Homomorphism Metrics Artificial Intelligence

In the last decade, Deep Reinforcement Learning [1] algorithms have solved increasingly complicated problems in many different domains, spanning from video games [2] to numerous robotics applications [3], in an end-to-end fashion. Despite this success, end-to-end Reinforcement Learning methods suffer from low sample efficiency and usually require lengthy and expensive training procedures to learn optimal behaviours. This problem is even more pronounced when the true state of the environment is not observable and the observation space O or the action space A is high-dimensional. In end-to-end settings, due to the weak supervision of the reward signal, Reinforcement Learning algorithms are not compelled to learn good state representations of the environment, making the mapping from observations to actions challenging to learn and interpret. State representation learning [4] methods aim at reducing the dimensionality of the observation stream by learning a mapping from the observation space O to a lower-dimensional state space S containing only the meaningful features needed for solving a given task. By employing self-supervised auxiliary losses, it is possible to encourage good state representations and to learn models of the underlying Markov Decision Process, or MDP. When policies are learned using the abstract or latent state-space variables, training time is often reduced, and the sample efficiency, robustness, and generalisation capabilities of the policies improve compared to end-to-end Reinforcement Learning [5], [6], [7]. While the problem of state representation and observation compression has been treated extensively [4], only a few works have extended the concept of dimensionality reduction to the action space A. In this category are the works in [8], [9] and [10], where low-dimensional action representations are used to improve training efficiency.
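As a minimal linear stand-in for the learned encoders O -> S discussed above, projecting high-dimensional observations onto their top principal components already yields a compact state representation. The toy 2-D latent state and random "camera" matrix below are made-up assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: the true state is 2-D, but the agent only observes a noisy
# 16-D observation produced by a fixed random linear "camera" W.
true_state = rng.normal(size=(500, 2))
W = rng.normal(size=(2, 16))
obs = true_state @ W + 0.01 * rng.normal(size=(500, 16))

# A linear stand-in for a learned encoder: project centered observations
# onto their top principal components to get a compact state code.
obs_c = obs - obs.mean(0)
_, s, vt = np.linalg.svd(obs_c, full_matrices=False)
encoder = vt[:2].T                 # 16 -> 2 mapping
latent = obs_c @ encoder

# Fraction of observation variance retained by the 2-D code.
explained = (s[:2] ** 2).sum() / (s ** 2).sum()
```

In practice the surveyed methods learn nonlinear encoders with self-supervised auxiliary losses, but the dimensionality-reduction goal is the same.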

Dynamics-aware Embeddings Artificial Intelligence

In this paper we consider self-supervised representation learning to improve sample efficiency in reinforcement learning (RL). We propose a forward prediction objective for simultaneously learning embeddings of states and actions. These embeddings capture the structure of the environment's dynamics, enabling efficient policy learning. We demonstrate that our action embeddings alone improve the sample efficiency and peak performance of model-free RL on control from low-dimensional states. By combining state and action embeddings, we achieve efficient learning of high-quality policies on goal-conditioned continuous control from pixel observations in only 1-2 million environment steps.
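The forward prediction objective can be sketched as follows: learn a predictor f such that f(e_s(s), e_a(a)) approximates e_s(s'). The toy tabular dynamics and random embedding tables below are illustrative assumptions; in the paper the embeddings and predictor are trained jointly, whereas here the embeddings are fixed and only a linear predictor is fit, which reduces to least squares:

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions, d = 6, 3, 4

# Toy deterministic dynamics table: next_state[s, a] = s'.
next_state = rng.integers(0, n_states, size=(n_states, n_actions))

# Embedding tables for states and actions (random here; learned jointly
# with the predictor in the actual method).
e_s = rng.normal(size=(n_states, d))
e_a = rng.normal(size=(n_actions, d))

# Build a batch of all (s, a, s') transitions.
S, A = np.meshgrid(np.arange(n_states), np.arange(n_actions), indexing="ij")
S, A = S.ravel(), A.ravel()
Snext = next_state[S, A]

# Forward-prediction objective: f([e_s(s); e_a(a)]) ~ e_s(s').
X = np.concatenate([e_s[S], e_a[A]], axis=1)      # (N, 2d)
Y = e_s[Snext]                                    # (N, d)

def loss(Wf):
    return ((X @ Wf - Y) ** 2).mean()

Wf0 = np.zeros((2 * d, d))                        # untrained baseline
# For a linear predictor the objective is least squares, in closed form.
Wf, *_ = np.linalg.lstsq(X, Y, rcond=None)
```

Fitting the predictor drives the forward-prediction loss below the untrained baseline; in the full method, gradients of this loss also shape the embeddings themselves so that they capture the environment's dynamics.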

Generalization to New Actions in Reinforcement Learning Artificial Intelligence

A fundamental trait of intelligence is the ability to achieve goals in the face of novel circumstances, such as making decisions from new action choices. However, standard reinforcement learning assumes a fixed set of actions and requires expensive retraining when given a new action set. To make learning agents more adaptable, we introduce the problem of zero-shot generalization to new actions. We propose a two-stage framework where the agent first infers action representations from action information acquired separately from the task. A policy flexible to varying action sets is then trained with generalization objectives. We benchmark generalization on sequential tasks, such as selecting from an unseen tool-set to solve physical reasoning puzzles and stacking towers with novel 3D shapes. Videos and code are available at
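One common way to make a policy flexible to varying action sets, roughly in the spirit described above, is to score each available action by comparing state features against that action's representation, so the action set can change at test time without retraining. Everything below (the weight matrix, the dot-product scoring, the random action vectors) is a hypothetical sketch, not the paper's architecture:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4

# Stand-in for trained policy parameters; the action set is NOT baked
# into this matrix, so it never needs retraining when actions change.
W = rng.normal(size=(d, d))

def act(state_features, action_reprs):
    """Pick the best action from whatever action set is currently given."""
    scores = action_reprs @ (W @ state_features)  # one score per action
    return int(np.argmax(scores))

state = rng.normal(size=d)
train_actions = rng.normal(size=(3, d))   # representations of known actions
new_action = rng.normal(size=(1, d))      # an unseen tool / 3D shape

a_old = act(state, train_actions)
a_new = act(state, np.concatenate([train_actions, new_action]))
```

Because adding a row only adds one candidate score, the choice over the enlarged set is either the old action or the new one; zero-shot generalization then hinges on the quality of the inferred action representations.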