Goto

Collaborating Authors

 Agent Societies


Decomposition Methods with Deep Corrections for Reinforcement Learning

arXiv.org Artificial Intelligence

Decomposition methods have been proposed to approximate solutions to large sequential decision making problems. In contexts where an agent interacts with multiple entities, utility decomposition can be used to separate the global objective into local tasks considering each individual entity independently. An arbitrator is then responsible for combining the individual utilities and selecting an action in real time to solve the global problem. Although these techniques can perform well empirically, they rely on strong assumptions of independence between the local tasks and sacrifice the optimality of the global solution. This paper proposes an approach that improves upon such approximate solutions by learning a correction term represented by a neural network. We demonstrate this approach on a fisheries management problem where multiple boats must coordinate to maximize their catch over time as well as on a pedestrian avoidance problem for autonomous driving. In each problem, decomposition methods can scale to multiple boats or pedestrians by using strategies involving one entity. We verify empirically that the proposed correction method significantly improves the decomposition method and outperforms a policy trained on the full scale problem without utility decomposition.


Learning 3D Navigation Protocols on Touch Interfaces with Cooperative Multi-Agent Reinforcement Learning

arXiv.org Artificial Intelligence

Using touch devices to navigate in virtual 3D environments such as computer assisted design (CAD) models or geographical information systems (GIS) is inherently difficult for humans, as the 3D operations have to be performed by the user on a 2D touch surface. This ill-posed problem is classically solved with a fixed and handcrafted interaction protocol, which must be learned by the user. We propose to automatically learn a new interaction protocol allowing to map a 2D user input to 3D actions in virtual environments using reinforcement learning (RL). A fundamental problem of RL methods is the vast amount of interactions often required, which are difficult to come by when humans are involved. To overcome this limitation, we make use of two collaborative agents. The first agent models the human by learning to perform the 2D finger trajectories. The second agent acts as the interaction protocol, interpreting and translating to 3D operations the 2D finger trajectories from the first agent. We restrict the learned 2D trajectories to be similar to a training set of collected human gestures by first performing state representation learning, prior to reinforcement learning. This state representation learning is addressed by projecting the gestures into a latent space learned by a variational auto encoder (VAE).


Two Body Problem: Collaborative Visual Task Completion

arXiv.org Artificial Intelligence

Collaboration is a necessary skill to perform tasks that are beyond one agent's capabilities. Addressed extensively in both conventional and modern AI, multi-agent collaboration has often been studied in the context of simple grid worlds. We argue that there are inherently visual aspects to collaboration which should be studied in visually rich environments. A key element in collaboration is communication that can be either explicit, through messages, or implicit, through perception of the other agents and the visual world. Learning to collaborate in a visual environment entails learning (1) to perform the task, (2) when and what to communicate, and (3) how to act based on these communications and the perception of the visual world. In this paper we study the problem of learning to collaborate directly from pixels in AI2-THOR and demonstrate the benefits of explicit and implicit modes of communication to perform visual tasks. Refer to our project page for more details: https://prior.allenai.org/projects/two-body-problem


Cooperative Multi-Agent Reinforcement Learning Framework for Scalping Trading

arXiv.org Artificial Intelligence

We explore deep Reinforcement Learning(RL) algorithms for scalping trading and knew that there is no appropriate trading gym and agent examples. Thus we propose gym and agent like Open AI gym in finance. Not only that, we introduce new RL framework based on our hybrid algorithm which leverages between supervised learning and RL algorithm and uses meaningful observations such order book and settlement data from experience watching scalpers trading. That is very crucial information for traders behavior to be decided. To feed these data into our model, we use spatio-temporal convolution layer, called Conv3D for order book data and temporal CNN, called Conv1D for settlement data. Those are preprocessed by episode filter we developed. Agent consists of four sub agents divided to clarify their own goal to make best decision. Also, we adopted value and policy based algorithm to our framework. With these features, we could make agent mimic scalpers as much as possible. In many fields, RL algorithm has already begun to transcend human capabilities in many domains. This approach could be a starting point to beat human in the financial stock market, too and be a good reference for anyone who wants to design RL algorithm in real world domain. Finally, weexperiment our framework and gave you experiment progress.


Distributed Gibbs: A Linear-Space Sampling-Based DCOP Algorithm

Journal of Artificial Intelligence Research

Researchers have used distributed constraint optimization problems (DCOPs) to model various multi-agent coordination and resource allocation problems. Very recently, Ottens et al. proposed a promising new approach to solve DCOPs that is based on confidence bounds via their Distributed UCT (DUCT) sampling-based algorithm. Unfortunately, its memory requirement per agent is exponential in the number of agents in the problem, which prohibits it from scaling up to large problems. Thus, in this article, we introduce two new sampling-based DCOP algorithms called Sequential Distributed Gibbs (SD-Gibbs) and Parallel Distributed Gibbs (PD-Gibbs). Both algorithms have memory requirements per agent that is linear in the number of agents in the problem. Our empirical results show that our algorithms can find solutions that are better than DUCT, run faster than DUCT, and solve some large problems that DUCT failed to solve due to memory limitations.


Deep Multi-Agent Reinforcement Learning with Discrete-Continuous Hybrid Action Spaces

arXiv.org Artificial Intelligence

Deep Reinforcement Learning (DRL) has been applied to address a variety of cooperative multi-agent problems with either discrete action spaces or continuous action spaces. However, to the best of our knowledge, no previous work has ever succeeded in applying DRL to multi-agent problems with discrete-continuous hybrid (or parameterized) action spaces which is very common in practice. Our work fills this gap by proposing two novel algorithms: Deep Multi-Agent Parameterized Q-Networks (Deep MAPQN) and Deep Multi-Agent Hierarchical Hybrid Q-Networks (Deep MAHHQN). We follow the centralized training but decentralized execution paradigm: different levels of communication between different agents are used to facilitate the training process, while each agent executes its policy independently based on local observations during execution. Our empirical results on several challenging tasks (simulated RoboCup Soccer and game Ghost Story) show that both Deep MAPQN and Deep MAHHQN are effective and significantly outperform existing independent deep parameterized Q-learning method.


A Survey on Transfer Learning for Multiagent Reinforcement Learning Systems

Journal of Artificial Intelligence Research

Multiagent Reinforcement Learning (RL) solves complex tasks that require coordination with other agents through autonomous exploration of the environment. However, learning a complex task from scratch is impractical due to the huge sample complexity of RL algorithms. For this reason, reusing knowledge that can come from previous experience or other agents is indispensable to scale up multiagent RL algorithms. This survey provides a unifying view of the literature on knowledge reuse in multiagent RL. We define a taxonomy of solutions for the general knowledge reuse problem, providing a comprehensive discussion of recent progress on knowledge reuse in Multiagent Systems (MAS) and of techniques for knowledge reuse across agents (that may be actuating in a shared environment or not). We aim at encouraging the community to work towards reusing all the knowledge sources available in a MAS. For that, we provide an in-depth discussion of current lines of research and open questions.


Microscopic Traffic Simulation by Cooperative Multi-agent Deep Reinforcement Learning

arXiv.org Artificial Intelligence

Expert human drivers perform actions relying on traffic laws and their previous experience. While traffic laws are easily embedded into an artificial brain, modeling human complex behaviors which come from past experience is a more challenging task. One of these behaviors is the capability of communicating intentions and negotiating the right of way through driving actions, as when a driver is entering a crowded roundabout and observes other cars movements to guess the best time to merge in. In addition, each driver has its own unique driving style, which is conditioned by both its personal characteristics, such as age and quality of sight, and external factors, such as being late or in a bad mood. For these reasons, the interaction between different drivers is not trivial to simulate in a realistic manner. In this paper, this problem is addressed by developing a microscopic simulator using a Deep Reinforcement Learning Algorithm based on a combination of visual frames, representing the perception around the vehicle, and a vector of numerical parameters. In particular, the algorithm called Asynchronous Advantage Actor-Critic has been extended to a multi-agent scenario in which every agent needs to learn to interact with other similar agents. Moreover, the model includes a novel architecture such that the driving style of each vehicle is adjustable by tuning some of its input parameters, permitting to simulate drivers with different levels of aggressiveness and desired cruising speeds.


Modeling Social Group Communication with Multi-Agent Imitation Learning

arXiv.org Artificial Intelligence

In crowded social scenarios with a myriad of external stimuli, human brains exhibit a natural ability to filter out irrelevant information and narrowly focus their attention. In the midst of multiple groups of people, humans use such sensory gating to effectively further their own group's interactional goals. In this work, we consider the design of a policy network to model multi-group multi-person communication. Our policy takes as input the state of the world such as an agent's gaze direction, body pose of other agents or history of past actions, and outputs an optimal action such as speaking, listening or responding (communication modes). Inspired by humans' natural neurobiological filtering process, a central component of our policy network design is an information gating function, termed the Kinesic-Proxemic-Message Gate (KPM-Gate), that models the ability of an agent to selectively gather information from specific neighboring agents. The degree of influence of a neighbor is based on dynamic non-verbal cues such as body motion, head pose (kinesics) and interpersonal space (proxemics). We further show that the KPM-Gate can be used to discover social groups using its natural interpretation as a social attention mechanism. We pose the communication policy learning problem as a multi-agent imitation learning problem. We learn a single policy shared by all agents under the assumption of a decentralized Markov decision process. We term our policy network as the Multi-Agent Group Discovery and Communication Mode Network (MAGDAM network), as it learns social group structure in addition to the dynamics of group communication. Our experimental validation on both synthetic and real world data shows that our model is able to both discover social group structure and learn an accurate multi-agent communication policy.


A Cooperative Multi-Agent Reinforcement Learning Framework for Resource Balancing in Complex Logistics Network

arXiv.org Artificial Intelligence

Resource balancing within complex transportation networks is one of the most important problems in real logistics domain. Traditional solutions on these problems leverage combinatorial optimization with demand and supply forecasting. However, the high complexity of transportation routes, severe uncertainty of future demand and supply, together with non-convex business constraints make it extremely challenging in the traditional resource management field. In this paper, we propose a novel sophisticated multi-agent reinforcement learning approach to address these challenges. In particular, inspired by the externalities especially the interactions among resource agents, we introduce an innovative cooperative mechanism for state and reward design resulting in more effective and efficient transportation. Extensive experiments on a simulated ocean transportation service demonstrate that our new approach can stimulate cooperation among agents and lead to much better performance. Compared with traditional solutions based on combinatorial optimization, our approach can give rise to a significant improvement in terms of both performance and stability.