Collaborating Authors

 Agents


Mean-Field Approximation of Cooperative Constrained Multi-Agent Reinforcement Learning (CMARL)

arXiv.org Artificial Intelligence

Mean-Field Control (MFC) has recently been proven to be a scalable tool for approximately solving large-scale multi-agent reinforcement learning (MARL) problems. However, these studies are typically limited to the unconstrained cumulative-reward maximization framework. In this paper, we show that one can use the MFC approach to approximate the MARL problem even in the presence of constraints. Specifically, we prove that an $N$-agent constrained MARL problem, with the state and action spaces of each individual agent being of sizes $|\mathcal{X}|$ and $|\mathcal{U}|$ respectively, can be approximated by an associated constrained MFC problem with an error $e\triangleq \mathcal{O}\left([\sqrt{|\mathcal{X}|}+\sqrt{|\mathcal{U}|}]/\sqrt{N}\right)$. In the special case where the reward, cost, and state transition functions are independent of the action distribution of the population, we prove that the error can be improved to $e=\mathcal{O}(\sqrt{|\mathcal{X}|}/\sqrt{N})$. We also provide a Natural Policy Gradient based algorithm and prove that it can solve the constrained MARL problem within an error of $\mathcal{O}(e)$ with a sample complexity of $\mathcal{O}(e^{-6})$.
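
To get a feel for how the stated bounds scale, the following is a minimal Python sketch that evaluates the two error expressions and the $e^{-6}$ sample-complexity term with all hidden constants set to 1. The function names and the `action_coupled` flag are hypothetical and chosen only for illustration; the numbers show relative scaling, not actual error values from the paper.

```python
import math


def mfc_error_scale(n_agents, n_states, n_actions, action_coupled=True):
    """Scale of the MFC approximation error, with the O(.) constant taken as 1.

    action_coupled=True  -> (sqrt(|X|) + sqrt(|U|)) / sqrt(N)
    action_coupled=False -> sqrt(|X|) / sqrt(N), the special case in which
    reward, cost, and transitions ignore the population's action distribution.
    """
    if n_agents <= 0 or n_states <= 0 or n_actions <= 0:
        raise ValueError("all sizes must be positive")
    numerator = math.sqrt(n_states)
    if action_coupled:
        numerator += math.sqrt(n_actions)
    return numerator / math.sqrt(n_agents)


def sample_complexity_scale(error):
    """Scale of the sample complexity O(e^-6), constant taken as 1 (assumption)."""
    if error <= 0:
        raise ValueError("error must be positive")
    return error ** -6


if __name__ == "__main__":
    for n in (100, 10_000, 1_000_000):
        e = mfc_error_scale(n, n_states=4, n_actions=9)
        print(n, e, sample_complexity_scale(e))
```

Because the error shrinks as $1/\sqrt{N}$, multiplying the number of agents by 100 divides the error scale by 10, while the sample-complexity scale at that smaller target error grows by a factor of $10^{6}$.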


The Alberta Plan: Sutton's Research Vision for Artificial Intelligence

#artificialintelligence

For anyone familiar with Reinforcement Learning, it is hard not to know who Richard Sutton is. The Sutton & Barto textbook is considered canonical in the field. I always find it highly inspirational to study the views of genuine thought leaders. Thus, when they present a new research vision, I'm primed to listen. This summer, Sutton and his colleagues Bowling and Pilarski outlined a research vision for Artificial Intelligence, designing a blueprint for their research commitments in the next 5 to 10 years. The full document is only 13 pages long and comprehensively written, so it doesn't hurt to have a look.


Data-Efficient Collaborative Decentralized Thermal-Inertial Odometry

arXiv.org Artificial Intelligence

We propose a system solution to achieve data-efficient, decentralized state estimation for a team of flying robots using thermal images and inertial measurements. Each robot can fly independently and exchange data when possible to refine its state estimate. Our system front-end applies an online photometric calibration to refine the thermal images so as to enhance feature tracking and place recognition. Our system back-end uses a covariance-intersection fusion strategy to neglect the cross-correlation between agents so as to lower memory usage and computational cost. The communication pipeline uses Vector of Locally Aggregated Descriptors (VLAD) to construct a request-response policy that requires low bandwidth usage. We test our collaborative method on both synthetic and real-world data. Our results show that the proposed method improves trajectory estimation by up to 46% with respect to an individual-agent approach, while reducing the communication exchange by up to 89%. Datasets and code are released to the public, extending the already-public JPL xVIO library.
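
As an illustration of the covariance-intersection idea mentioned above, here is a minimal NumPy sketch of the textbook fusion rule for two estimates whose cross-correlation is unknown. It is not taken from the released code: the function names, the grid search over the weight, and the trace-minimizing criterion are assumptions made for this example.

```python
import numpy as np


def covariance_intersection(x_a, P_a, x_b, P_b, weight):
    """Fuse estimates (x_a, P_a) and (x_b, P_b) without their cross-correlation.

    Standard rule: P^-1 = w * P_a^-1 + (1 - w) * P_b^-1,
                   x    = P (w * P_a^-1 x_a + (1 - w) * P_b^-1 x_b).
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    info_a = np.linalg.inv(P_a)
    info_b = np.linalg.inv(P_b)
    P_fused = np.linalg.inv(weight * info_a + (1.0 - weight) * info_b)
    x_fused = P_fused @ (weight * info_a @ x_a + (1.0 - weight) * info_b @ x_b)
    return x_fused, P_fused


def best_weight(P_a, P_b, grid_size=101):
    """Pick the weight that minimizes trace(P_fused) by grid search (assumed criterion)."""
    info_a = np.linalg.inv(P_a)
    info_b = np.linalg.inv(P_b)
    weights = np.linspace(0.0, 1.0, grid_size)
    traces = [np.trace(np.linalg.inv(w * info_a + (1.0 - w) * info_b)) for w in weights]
    return float(weights[int(np.argmin(traces))])


if __name__ == "__main__":
    x_a, P_a = np.array([1.0, 0.0]), np.diag([1.0, 4.0])
    x_b, P_b = np.array([0.0, 1.0]), np.diag([4.0, 1.0])
    w = best_weight(P_a, P_b)
    print(w, *covariance_intersection(x_a, P_a, x_b, P_b, w))
```

Each agent only needs its own covariance and the one it receives, so no joint cross-covariance between agents has to be stored or propagated, which is consistent with the memory and computation savings the abstract describes.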


An ensemble Multi-Agent System for non-linear classification

arXiv.org Artificial Intelligence

Because of this non-linearity, their resolution requires more complex models often called "black boxes" because of their low explicability. In our research project, we aim to design a method to predict mobility information such as users' transport mode in real time from heterogeneous data (e.g., mobile phone data, smartphone sensors, etc.). This method must adapt quickly in a dynamic system where new transport modes and perturbations (e.g., changes in speed limits, COVID-19, etc.) may appear. Bringing up ever larger data streams requires the adoption of online learning techniques in which the model is updated with each new labeled point. Machine learning on dynamic systems (i.e., in which the behavior of individuals, the available sensors and the classes can evolve continuously) is one of the main motivations behind the design of Multi-Agent Systems (MAS). Recent approaches propose to transform a machine learning problem into a problem of cooperation between agents in order to reduce its complexity and to allow the system to adapt to the evolutions of the individuals (Capera et al., 2003). In this paper, we propose to use this collaborative approach to design an algorithm capable of solving supervised classification problems, some of which are non-linear, using linear classification models embedded in a multi-agent structure.


Exploring Task-oriented Communication in Multi-agent System: A Deep Reinforcement Learning Approach

arXiv.org Artificial Intelligence

The multi-agent system (MAS) enables the sharing of capabilities among agents, such that collaborative tasks can be accomplished with high scalability and efficiency. MAS is increasingly applied in various fields. Meanwhile, the large-scale and time-sensitive data transmission between agents brings challenges to the communication system. Traditional wireless communication ignores the content of the data and its impact on task execution at the receiver, which makes it difficult to guarantee the timeliness and relevance of the information. As a result, traditional wireless communication struggles to effectively support emerging multi-agent collaborative applications. Faced with this dilemma, task-oriented communication is a potential solution, which aims to transmit task-relevant information to improve task execution performance. However, multi-agent collaboration itself is a complex class of sequential decision problems, and it is challenging to explore efficient information flow in this context. In this article, we use deep reinforcement learning (DRL) to explore task-oriented communication in MAS. We begin with a discussion on the application of DRL to task-oriented communication. We then envision a task-oriented communication architecture for MAS, and discuss the designs based on DRL. Finally, we discuss open problems for future research and conclude this article.


Collective Adaptation in Multi-Agent Systems: How Predator Confusion Shapes Swarm-Like Behaviors

arXiv.org Artificial Intelligence

Popular hypotheses about the origins of collective adaptation are related to two basic behaviours: protection from predators and a combined search for food resources. Among the anti-predator explanations, the predator confusion hypothesis suggests that groups of individuals moving in a swarm aim to overwhelm the predator, while the dilution of risk hypothesis suggests that the probability of a single prey being targeted by a predator is lower in larger groups. In this paper, we explore how emergent behaviors arise from a predator-driven process as an adaptive response to external stimuli perceived as threatening. Moreover, we suggest a predator confusion process to provide a selective pressure for the prey to evolve group formations. We analyze the foraging and prey-predator dynamics evolved in terms of group density and formation, behavior consistency, predator evasion and success rate, and foraging rate. Two perceptual models for the agents are compared: a local observation model, where agents can only see what is in their immediate vicinity, and a global observation model, where agents are able to see the predator at all times. Both models were evolved for predator avoidance, foraging and collision avoidance, using reinforcement learning in a simulated game environment. Our results suggest that the dilution of risk factor is sufficient to evolve group formations, and the predator confusion effect could play an important role in the evolution of collaborative behaviors. Finally, we show how variations in the information exchange of this social order can impact the global collective behaviors.


Cooperation and Competition: Flocking with Evolutionary Multi-Agent Reinforcement Learning

arXiv.org Artificial Intelligence

Flocking is a very challenging problem in a multi-agent system; traditional flocking methods also require complete knowledge of the environment and a precise model for control. In this paper, we propose Evolutionary Multi-Agent Reinforcement Learning (EMARL) for flocking tasks, a hybrid algorithm that combines cooperation and competition with little prior knowledge. For cooperation, we design the agents' reward for flocking tasks according to the boids model. For competition, agents with high fitness are designated as senior agents and those with low fitness as junior agents, and junior agents stochastically inherit the parameters of senior agents. To intensify competition, we also design an evolutionary selection mechanism that shows effectiveness on credit assignment in flocking tasks. Experimental results in a range of challenging and self-contrast benchmarks demonstrate that EMARL significantly outperforms the full competition or cooperation methods.
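
The senior/junior inheritance step described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the EMARL implementation: the function name, the `senior_fraction` split, the `inherit_prob` probability, and the representation of parameters as flat lists are all hypothetical.

```python
import random


def evolutionary_selection(params, fitness, senior_fraction=0.5,
                           inherit_prob=0.5, rng=None):
    """Let low-fitness (junior) agents stochastically copy a senior's parameters.

    params:  list of per-agent parameter lists
    fitness: list of per-agent fitness scores (higher is better)
    Returns (new_params, senior_indices, junior_indices).
    """
    if len(params) != len(fitness):
        raise ValueError("params and fitness must have the same length")
    rng = rng or random.Random()
    n = len(params)
    # Rank agents from highest to lowest fitness.
    ranked = sorted(range(n), key=lambda i: fitness[i], reverse=True)
    n_senior = max(1, int(n * senior_fraction))  # assumed split rule
    seniors, juniors = ranked[:n_senior], ranked[n_senior:]
    new_params = [list(p) for p in params]  # copy so the inputs stay untouched
    for j in juniors:
        # Each junior inherits with probability inherit_prob (assumption).
        if rng.random() < inherit_prob:
            donor = rng.choice(seniors)
            new_params[j] = list(params[donor])
    return new_params, seniors, juniors


if __name__ == "__main__":
    agents = [[0.0], [1.0], [2.0], [3.0]]
    scores = [0.1, 0.9, 0.4, 0.8]
    print(evolutionary_selection(agents, scores, rng=random.Random(0)))
```

Senior agents keep their own parameters, so good policies are never overwritten, while juniors are only sometimes replaced, which preserves some diversity in the population.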



Robot law: Public policy, legal liability, and the new world of autonomous systems

#artificialintelligence

Algorithmic disgorgement might sound like a phrase from a science-fiction horror film. In fact, it's a new tool for regulators to address the consequences of autonomous systems, ordering companies to remove or destroy algorithms and models in their products based on data obtained unfairly or deceptively. This is one of the topics and papers to be presented and discussed at We Robot, an annual conference where scholars and technologists discuss legal and policy questions relating to robots and artificial intelligence. We Robot is taking place next week, from Sept. 14-16, at the University of Washington in Seattle, with a virtual option as well. It's also an example of how the legal and regulatory landscape for robots, AI, and autonomous systems has changed in the decade since the conference was first held at the University of Miami in 2012. "We've come very far," said Ryan Calo, one of the organizers of the conference, a University of Washington law professor who specializes in areas including privacy, artificial intelligence and robots.


Skip Training for Multi-Agent Reinforcement Learning Controller for Industrial Wave Energy Converters

arXiv.org Artificial Intelligence

Recent Wave Energy Converters (WEC) are equipped with multiple legs and generators to maximize energy generation. Traditional controllers have shown limitations in capturing complex wave patterns, yet the controllers must efficiently maximize the energy capture. This paper introduces a Multi-Agent Reinforcement Learning (MARL) controller, which outperforms the traditionally used spring damper controller. Our initial studies show that the complex nature of the problem makes it hard for training to converge. Hence, we propose a novel skip training approach which enables the MARL training to overcome performance saturation and converge to better controllers than default MARL training, boosting power generation. We also present another novel hybrid training initialization (STHTI) approach, where the individual agents of the MARL controllers can be initially trained against the baseline Spring Damper (SD) controller individually and then be trained one agent at a time or all together in future iterations to accelerate convergence. We achieved double-digit gains in energy efficiency over the baseline Spring Damper controller with the proposed MARL controllers using the Asynchronous Advantage Actor-Critic (A3C) algorithm.