 Agents


Influencing Towards Stable Multi-Agent Interactions

arXiv.org Artificial Intelligence

Learning in multi-agent environments is difficult due to the non-stationarity introduced by an opponent's or partner's changing behaviors. Instead of reactively adapting to the other agent's (opponent or partner) behavior, we propose an algorithm to proactively influence the other agent's strategy to stabilize -- which can restrain the non-stationarity caused by the other agent. We learn a low-dimensional latent representation of the other agent's strategy and the dynamics of how the latent strategy evolves with respect to our robot's behavior. With this learned dynamics model, we can define an unsupervised stability reward to train our robot to deliberately influence the other agent to stabilize towards a single strategy. We demonstrate the effectiveness of stabilizing in improving efficiency of maximizing the task reward in a variety of simulated environments, including autonomous driving, emergent communication, and robotic manipulation. We show qualitative results on our website: https://sites.google.com/view/stable-marl/.


Decentralized Cooperative Lane Changing at Freeway Weaving Areas Using Multi-Agent Deep Reinforcement Learning

arXiv.org Artificial Intelligence

Frequent lane changes during congestion at freeway bottlenecks such as merge and weaving areas further reduce roadway capacity. The emergence of deep reinforcement learning (RL) and connected and automated vehicle technology provides a possible solution to improve mobility and energy efficiency at freeway bottlenecks through cooperative lane changing. Deep RL is a collection of machine-learning methods that enables an agent to improve its performance by learning from the environment. In this study, a decentralized cooperative lane-changing controller was developed using proximal policy optimization by adopting a multi-agent deep RL paradigm. In the decentralized control strategy, policy learning and action reward are evaluated locally, with each agent (vehicle) getting access to global state information. Multi-agent deep RL requires lower computational resources and is more scalable than single-agent deep RL, making it a powerful tool for time-sensitive applications such as cooperative lane changing. The results of this study show that cooperative lane changing enabled by multi-agent deep RL yields superior performance to human drivers in terms of traffic throughput, vehicle speed, number of stops per vehicle, vehicle fuel efficiency, and emissions. The trained RL policy is transferable and can be generalized to uncongested, moderately congested, and extremely congested traffic conditions.


Efficient Multi-agent Epistemic Planning: Teaching Planners About Nested Belief

arXiv.org Artificial Intelligence

In the absence of prescribed coordination, it is often necessary for individual agents to synthesize their own plans, taking into account not only their own capabilities and beliefs about the world but also their beliefs about other agents, including what each of the agents will come to believe as the consequence of the actions of others. To illustrate, consider the scenario where Larry and Moe meet on a regular basis at the local diner to swap the latest gossip. Larry has come to know that Nancy (Larry's daughter) has just received a major promotion in her job, but unbeknownst to him, Moe has already learned this bit of information through the grapevine. Before they speak, both believe Nancy is getting a promotion, Larry believes Moe is unaware of this (and consequently wishes to share the news), and Moe assumes Larry must already be aware of the promotion but is unaware of Moe's own knowledge of the situation. Very quickly we can see how the nesting of (potentially incorrect) belief can be a complicated and interesting setting to model. In this paper, we examine the problem of synthesizing plans in such settings. In particular, given a finite set of agents, each with: (1) (possibly incomplete and incorrect) beliefs about the world and about the beliefs of other agents; and (2) differing capabilities including the ability to perform actions whose outcomes are unknown to other agents; we are interested in synthesizing a plan to achieve a goal condition. Planning is at the belief level and as such, while we consider the execution of actions that can change the state of the world (ontic actions) as well as an agent's state of knowledge or belief (epistemic or more accurately doxastic actions, including communication actions), all outcomes are with respect to belief.


Replay-Guided Adversarial Environment Design

arXiv.org Artificial Intelligence

Deep reinforcement learning (RL) agents may successfully generalize to new settings if trained on an appropriately diverse set of environment and task configurations. Unsupervised Environment Design (UED) is a promising self-supervised RL paradigm, wherein the free parameters of an underspecified environment are automatically adapted during training to the agent's capabilities, leading to the emergence of diverse training environments. Here, we cast Prioritized Level Replay (PLR), an empirically successful but theoretically unmotivated method that selectively samples randomly-generated training levels, as UED. We argue that by curating completely random levels, PLR, too, can generate novel and complex levels for effective training. This insight reveals a natural class of UED methods we call Dual Curriculum Design (DCD). Crucially, DCD includes both PLR and a popular UED algorithm, PAIRED, as special cases and inherits similar theoretical guarantees. This connection allows us to develop novel theory for PLR, providing a version with a robustness guarantee at Nash equilibria. Furthermore, our theory suggests a highly counterintuitive improvement to PLR: by stopping the agent from updating its policy on uncurated levels (training on less data), we can improve the convergence to Nash equilibria. Indeed, our experiments confirm that our new method, PLR$^{\perp}$, obtains better results on a suite of out-of-distribution, zero-shot transfer tasks, in addition to demonstrating that PLR$^{\perp}$ improves the performance of PAIRED, from which it inherited its theoretical framework.
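The replay decision described in this abstract can be illustrated with a minimal sketch. It assumes a dictionary that maps each curated level to a non-negative score, and it takes the level generator, the scoring rollout, and the policy update as hypothetical callbacks (`new_level`, `score_fn`, `update_policy`); none of these names, nor the `replay_prob` and `capacity` defaults, come from the paper. The point it shows is the one the abstract highlights: the policy is updated only on levels replayed from the curated buffer, while freshly generated levels are merely scored.

```python
import random


def robust_plr_step(buffer, rng, new_level, score_fn, update_policy,
                    replay_prob=0.5, capacity=4):
    """Run one hypothetical curation step; return (level, trained).

    buffer maps level -> non-negative score and is mutated in place.
    All callbacks and defaults are illustrative assumptions.
    """
    if buffer and rng.random() < replay_prob:
        # Replay branch: sample a curated level in proportion to its score.
        levels = list(buffer)
        weights = [buffer[lvl] + 1e-6 for lvl in levels]  # avoid all-zero weights
        level = rng.choices(levels, weights=weights, k=1)[0]
        buffer[level] = score_fn(level)  # refresh the score after the rollout
        update_policy(level)             # the only place the policy is trained
        return level, True

    # Exploration branch: score a new random level without a policy update.
    level = new_level(rng)
    buffer[level] = score_fn(level)
    if len(buffer) > capacity:
        # Evict the lowest-scoring level to keep the buffer bounded.
        del buffer[min(buffer, key=buffer.get)]
    return level, False
```

In a real system the score would be some estimate of learning potential computed from the rollout; here it is left entirely to the caller.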


Thinking Fast and Slow in AI: the Role of Metacognition

arXiv.org Artificial Intelligence

AI systems have seen dramatic advancement in recent years, bringing many applications that pervade our everyday life. However, we are still mostly seeing instances of narrow AI: many of these recent developments are typically focused on a very limited set of competencies and goals, e.g., image interpretation, natural language processing, classification, prediction, and many others. Moreover, while these successes can be attributed to improved algorithms and techniques, they are also tightly linked to the availability of huge datasets and computational power. State-of-the-art AI still lacks many capabilities that would naturally be included in a notion of (human) intelligence. We argue that a better study of the mechanisms that allow humans to have these capabilities can help us understand how to imbue AI systems with these competencies. We focus especially on D. Kahneman's theory of thinking fast and slow, and we propose a multi-agent AI architecture where incoming problems are solved by either system 1 (or "fast") agents, which react by exploiting only past experience, or by system 2 (or "slow") agents, which are deliberately activated when there is a need to reason and search for optimal solutions beyond what is expected from the system 1 agent. Both kinds of agents are supported by a model of the world, containing domain knowledge about the environment, and a model of "self", containing information about past actions of the system and solvers' skills.


Sim and Real: Better Together

arXiv.org Machine Learning

Simulation is used extensively in autonomous systems, particularly in robotic manipulation. By far the most common approach is to train a controller in simulation and then use it as a starting point for the real system. We demonstrate how to learn simultaneously from both simulation and interaction with the real environment. We propose an algorithm for balancing the large number of samples from the high-throughput but less accurate simulation against the low-throughput, high-fidelity, and costly samples from the real environment. We achieve this by maintaining a replay buffer for each environment the agent interacts with. We analyze such multi-environment interaction theoretically and provide convergence properties through a novel theoretical replay buffer analysis. We demonstrate the efficacy of our method on a sim-to-real environment.
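The idea of keeping one replay buffer per environment can be sketched as follows. This is only an illustration: the abstract does not state how the two buffers are mixed, so the fixed `real_fraction` parameter, the `ReplayBuffer` class, and the `sample_mixed_batch` helper are all assumptions introduced here, not the authors' algorithm.

```python
import random


class ReplayBuffer:
    """Fixed-capacity ring buffer of transitions (hypothetical helper)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = []
        self.next_index = 0

    def add(self, transition):
        if len(self.items) < self.capacity:
            self.items.append(transition)
        else:
            # Overwrite the oldest transition once the buffer is full.
            self.items[self.next_index] = transition
        self.next_index = (self.next_index + 1) % self.capacity

    def sample(self, k, rng):
        # Sample with replacement so a small buffer can still fill its share.
        return [rng.choice(self.items) for _ in range(k)]

    def __len__(self):
        return len(self.items)


def sample_mixed_batch(sim_buffer, real_buffer, batch_size, real_fraction, rng):
    """Draw a training batch with an assumed fixed share of real samples."""
    n_real = round(batch_size * real_fraction)
    n_sim = batch_size - n_real
    return real_buffer.sample(n_real, rng) + sim_buffer.sample(n_sim, rng)
```

Separate buffers keep the scarce real transitions from being drowned out or evicted by the much larger stream of simulated ones, which a single shared buffer would not prevent.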


A Gentle Introduction to Particle Swarm Optimization

#artificialintelligence

Particle swarm optimization (PSO) is a simple bio-inspired algorithm that searches a solution space for an optimal solution. Unlike many other optimization algorithms, it needs only the objective function and does not depend on the gradient or any differential form of the objective. It also has very few hyperparameters. In this tutorial, you will learn the rationale behind PSO and its algorithm through an example. Particle swarm optimization was proposed by Kennedy and Eberhart in 1995.
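To make the description concrete, here is a minimal sketch of a common inertia-weight variant of PSO in plain Python. It is not necessarily the version the tutorial goes on to present: the function name `pso`, the default hyperparameters (`w`, `c1`, `c2`), the clamping of positions to the bounds, and the `sphere` test objective are all illustrative assumptions. Note that only objective values are used; no gradient is ever computed.

```python
import random


def pso(objective, bounds, n_particles=30, n_iters=100,
        w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize objective over box bounds; return (best_position, best_value)."""
    rng = random.Random(seed)
    dim = len(bounds)
    # Random initial positions inside the bounds, zero initial velocities.
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                  # each particle's best position
    pbest_val = [objective(p) for p in pos]
    best_i = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[best_i][:], pbest_val[best_i]  # swarm's best

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward personal best + pull toward global best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                lo, hi = bounds[d]
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


def sphere(x):
    """Toy objective with its minimum of 0 at the origin."""
    return sum(v * v for v in x)


if __name__ == "__main__":
    best, value = pso(sphere, bounds=[(-5.0, 5.0), (-5.0, 5.0)])
    print(best, value)
```

The three hyperparameters trade off exploration against convergence: a larger `w` keeps particles moving, while `c1` and `c2` weight each particle's own experience against the swarm's.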


Emergence of Theory of Mind Collaboration in Multiagent Systems

#artificialintelligence

Attempts to integrate ToM into opponent modeling have deep origins in cognitive science (de2014theory; baker2017rational). LOLA (foerster2018learning) learns the best response to evolving opponents. Yet the real-time beliefs of opponents or partners are not taken into account in the policy. However, I-POMDP requires extensive sampling to approximate the nested integration over the belief space, action space, and observation space, limiting its scalability. The Bayesian action decoder (BAD)-MDP proposed by Foerster et al.


Human vs. Artificial Intelligence in Autonomous Systems

#artificialintelligence

A common goal, as we see in many articles on AI (artificial intelligence) and ML (machine learning), is to make AI/ML systems more like humans. Some claim that humans are much better at driving road vehicles than self-driving software, even though the accident statistics appear to contradict this view. Perhaps we have it backwards! Maybe the goal should be to make humans as smart as machines. Two recent articles in The New York Times suggest that the latter approach could possibly be better.


Multi-Agent Path Planning Using Deep Reinforcement Learning

arXiv.org Artificial Intelligence

In this paper, a multi-agent path planning approach based on deep reinforcement learning is introduced. The experiments are carried out in a simulation environment in which different multi-agent path planning problems are generated. The generated problems resemble vehicle routing problems, and they are solved using multi-agent deep reinforcement learning. The model is trained on consecutive problems in the simulation environment, and its performance in solving a problem is observed to improve over time. The same simulation environment is always used; only the locations of the target points that the agents must visit are changed. This helps the model learn its environment and the right approach to a problem as the episodes pass. In the end, the result is a model that has learned to solve path planning or routing problems in this environment and can find a good solution to a given unseen problem immediately, without any further training. In routing problems, standard mathematical modeling and heuristics seem to suffer from high computation times, while finding an immediate solution is both difficult and critical. This paper proposes a new solution method that addresses these points and demonstrates its efficiency experimentally.