Goto

Collaborating Authors

 Agents


The Role of Bio-Inspired Modularity in General Learning

arXiv.org Artificial Intelligence

One goal of general intelligence is to learn novel information without overwriting prior learning. The utility of learning without forgetting (CF) is twofold: first, the system can return to previously learned tasks after learning something new. In addition, bootstrapping previous knowledge may allow for faster learning of a novel task. Previous approaches to CF and bootstrapping are primarily based on modifying learning in the form of changing weights to tune the model to the current task, overwriting previously tuned weights from previous tasks. However, another critical factor that has been largely overlooked is the initial network topology, or architecture. Here, we argue that the topology of biological brains likely evolved certain features that are designed to achieve this kind of informational conservation. In particular, we consider that the highly conserved property of modularity may offer a solution to weight-update learning methods that adheres to the learning without catastrophic forgetting and bootstrapping constraints. Final considerations are then made on how to combine these two learning objectives in a dynamical, general learning system.


Dimension-Free Rates for Natural Policy Gradient in Multi-Agent Reinforcement Learning

arXiv.org Machine Learning

Cooperative multi-agent reinforcement learning is a decentralized paradigm in sequential decision making where agents distributed over a network iteratively collaborate with neighbors to maximize global (network-wide) notions of rewards. Exact computations typically involve a complexity that scales exponentially with the number of agents. To address this curse of dimensionality, we design a scalable algorithm based on the Natural Policy Gradient framework that uses local information and only requires agents to communicate with neighbors within a certain range. Under standard assumptions on the spatial decay of correlations for the transition dynamics of the underlying Markov process and the localized learning policy, we show that our algorithm converges to the globally optimal policy with a dimension-free statistical and computational complexity, incurring a localization error that does not depend on the number of agents and converges to zero exponentially fast as a function of the range of communication.


Trust Region Policy Optimisation in Multi-Agent Reinforcement Learning

arXiv.org Artificial Intelligence

Trust region methods rigorously enabled reinforcement learning (RL) agents to learn monotonically improving policies, leading to superior performance on a variety of tasks. Unfortunately, when it comes to multi-agent reinforcement learning (MARL), the property of monotonic improvement may not simply apply; this is because agents, even in cooperative games, could have conflicting directions of policy updates. As a result, achieving a guaranteed improvement on the joint policy where each agent acts individually remains an open challenge. In this paper, we extend the theory of trust region learning to MARL. Central to our findings are the multi-agent advantage decomposition lemma and the sequential policy update scheme. Based on these, we develop Heterogeneous-Agent Trust Region Policy Optimisation (HATPRO) and Heterogeneous-Agent Proximal Policy Optimisation (HAPPO) algorithms. Unlike many existing MARL algorithms, HATRPO/HAPPO do not need agents to share parameters, nor do they need any restrictive assumptions on decomposibility of the joint value function. Most importantly, we justify in theory the monotonic improvement property of HATRPO/HAPPO. We evaluate the proposed methods on a series of Multi-Agent MuJoCo and StarCraftII tasks. Results show that HATRPO and HAPPO significantly outperform strong baselines such as IPPO, MAPPO and MADDPG on all tested tasks, therefore establishing a new state of the art.


Individual and Collective Autonomous Development

arXiv.org Artificial Intelligence

The increasing complexity and unpredictability of many ICT scenarios let us envision that future systems will have to dynamically learn how to act and adapt to face evolving situations with little or no a priori knowledge, both at the level of individual components and at the collective level. In other words, such systems should become able to autonomously develop models of themselves and of their environment. Autonomous development includes: learning models of own capabilities; learning how to act purposefully towards the achievement of specific goals; and learning how to act collectively, i.e., accounting for the presence of others. In this paper, we introduce the vision of autonomous development in ICT systems, by framing its key concepts and by illustrating suitable application domains. Then, we overview the many research areas that are contributing or can potentially contribute to the realization of the vision, and identify some key research challenges.


BCS Lovelace Lecture 2021

Oxford Comp Sci

In this talk I will review the development of commercially successful Knowledge Representation and Reasoning (KRR) systems and their genesis in foundational research. I will trace the evolution of KRR systems from logical and algorithmic foundations, through academic prototypes and standardisation to robust and scalable systems that power applications in areas as diverse as search, healthcare, financial services and manufacturing. I will discuss the barriers and milestones encountered along the journey, and lessons learned about the exploitation of research. Multi-agent systems first emerged as a research topic in the late 1980s. A key driver behind the emergence of the field was the idea of building systems that actively worked on behalf of human users in the pursuit of those users' goals.


Amid Skepticism, Biden Vows a New Era of Global Collaboration

The New Yorker

Joe Biden made his début at the elegant green-marble rostrum of the United Nations this week, as the coronavirus infected more than half a million people each day worldwide, as wildfires and floods aggravated by climate change ravaged the Earth, and as the U.S. struggled to prevent a new cold war with China. In lofty language, the President tried to redirect the world's focus away from the calamitous end to America's longest war, in Afghanistan, and a recent bust-up with its most longstanding ally, France. Just eight months into his Presidency, Biden is already trying to hit reset on his foreign policy. "I stand here today for the first time in twenty years with the United States not at war. We've turned the page," Biden told the chamber.


Making Human-Like Trade-offs in Constrained Environments by Learning from Demonstrations

arXiv.org Artificial Intelligence

Many real-life scenarios require humans to make difficult trade-offs: do we always follow all the traffic rules or do we violate the speed limit in an emergency? These scenarios force us to evaluate the trade-off between collective norms and our own personal objectives. To create effective AI-human teams, we must equip AI agents with a model of how humans make trade-offs in complex, constrained environments. These agents will be able to mirror human behavior or to draw human attention to situations where decision making could be improved. To this end, we propose a novel inverse reinforcement learning (IRL) method for learning implicit hard and soft constraints from demonstrations, enabling agents to quickly adapt to new settings. In addition, learning soft constraints over states, actions, and state features allows agents to transfer this knowledge to new domains that share similar aspects. We then use the constraint learning method to implement a novel system architecture that leverages a cognitive model of human decision making, multi-alternative decision field theory (MDFT), to orchestrate competing objectives. We evaluate the resulting agent on trajectory length, number of violated constraints, and total reward, demonstrating that our agent architecture is both general and achieves strong performance. Thus we are able to capture and replicate human-like trade-offs from demonstrations in environments when constraints are not explicit.


Towards Multi-Agent Reinforcement Learning using Quantum Boltzmann Machines

arXiv.org Artificial Intelligence

Reinforcement learning has driven impressive advances in machine learning. Simultaneously, quantum-enhanced machine learning algorithms using quantum annealing underlie heavy developments. Recently, a multi-agent reinforcement learning (MARL) architecture combining both paradigms has been proposed. This novel algorithm, which utilizes Quantum Boltzmann Machines (QBMs) for Q-value approximation has outperformed regular deep reinforcement learning in terms of time-steps needed to converge. However, this algorithm was restricted to single-agent and small 2x2 multi-agent grid domains. In this work, we propose an extension to the original concept in order to solve more challenging problems. Similar to classic DQNs, we add an experience replay buffer and use different networks for approximating the target and policy values. The experimental results show that learning becomes more stable and enables agents to find optimal policies in grid-domains with higher complexity. Additionally, we assess how parameter sharing influences the agents behavior in multi-agent domains. Quantum sampling proves to be a promising method for reinforcement learning tasks, but is currently limited by the QPU size and therefore by the size of the input and Boltzmann machine.


A Socially Aware Reinforcement Learning Agent for The Single Track Road Problem

arXiv.org Artificial Intelligence

We present the single track road problem. In this problem two agents face each-other at opposite positions of a road that can only have one agent pass at a time. We focus on the scenario in which one agent is human, while the other is an autonomous agent. We run experiments with human subjects in a simple grid domain, which simulates the single track road problem. We show that when data is limited, building an accurate human model is very challenging, and that a reinforcement learning agent, which is based on this data, does not perform well in practice. However, we show that an agent that tries to maximize a linear combination of the human's utility and its own utility, achieves a high score, and significantly outperforms other baselines, including an agent that tries to maximize only its own utility. While humans can cope with new situations quite easily, even state-of-the-art algorithms trouble with new situations that they haven't been trained on. Unfortunately, when it comes to autonomous vehicles the results may be devastating. One example for an uncommon, yet important scenario for autonomous vehicles is the problem of a single track road. In this problem two vehicles in opposite directions must cross a narrow road, which is not wide enough to allow both vehicles to pass at the same time.


Locality Matters: A Scalable Value Decomposition Approach for Cooperative Multi-Agent Reinforcement Learning

arXiv.org Machine Learning

Cooperative multi-agent reinforcement learning (MARL) faces significant scalability issues due to state and action spaces that are exponentially large in the number of agents. As environments grow in size, effective credit assignment becomes increasingly harder and often results in infeasible learning times. Still, in many real-world settings, there exist simplified underlying dynamics that can be leveraged for more scalable solutions. In this work, we exploit such locality structures effectively whilst maintaining global cooperation. We propose a novel, value-based multi-agent algorithm called LOMAQ, which incorporates local rewards in the Centralized Training Decentralized Execution paradigm. Additionally, we provide a direct reward decomposition method for finding these local rewards when only a global signal is provided. We test our method empirically, showing it scales well compared to other methods, significantly improving performance and convergence speed.