Agents
Why and how to build autonomous systems - AI for Business
Automated control systems were one of the most disruptive applications of industrial technology in the 20th century. The ability to control workflows and processes based on specific inputs and outputs streamlined even the most complex manufacturing processes. These systems, however, need specific parameters and, in some cases, require extensive human oversight and planning to ensure optimal execution. Innovations in AI training methodologies are pushing past these limitations to produce the next wave of disruption to industrial technology: autonomous systems. Autonomous machines do more than address the limitations of automated systems, however.
Differentially-Private Federated Linear Bandits
Dubey, Abhimanyu, Pentland, Alex
The rapid proliferation of decentralized learning systems mandates the need for differentially-private cooperative learning. In this paper, we study this in context of the contextual linear bandit: we consider a collection of agents cooperating to solve a common contextual bandit, while ensuring that their communication remains private. For this problem, we devise \textsc{FedUCB}, a multiagent private algorithm for both centralized and decentralized (peer-to-peer) federated learning. We provide a rigorous technical analysis of its utility in terms of regret, improving several results in cooperative bandit learning, and provide rigorous privacy guarantees as well. Our algorithms provide competitive performance both in terms of pseudoregret bounds and empirical benchmark performance in various multi-agent settings.
Meta-trained agents implement Bayes-optimal agents
Mikulik, Vladimir, Delétang, Grégoire, McGrath, Tom, Genewein, Tim, Martic, Miljan, Legg, Shane, Ortega, Pedro A.
Memory-based meta-learning is a powerful technique to build agents that adapt fast to any task within a target distribution. A previous theoretical study has argued that this remarkable performance is because the meta-training protocol incentivises agents to behave Bayes-optimally. We empirically investigate this claim on a number of prediction and bandit tasks. Inspired by ideas from theoretical computer science, we show that meta-learned and Bayes-optimal agents not only behave alike, but they even share a similar computational structure, in the sense that one agent system can approximately simulate the other. Furthermore, we show that Bayes-optimal agents are fixed points of the meta-learning dynamics. Our results suggest that memory-based meta-learning might serve as a general technique for numerically approximating Bayes-optimal agents - that is, even for task distributions for which we currently don't possess tractable models.
Influence-Augmented Online Planning for Complex Environments
He, Jinke, Suau, Miguel, Oliehoek, Frans A.
How can we plan efficiently in real time to control an agent in a complex environment that may involve many other agents? While existing sample-based planners have enjoyed empirical success in large POMDPs, their performance heavily relies on a fast simulator. However, real-world scenarios are complex in nature and their simulators are often computationally demanding, which severely limits the performance of online planners. In this work, we propose influence-augmented online planning, a principled method to transform a factored simulator of the entire environment into a local simulator that samples only the state variables that are most relevant to the observation and reward of the planning agent and captures the incoming influence from the rest of the environment using machine learning methods. Our main experimental results show that planning on this less accurate but much faster local simulator with POMCP leads to higher real-time planning performance than planning on the simulator that models the entire environment.
A global collaboration to move artificial intelligence principles to practice
The choices that technologists, policymakers, and communities make in the next few years will shape the relationship between machines and humans for decades to come. The rapidly increasing applicability of AI has prompted a number of organizations to develop high-level principles on social and ethical issues such as privacy, fairness, bias, transparency, and accountability. Building on those broader principles, the AI Policy Forum, a global effort convened by the MIT Stephen A. Schwarzman College of Computing, will provide an overarching policy framework and tools for governments and companies to implement in concrete ways. "Our goal is to help policymakers in making practical decisions about AI policy," says Daniel Huttenlocher, dean of the MIT Schwarzman College of Computing. "We are not trying to develop another set of principles around AI, several of which already exist, but rather provide context and guidelines specific to a field of use of AI to help policymakers around the world with implementation." "Moving beyond principles means understanding trade-offs and identifying the technical tools and the policy levers to address them.
Towards Heterogeneous Multi-Agent Reinforcement Learning with Graph Neural Networks
Meneghetti, Douglas De Rizzo, Bianchi, Reinaldo Augusto da Costa
This work proposes a neural network architecture that learns policies for multiple agent classes in a heterogeneous multi-agent reinforcement setting. The proposed network uses directed labeled graph representations for states, encodes feature vectors of different sizes for different entity classes, uses relational graph convolution layers to model different communication channels between entity types and learns distinct policies for different agent classes, sharing parameters wherever possible. Results have shown that specializing the communication channels between entity classes is a promising step to achieve higher performance in environments composed of heterogeneous entities.
Negotiating Team Formation Using Deep Reinforcement Learning
Bachrach, Yoram, Everett, Richard, Hughes, Edward, Lazaridou, Angeliki, Leibo, Joel Z., Lanctot, Marc, Johanson, Michael, Czarnecki, Wojciech M., Graepel, Thore
When autonomous agents interact in the same environment, they must often cooperate to achieve their goals. One way for agents to cooperate effectively is to form a team, make a binding agreement on a joint plan, and execute it. However, when agents are self-interested, the gains from team formation must be allocated appropriately to incentivize agreement. Various approaches for multi-agent negotiation have been proposed, but typically only work for particular negotiation protocols. More general methods usually require human input or domain-specific data, and so do not scale. To address this, we propose a framework for training agents to negotiate and form teams using deep reinforcement learning. Importantly, our method makes no assumptions about the specific negotiation protocol, and is instead completely experience driven. We evaluate our approach on both non-spatial and spatially extended team-formation negotiation environments, demonstrating that our agents beat hand-crafted bots and reach negotiation outcomes consistent with fair solutions predicted by cooperative game theory. Additionally, we investigate how the physical location of agents influences negotiation outcomes.
Multi-Radar Tracking Optimization for Collaborative Combat
Nour, Nouredine, Belhaj-Soullami, Reda, Buron, Cédric, Peres, Alain, Barbaresco, Frédéric
Despite great interest in recent research, in particular in China [1, 2] micromanagement of sensors by centralized command and control drives possible inefficiencies and risk into operations. Tactical decision making and execution by headquarters usually fail to achieve the speed necessary to meet rapid changes. Collaborative radars with C2 must provide decision superiority despite the attempts of an adversary to disrupt OODA cycles at all level of operations. Artificial intelligence can make a contribution for the purposes of coordinated conduct of the action, by improving the response time to threats and optimizing the allocation and the distribution of tasks within elementary smart radars. In order to address this problem, Thales and the private research lab NukkAI have been collaborating to introduce novel approaches for netted radars. Thales provided the simulation modeling the multi-radar target allocation problem and NukkAI proposed two novel reward-based learning approaches for the problem. In this paper, we present these two approaches: Evolutionary Single-Target Ordering (ESTO), which is based on evolution strategies and an RL approach based on Actor-Critic methods. To make the RL method tractable in practice, we introduce a simplification of the problem that we prove to be equivalent to solving the initial formulation. We evaluate our solutions on diverse scenarios of the aforementioned simulation.
A Particle Swarm Inspired Approach for Continuous Distributed Constraint Optimization Problems
Choudhury, Moumita, Sarker, Amit, Khan, Md. Mosaddek, Yeoh, William
Distributed Constraint Optimization Problems (DCOPs) are a widely studied framework for coordinating interactions in cooperative multi-agent systems. In classical DCOPs, variables owned by agents are assumed to be discrete. However, in many applications, such as target tracking or sleep scheduling in sensor networks, continuous-valued variables are more suitable than discrete ones. To better model such applications, researchers have proposed Continuous DCOPs (C-DCOPs), an extension of DCOPs, that can explicitly model problems with continuous variables. The state-of-the-art approaches for solving C-DCOPs experience either onerous memory or computation overhead and unsuitable for non-differentiable optimization problems. To address this issue, we propose a new C-DCOP algorithm, namely Particle Swarm Optimization Based C-DCOP (PCD), which is inspired by Particle Swarm Optimization (PSO), a well-known centralized population-based approach for solving continuous optimization problems. In recent years, population-based algorithms have gained significant attention in classical DCOPs due to their ability in producing high-quality solutions. Nonetheless, to the best of our knowledge, this class of algorithms has not been utilized to solve C-DCOPs and there has been no work evaluating the potential of PSO in solving classical DCOPs or C-DCOPs. In light of this observation, we adapted PSO, a centralized algorithm, to solve C-DCOPs in a decentralized manner. The resulting PCD algorithm not only produces good-quality solutions but also finds solutions without any requirement for derivative calculations. Moreover, we design a crossover operator that can be used by PCD to further improve the quality of solutions found. Finally, we theoretically prove that PCD is an anytime algorithm and empirically evaluate PCD against the state-of-the-art C-DCOP algorithms in a wide variety of benchmarks.
Robust Asynchronous and Network-Independent Cooperative Learning
Mojica-Nava, Eduardo, Yanguas-Rojas, David, Uribe, César A.
We consider the model of cooperative learning via distributed non-Bayesian learning, where a network of agents tries to jointly agree on a hypothesis that best described a sequence of locally available observations. Building upon recently proposed weak communication network models, we propose a robust cooperative learning rule that allows asynchronous communications, message delays, unpredictable message losses, and directed communication among nodes. We show that our proposed learning dynamics guarantee that all agents in the network will have an asymptotic exponential decay of their beliefs on the wrong hypothesis, indicating that the beliefs of all agents will concentrate on the optimal hypotheses. Numerical experiments provide evidence on a number of network setups.