Agents
AI is now learning to evolve like earthly lifeforms
This article is part of our reviews of AI research papers, a series of posts that explore the latest findings in artificial intelligence. Hundreds of millions of years of evolution have blessed our planet with a wide variety of lifeforms, each intelligent in its own fashion. Each species has evolved to develop innate skills, learning capacities, and a physical form that ensure its survival in its environment. But despite being inspired by nature and evolution, the field of artificial intelligence has largely focused on creating the elements of intelligence separately and fusing them together after development. While this approach has yielded great results, it has also limited the flexibility of AI agents in some of the basic skills found in even the simplest lifeforms.
Investigation of Independent Reinforcement Learning Algorithms in Multi-Agent Environments
Lee, Ken Ming, Subramanian, Sriram Ganapathi, Crowley, Mark
Independent reinforcement learning algorithms have no theoretical guarantees for finding the best policy in multi-agent settings. However, in practice, prior works have reported good performance with independent algorithms in some domains and bad performance in others. Moreover, a comprehensive study of the strengths and weaknesses of independent algorithms is lacking in the literature. In this paper, we carry out an empirical comparison of the performance of independent algorithms on four PettingZoo environments that span the three main categories of multi-agent environments, i.e., cooperative, competitive, and mixed. We show that in fully-observable environments, independent algorithms can perform on par with multi-agent algorithms in cooperative and competitive settings. For the mixed environments, we show that agents trained via independent algorithms learn to perform well individually, but fail to learn to cooperate with allies and compete with enemies. We also show that adding recurrence improves the learning of independent algorithms in cooperative partially observable environments.
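To make the term concrete, here is a minimal sketch of what "independent" learning means in a cooperative setting. It is not the authors' code and does not use PettingZoo: the 2x2 payoff matrix, the hyperparameters, and all function names are hypothetical choices for illustration. Each agent keeps a private Q-table over its own actions only and treats the other agent as part of the environment.

```python
import random

# Shared payoff for a 2x2 cooperative matrix game (hypothetical values):
# PAYOFF[a0][a1] is the reward that both agents receive.
PAYOFF = [[2.0, 1.0],
          [1.0, 0.0]]


def epsilon_greedy(q_values, epsilon, rng):
    """Pick a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def train_independent_q(steps=2000, lr=0.1, epsilon=0.1, seed=0):
    """Train two independent Q-learners on the stateless game above."""
    rng = random.Random(seed)
    # One private Q-table per agent, indexed only by that agent's own action.
    q_tables = [[0.0, 0.0], [0.0, 0.0]]
    for _ in range(steps):
        actions = [epsilon_greedy(q, epsilon, rng) for q in q_tables]
        reward = PAYOFF[actions[0]][actions[1]]
        # Each agent updates from its own action and the shared reward;
        # it never observes the other agent's action or Q-values.
        for q, a in zip(q_tables, actions):
            q[a] += lr * (reward - q[a])
    return q_tables


def greedy_joint_action(q_tables):
    """Return the action each agent would pick without exploration."""
    return tuple(max(range(len(q)), key=lambda a: q[a]) for q in q_tables)


if __name__ == "__main__":
    print(greedy_joint_action(train_independent_q()))
```

In this toy game action 0 is the better choice for each agent regardless of what the other does, so independent learners have an easy time. The mixed and partially observable settings discussed in the abstract are exactly where this simple scheme is reported to struggle.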
On the Expressivity of Markov Reward
Abel, David, Dabney, Will, Harutyunyan, Anna, Ho, Mark K., Littman, Michael L., Precup, Doina, Singh, Satinder
Reward is the driving force for reinforcement-learning agents. This paper is dedicated to understanding the expressivity of reward as a way to capture tasks that we would want an agent to perform. We frame this study around three new abstract notions of "task" that might be desirable: (1) a set of acceptable behaviors, (2) a partial ordering over behaviors, or (3) a partial ordering over trajectories. Our main results prove that while reward can express many of these tasks, there exist instances of each task type that no Markov reward function can capture. We then provide a set of polynomial-time algorithms that construct a Markov reward function that allows an agent to optimize tasks of each of these three types, and correctly determine when no such reward function exists. We conclude with an empirical study that corroborates and illustrates our theoretical findings.
Federated Learning Using Particle Swarm Optimization
Federated learning is a method that stores only learned models on a server in order to protect data privacy. Rather than collecting raw data on the server, it collects models trained locally on scattered clients. Because federated learning clients frequently have limited transmission bandwidth, communication between servers and clients should be streamlined to maximize performance. To that end, researchers have created the FedPSO algorithm, which combines the particle swarm optimization technique with federated learning to improve network communication performance. This post covers key aspects of the proposed system.
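For readers unfamiliar with particle swarm optimization, the sketch below shows the standard velocity and position update on a toy cost function. It illustrates generic PSO only; how FedPSO maps particles onto clients and servers is not shown here, and the cost function, hyperparameters, and names are assumptions chosen for illustration.

```python
import random


def sphere(x):
    """Toy cost function with its minimum at the origin."""
    return sum(v * v for v in x)


def pso(cost, dim=2, particles=10, iters=50, w=0.5, c1=1.5, c2=1.5, seed=0):
    """Minimize `cost` with a basic global-best particle swarm."""
    rng = random.Random(seed)
    pos = [[rng.uniform(-5, 5) for _ in range(dim)] for _ in range(particles)]
    vel = [[0.0] * dim for _ in range(particles)]
    pbest = [p[:] for p in pos]              # best position seen per particle
    pbest_cost = [cost(p) for p in pos]
    g = min(range(particles), key=lambda i: pbest_cost[i])
    gbest, gbest_cost = pbest[g][:], pbest_cost[g]   # best seen by the swarm
    history = [gbest_cost]
    for _ in range(iters):
        for i in range(particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward personal best + pull toward global best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            c = cost(pos[i])
            if c < pbest_cost[i]:
                pbest[i], pbest_cost[i] = pos[i][:], c
                if c < gbest_cost:
                    gbest, gbest_cost = pos[i][:], c
        history.append(gbest_cost)
    return gbest, gbest_cost, history


if __name__ == "__main__":
    best, best_cost, _ = pso(sphere)
    print(best, best_cost)
```

The only value every particle needs from the rest of the swarm is the global best, which is a small amount of shared information compared with exchanging every particle's full state.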
New Deep Reinforcement Learning Technique Helps AI To Evolve - AI Summary
Humans have an upright posture that frees their arms and lets them see the far horizon, hands and nimble fingers that can manipulate objects, and a brain that makes them the best social creatures and problem solvers on the planet. While these approaches speed up the process and reduce the costs of training and evolving AI agents, they also limit the flexibility and variety of results that can be achieved. DERL uses Darwinian evolution to search the morphological space for optimal solutions, which means that when a new generation of AI agents is spawned, they only inherit the physical and architectural traits of their parents (along with slight mutations). "DERL opens the door to performing large-scale in silico experiments to yield scientific insights into how learning and evolution cooperatively create sophisticated relationships between environmental complexity, morphological intelligence, and the learnability of control tasks," the researchers wrote. "Intriguingly, the existence of this morphological Baldwin effect could be exploited in future studies to create embodied agents with lower sample complexity and higher generalization capacity," the researchers wrote.
New deep reinforcement learning technique helps AI to evolve
Hundreds of millions of years of evolution have produced a variety of life-forms, each intelligent in its own fashion. Each species has evolved to develop innate skills, learning capacities, and a physical form that ensures survival in its environment. But despite being inspired by nature and evolution, the field of artificial intelligence has largely focused on creating the elements of intelligence separately and fusing them together after the development process. While this approach has yielded great results, it has also limited the flexibility of AI agents in some of the basic skills found in even the simplest life-forms. In a new paper published in the scientific journal Nature, AI researchers at Stanford University present a new technique that can help take steps toward overcoming some of these limits.
Delayed Propagation Transformer: A Universal Computation Engine towards Practical Control in Cyber-Physical Systems
Zheng, Wenqing, Guo, Qiangqiang, Yang, Hao, Wang, Peihao, Wang, Zhangyang
Multi-agent control is a central theme in Cyber-Physical Systems (CPS). However, current control methods either receive non-Markovian states due to insufficient sensing and decentralized design, or suffer from poor convergence. This paper presents the Delayed Propagation Transformer (DePT), a new transformer-based model that specializes in the global modeling of CPS while taking into account the immutable constraints from the physical world. DePT induces a cone-shaped spatial-temporal attention prior, which injects the information propagation and aggregation principles and enables a global view. With physical constraint inductive bias baked into its design, our DePT is ready to plug and play for a broad class of multi-agent systems. The experimental results on one of the most challenging CPS (a network-scale traffic signal control system in the open world) show that our model outperformed the state-of-the-art expert methods on synthetic and real-world datasets. Our code is released at: https://github.com/VITA-Group/DePT.
Learning to Be Cautious
Mohammedalamen, Montaser, Morrill, Dustin, Sieusahai, Alexander, Satsangi, Yash, Bowling, Michael
A key challenge in the field of reinforcement learning is to develop agents that behave cautiously in novel situations. It is generally impossible to anticipate all situations that an autonomous system may face or what behavior would best avoid bad outcomes. An agent that could learn to be cautious would overcome this challenge by discovering for itself when and how to behave cautiously. In contrast, current approaches typically embed task-specific safety information or explicit cautious behaviors into the system, which is error-prone and imposes extra burdens on practitioners. In this paper, we present a sequence of tasks where cautious behavior becomes increasingly non-obvious, along with an algorithm demonstrating that it is possible for a system to learn to be cautious. The essential features of our algorithm are that it characterizes reward function uncertainty without task-specific safety information and uses this uncertainty to construct a robust policy. Specifically, we construct robust policies with a k-of-N counterfactual regret minimization (CFR) subroutine given a learned reward function uncertainty represented by a neural network ensemble belief. These policies exhibit caution in each of our tasks without any task-specific safety tuning.
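The k-of-N idea can be illustrated without CFR or neural networks. The sketch below assumes the usual reading of the k-of-N measure: evaluate a policy under N sampled reward models and score it by the average of its k worst returns. The function names and the return values are hypothetical, and this is only the scoring rule, not the authors' algorithm.

```python
def k_of_n_value(returns_per_model, k):
    """Average of the k lowest returns among N sampled reward models."""
    n = len(returns_per_model)
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= N")
    worst_k = sorted(returns_per_model)[:k]
    return sum(worst_k) / k


def pick_policy(candidates, k):
    """Return the name of the candidate with the highest k-of-N value."""
    return max(candidates, key=lambda name: k_of_n_value(candidates[name], k))


if __name__ == "__main__":
    # Hypothetical returns of two policies under N = 4 sampled reward models.
    candidates = {
        "bold": [10.0, 9.0, -5.0, 8.0],      # high on average, one bad case
        "cautious": [3.0, 3.0, 2.0, 3.0],    # modest but never bad
    }
    print(pick_policy(candidates, k=1))  # worst case only
    print(pick_policy(candidates, k=4))  # plain average over all models
```

With these numbers, k = 1 selects "cautious" (worst case 2.0 versus -5.0) while k = 4 selects "bold" (average 5.5 versus 2.75), so shrinking k relative to N pushes the choice toward more cautious behavior.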
Call for nominations: ACM SIGAI Autonomous Agents Research Award 2022
Nominations are solicited for the 2022 ACM SIGAI Autonomous Agents Research Award. This award is made for excellence in research in the area of autonomous agents. It is intended to recognize researchers in autonomous agents whose current work is an important influence on the field. The award is an official ACM award, funded by an endowment created by ACM SIGAI from the proceeds of previous Autonomous Agents conferences. The recipient of the award will receive a monetary prize and a certificate, and will be invited to present a plenary talk at the AAMAS 2022 conference in Auckland, New Zealand.
Decentralized Feature-Distributed Optimization for Generalized Linear Models
Ancelin, Brighton, Bahmani, Sohail, Romberg, Justin
We consider the "all-for-one" decentralized learning problem for generalized linear models. The features of each sample are partitioned among several collaborating agents in a connected network, but only one agent observes the response variables. To solve the regularized empirical risk minimization in this distributed setting, we apply the Chambolle-Pock primal-dual algorithm to an equivalent saddle-point formulation of the problem. The primal and dual iterations are either in closed-form or reduce to coordinate-wise minimization of scalar convex functions. We establish convergence rates for the empirical risk minimization under two different assumptions on the loss function (Lipschitz and square root Lipschitz), and show how they depend on the characteristics of the design matrix and the Laplacian of the network.
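As a rough illustration of what closed-form primal and dual iterations look like, the sketch below runs the Chambolle-Pock iteration on a scalar ridge problem, minimizing 0.5*(a*x - b)^2 + 0.5*lam*x^2. This is a centralized toy, not the paper's decentralized setting; the problem, step sizes, and names are assumptions chosen so that both proximal steps have simple formulas.

```python
def chambolle_pock_ridge(a, b, lam, tau=0.4, sigma=0.4, iters=200):
    """Chambolle-Pock on min_x 0.5*(a*x - b)**2 + 0.5*lam*x**2 (scalars).

    Written as min_x f(a*x) + g(x) with f(z) = 0.5*(z - b)**2 and
    g(x) = 0.5*lam*x**2. The step sizes must satisfy tau*sigma*a**2 < 1.
    """
    if tau * sigma * a * a >= 1.0:
        raise ValueError("need tau * sigma * a**2 < 1")
    x, x_bar, y = 0.0, 0.0, 0.0
    for _ in range(iters):
        # Dual step: prox of sigma * f_conjugate, available in closed form.
        y = (y + sigma * a * x_bar - sigma * b) / (1.0 + sigma)
        # Primal step: prox of tau * g, also in closed form.
        x_new = (x - tau * a * y) / (1.0 + tau * lam)
        # Extrapolation with theta = 1.
        x_bar = 2.0 * x_new - x
        x = x_new
    return x


if __name__ == "__main__":
    a, b, lam = 2.0, 3.0, 1.0
    x_cp = chambolle_pock_ridge(a, b, lam)
    x_exact = a * b / (a * a + lam)   # closed-form ridge solution
    print(x_cp, x_exact)
```

Comparing against the closed-form ridge solution a*b / (a^2 + lam) gives a quick sanity check that the iteration converges to the right point.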