Agents
SIERRA: A Modular Framework for Research Automation and Reproducibility
Modern intelligent systems researchers form hypotheses about system behavior and then run experiments using one or more independent variables to test their hypotheses. We present SIERRA, a novel framework structured around that idea for accelerating research development and improving reproducibility of results. SIERRA accelerates research by automating the process of generating executable experiments from queries over independent variable(s), executing experiments, and processing the results to generate deliverables such as graphs and videos. It shifts the paradigm for testing hypotheses from procedural ("Do these steps to answer the query") to declarative ("Here is the query to test--GO!"), reducing the burden on researchers. It employs a modular architecture enabling easy customization and extension for the needs of individual researchers, thereby eliminating manual configuration and processing via throw-away scripts. SIERRA improves reproducibility of research by providing automation independent of the execution environment (HPC hardware, real robots, etc.) and targeted platform (arbitrary simulator or real robots). This enables exact experiment replication, up to the limit of the execution environment and platform, as well as making it easy for researchers to test hypotheses in different computational environments.
Why do policy gradient methods work so well in cooperative MARL? Evidence from policy representation
In cooperative multi-agent reinforcement learning (MARL), policy gradient (PG) methods are typically believed to be less sample efficient than value decomposition (VD) methods, because PG methods are on-policy whereas VD methods are off-policy. However, some recent empirical studies demonstrate that with proper input representation and hyper-parameter tuning, multi-agent PG can achieve surprisingly strong performance compared to off-policy VD methods. Why could PG methods work so well? In this post, we present a concrete analysis to show that in certain scenarios, e.g., environments with a highly multi-modal reward landscape, VD can be problematic and lead to undesired outcomes. In addition, PG methods with auto-regressive (AR) policies can learn multi-modal policies.
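The multi-modal failure mode described above can be made concrete with a small worked example. The sketch below is an illustration under stated assumptions, not the post's own analysis: it uses a hypothetical 2-agent XOR game (reward 1 when the two actions differ), takes a purely additive decomposition as the simplest form of VD, and uses hand-written policies rather than learned ones. The names `REWARD`, `best_additive_fit`, and `expected_reward` are made up for this example.

```python
# Hypothetical 2-agent XOR game: reward is 1 when the agents pick different actions.
# It has two equally good modes: (0, 1) and (1, 0).
REWARD = [[0.0, 1.0],
          [1.0, 0.0]]


def best_additive_fit(r):
    """Least-squares fit of r[i][j] by q1[i] + q2[j] (an additive decomposition).

    For a full table the optimum is row mean + column mean - grand mean.
    """
    n, m = len(r), len(r[0])
    grand = sum(map(sum, r)) / (n * m)
    row = [sum(r[i]) / m for i in range(n)]
    col = [sum(r[i][j] for i in range(n)) / n for j in range(m)]
    return [[row[i] + col[j] - grand for j in range(m)] for i in range(n)]


def expected_reward(joint):
    """Expected reward of a joint action distribution joint[a1][a2]."""
    return sum(joint[i][j] * REWARD[i][j] for i in range(2) for j in range(2))


# The additive fit cannot tell good joint actions from bad ones.
fit = best_additive_fit(REWARD)

# Factorized policy: both agents act uniformly and independently.
independent = [[0.25, 0.25],
               [0.25, 0.25]]

# Auto-regressive policy: agent 1 is uniform, agent 2 observes a1 and picks the other action.
p_a1 = [0.5, 0.5]
p_a2_given_a1 = [[0.0, 1.0],
                 [1.0, 0.0]]
autoregressive = [[p_a1[i] * p_a2_given_a1[i][j] for j in range(2)] for i in range(2)]

print(fit)                               # every entry is 0.5
print(expected_reward(independent))      # 0.5
print(expected_reward(autoregressive))   # 1.0
```

In this toy game the additive fit assigns the same value to every joint action, so greedy action selection is uninformative, while the auto-regressive policy covers both optimal modes with the same uniform marginals as the factorized one.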
Acoustic Power Management by Swarms of Microscopic Robots
Microscopic robots in the body could harvest energy from ultrasound to provide on-board control of autonomous behaviors such as measuring and communicating diagnostic information and precisely delivering drugs. This paper evaluates the acoustic power available to micron-size robots that collect energy using pistons. Acoustic attenuation and viscous drag on the pistons are the major limitations on the available power. Frequencies around 100 kHz can deliver hundreds of picowatts to a robot in low-attenuation tissue within about 10 cm of transducers on the skin, but much less in high-attenuation tissue such as a lung. However, applications of microscopic robots could involve such large numbers that the robots significantly increase attenuation, thereby reducing power for robots deep in the body. This paper describes how robots can collectively manage where and when they harvest energy to mitigate this attenuation so that a swarm of a few hundred billion robots can provide tens of picowatts to each robot, on average.
Multirotor Planning in Dynamic Environments using Temporal Safe Corridors
Toumieh, Charbel, Lambert, Alain
In this paper, we propose a new method for multirotor planning in dynamic environments. The environment is represented as a temporal occupancy grid which gives the current as well as the future/predicted state of all the obstacles. The method builds on previous work in Safe Corridor generation and multirotor planning to avoid moving and static obstacles. It first generates a global path to the goal that doesn't take into account the dynamic aspect of the environment. We then use temporal Safe Corridors to generate safe spaces that the robot can be in at discrete instants in the future. Finally, we use the temporal Safe Corridors in an optimization formulation that accounts for the multirotor dynamics as well as all the obstacles to generate the trajectory that will be executed by the multirotor's controller. We show the performance of our method in simulations.
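As a rough illustration of the temporal occupancy grid idea, the sketch below stores one 2D occupancy layer per discrete time step and checks a time-stamped path against it. It is only a minimal data-structure sketch under assumptions made here (a 2D grid, unit time steps, and the hypothetical helpers `make_grid` and `is_path_safe`); it does not implement the paper's Safe Corridor generation or trajectory optimization.

```python
from typing import List, Tuple

# grid[t][y][x] is True when cell (x, y) is occupied, or predicted to be occupied, at step t.
TemporalGrid = List[List[List[bool]]]


def make_grid(steps: int, height: int, width: int) -> TemporalGrid:
    """Create an empty temporal occupancy grid (hypothetical helper)."""
    return [[[False] * width for _ in range(height)] for _ in range(steps)]


def is_path_safe(grid: TemporalGrid, path: List[Tuple[int, int]]) -> bool:
    """Check a path where path[t] = (x, y) is the robot's cell at time step t."""
    for t, (x, y) in enumerate(path):
        if t >= len(grid):
            return False  # no prediction available for this step: treat as unsafe
        if grid[t][y][x]:
            return False  # robot and obstacle share a cell at the same instant
    return True


grid = make_grid(steps=3, height=3, width=3)
# A moving obstacle crosses the middle row: it is at cell (t, 1) at step t.
for t in range(3):
    grid[t][1][t] = True

straight = [(1, 0), (1, 1), (1, 2)]   # enters (1, 1) exactly when the obstacle is there
waiting = [(1, 0), (1, 0), (1, 1)]    # waits one step, then enters (1, 1) after it has passed

print(is_path_safe(grid, straight))   # False
print(is_path_safe(grid, waiting))    # True
```

The same cell can be unsafe at one instant and safe at the next, which is the information a static occupancy grid would lose.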
Transformer-based Value Function Decomposition for Cooperative Multi-agent Reinforcement Learning in StarCraft
Khan, Muhammad Junaid, Ahmed, Syed Hammad, Sukthankar, Gita
The StarCraft II Multi-Agent Challenge (SMAC) was created to be a challenging benchmark problem for cooperative multi-agent reinforcement learning (MARL). SMAC focuses exclusively on the problem of StarCraft micromanagement and assumes that each unit is controlled individually by a learning agent that acts independently and only possesses local information; centralized training is assumed to occur with decentralized execution (CTDE). To perform well in SMAC, MARL algorithms must handle the dual problems of multi-agent credit assignment and joint action evaluation. This paper introduces a new architecture, TransMix, a transformer-based joint action-value mixing network which we show to be efficient and scalable compared to other state-of-the-art cooperative MARL solutions. TransMix leverages the ability of transformers to learn a richer mixing function for combining the agents' individual value functions. It achieves comparable performance to previous work on easy SMAC scenarios and outperforms other techniques on hard scenarios, as well as scenarios that are corrupted with Gaussian noise to simulate fog of war.
MACE: Multi-Agent Autonomous Collaborative Exploration of Unknown Environments
Toumieh, Charbel, Lambert, Alain
In this paper, we propose a new framework for multi-agent collaborative exploration of unknown environments. The proposed method combines state-of-the-art algorithms in mapping, safe corridor generation and multi-agent planning. It first takes a volume that we want to explore, then proceeds to give the multiple agents different goals in order to explore a voxel grid of that volume. The exploration ends when all voxels are discovered as free or occupied, or there is no path found for the remaining undiscovered voxels. The state-of-the-art planning algorithm uses time-aware Safe Corridors to guarantee intra-agent collision safety as well as safety from static obstacles. The presented approach is tested in a state-of-the-art simulator for up to 4 agents.
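The stopping rule stated above (every voxel classified, or no path to the remaining undiscovered ones) can be sketched as a reachability check. The example below is a simplified illustration, not the MACE implementation: it assumes a 2D grid instead of a voxel volume, 4-connected motion through free cells, a single agent position, and the hypothetical names `reachable_unknown` and `exploration_done`.

```python
from collections import deque

UNKNOWN, FREE, OCCUPIED = "?", ".", "#"


def reachable_unknown(grid, start):
    """Return the unknown cells that border free cells reachable from `start`.

    `grid` is a list of equal-length strings; `start` is a (row, col) free cell.
    """
    rows, cols = len(grid), len(grid[0])
    seen = {start}
    queue = deque([start])
    frontier = set()
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if not (0 <= nr < rows and 0 <= nc < cols):
                continue
            cell = grid[nr][nc]
            if cell == UNKNOWN:
                frontier.add((nr, nc))       # an undiscovered cell we can still get next to
            elif cell == FREE and (nr, nc) not in seen:
                seen.add((nr, nc))
                queue.append((nr, nc))
    return frontier


def exploration_done(grid, start):
    """True when nothing is left to discover, or nothing left is reachable."""
    return not reachable_unknown(grid, start)


open_map = ["..?"]      # the unknown cell can be approached through free space
walled_map = [".#?"]    # the unknown cell is cut off by an obstacle
known_map = [".#."]     # every cell is already classified

print(exploration_done(open_map, (0, 0)))    # False
print(exploration_done(walled_map, (0, 0)))  # True
print(exploration_done(known_map, (0, 0)))   # True
```

The second map shows the second half of the stopping rule: an unknown cell remains, but no path leads to it, so exploration still terminates.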
AI for Global Climate Cooperation: Modeling Global Climate Negotiations, Agreements, and Long-Term Cooperation in RICE-N
Zhang, Tianyu, Williams, Andrew, Phade, Soham, Srinivasa, Sunil, Zhang, Yang, Gupta, Prateek, Bengio, Yoshua, Zheng, Stephan
Comprehensive global cooperation is essential to limit global temperature increases while continuing economic development, e.g., reducing severe inequality or achieving long-term economic growth. Achieving long-term cooperation on climate change mitigation with n strategic agents poses a complex game-theoretic problem. For example, agents may negotiate and reach climate agreements, but there is no central authority to enforce adherence to those agreements. Hence, it is critical to design negotiation and agreement frameworks that foster cooperation, allow all agents to meet their individual policy objectives, and incentivize long-term adherence. This is an interdisciplinary challenge that calls for collaboration between researchers in machine learning, economics, climate science, law, policy, ethics, and other fields. In particular, we argue that machine learning is a critical tool to address the complexity of this domain. To facilitate this research, here we introduce RICE-N, a multi-region integrated assessment model that simulates the global climate and economy, and which can be used to design and evaluate the strategic outcomes for different negotiation and agreement frameworks. We also describe how to use multi-agent reinforcement learning to train rational agents using RICE-N. This framework underpins AI for Global Climate Cooperation, a working group collaboration and competition on climate negotiation and agreement design. Here, we invite the scientific community to design and evaluate their solutions using RICE-N, machine learning, economic intuition, and other domain knowledge. More information can be found on www.ai4climatecoop.org.
Acceleration of Subspace Learning Machine via Particle Swarm Optimization and Parallel Processing
Fu, Hongyu, Yang, Yijing, Liu, Yuhuai, Lin, Joseph, Harrison, Ethan, Mishra, Vinod K., Kuo, C. -C. Jay
Built upon the decision tree (DT) classification and regression idea, the subspace learning machine (SLM) has been recently proposed to offer higher performance in general classification and regression tasks. Its performance improvement is reached at the expense of higher computational complexity. In this work, we investigate two ways to accelerate SLM. First, we adopt the particle swarm optimization (PSO) algorithm to speed up the search for a discriminant dimension that is expressed as a linear combination of current dimensions. The search for optimal weights in the linear combination is computationally heavy. It is accomplished by probabilistic search in the original SLM. The acceleration of SLM by PSO requires 10-20 times fewer iterations. Second, we leverage parallel processing in the SLM implementation. Experimental results show that the accelerated SLM method achieves a speedup factor of 577 in training time while maintaining classification/regression performance comparable to that of the original SLM.
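For readers unfamiliar with PSO, the following is a minimal generic sketch of the algorithm searching for a weight vector. It is not the authors' SLM code: the objective `toy_loss` is a placeholder quadratic standing in for whatever split-quality measure SLM would evaluate on a projected dimension, and the function name `pso_minimize`, the swarm size, and the coefficient values are assumptions chosen for illustration.

```python
import random


def pso_minimize(loss, dim, n_particles=30, iters=200, seed=0,
                 inertia=0.729, c_personal=1.494, c_global=1.494):
    """Generic particle swarm optimization over a `dim`-dimensional weight vector."""
    rng = random.Random(seed)
    pos = [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]            # each particle's best position so far
    best_val = [loss(p) for p in pos]
    g = min(range(n_particles), key=best_val.__getitem__)
    g_pos, g_val = best_pos[g][:], best_val[g]  # swarm-wide best so far

    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Velocity blends momentum, pull to own best, and pull to swarm best.
                vel[i][d] = (inertia * vel[i][d]
                             + c_personal * r1 * (best_pos[i][d] - pos[i][d])
                             + c_global * r2 * (g_pos[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            val = loss(pos[i])
            if val < best_val[i]:
                best_val[i], best_pos[i] = val, pos[i][:]
                if val < g_val:
                    g_val, g_pos = val, pos[i][:]
    return g_pos, g_val


# Placeholder objective (an assumption): distance to a hidden "ideal" weight vector.
TARGET = [0.6, -0.3, 0.1]


def toy_loss(w):
    return sum((wi - ti) ** 2 for wi, ti in zip(w, TARGET))


weights, value = pso_minimize(toy_loss, dim=3)
print(weights, value)
```

Each particle's loss evaluation within an iteration is independent of the others, which is one reason a swarm-based search pairs naturally with parallel processing.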
Cooperative and uncooperative institution designs: Surprises and problems in open-source game theory
Critch, Andrew, Dennis, Michael, Russell, Stuart
It is increasingly possible for real-world agents, such as software-based agents or human institutions, to view the internal programming of other such agents that they interact with. For instance, a company can read the bylaws of another company, or one software system can read the source code of another. Game-theoretic equilibria between the designers of such agents are called program equilibria, and we call this area open-source game theory. In this work we demonstrate a series of counterintuitive results on open-source games, which are independent of the programming language in which agents are written. We show that certain formal institution designs that one might expect to defect against each other will instead turn out to cooperate, or conversely, cooperate when one might expect them to defect. The results hold in a setting where each institution has full visibility into the other institution's true operating procedures. We also exhibit examples and ten open problems for better understanding these phenomena. We argue that contemporary game theory remains ill-equipped to study program equilibria, given that even the outcomes of single games in open-source settings remain counterintuitive and poorly understood. Nonetheless, some of these open-source agents exhibit desirable characteristics -- e.g., they can unexploitably create incentives for cooperation and legibility from other agents -- such that analyzing them could yield considerable benefits.
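To make the open-source setting concrete, here is a minimal sketch of a one-shot game in which each agent receives the other's source code before acting. The agents shown are simple illustrations chosen here (a source-equality "clique" agent and an unconditional defector), not the institution designs analyzed in the paper; `CLIQUE_BOT`, `DEFECT_BOT`, `load`, and `play` are hypothetical names.

```python
# Each agent is a source string defining act(my_source, opponent_source) -> "C" or "D".
CLIQUE_BOT = (
    "def act(my_source, opponent_source):\n"
    "    # Cooperate only with an exact copy of myself.\n"
    "    return 'C' if opponent_source == my_source else 'D'\n"
)

DEFECT_BOT = (
    "def act(my_source, opponent_source):\n"
    "    return 'D'\n"
)


def load(source):
    """Compile an agent from its source text (only use with trusted strings)."""
    namespace = {}
    exec(source, namespace)
    return namespace["act"]


def play(source_a, source_b):
    """One-shot open-source game: each agent sees both its own and the other's source."""
    move_a = load(source_a)(source_a, source_b)
    move_b = load(source_b)(source_b, source_a)
    return move_a, move_b


print(play(CLIQUE_BOT, CLIQUE_BOT))                    # ('C', 'C')
print(play(CLIQUE_BOT, DEFECT_BOT))                    # ('D', 'D')
print(play(CLIQUE_BOT, CLIQUE_BOT + "# reformatted\n"))  # ('D', 'D')
```

The last call shows how brittle pure source-equality is: a behaviorally identical agent with one extra comment is treated as a stranger, which hints at why richer conditions on the opponent's program are of interest.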
Computational Empathy Counteracts the Negative Effects of Anger on Creative Problem Solving
Groh, Matthew, Ferguson, Craig, Lewis, Robert, Picard, Rosalind
How does empathy influence creative problem solving? We introduce a computational empathy intervention based on context-specific affective mimicry and perspective taking by a virtual agent appearing in the form of a well-dressed polar bear. In an online experiment with 1,006 participants randomly assigned to an emotion elicitation intervention (with a control elicitation condition and anger elicitation condition) and a computational empathy intervention (with a control virtual agent and an empathic virtual agent), we examine how anger and empathy influence participants' performance in solving a word game based on Wordle. We find participants who are assigned to the anger elicitation condition perform significantly worse on multiple performance metrics than participants assigned to the control condition. However, we find the empathic virtual agent counteracts the drop in performance induced by the anger condition such that participants assigned to both the empathic virtual agent and the anger condition perform no differently than participants in the control elicitation condition and significantly better than participants assigned to the control virtual agent and the anger elicitation condition. While empathy reduces the negative effects of anger, we do not find evidence that the empathic virtual agent influences performance of participants who are assigned to the control elicitation condition. By introducing a framework for computational empathy interventions and conducting a two-by-two factorial design randomized experiment, we provide rigorous, empirical evidence that computational empathy can counteract the negative effects of anger on creative problem solving.