Agents
Learning to Assist Agents by Observing Them
Keurulainen, Antti, Westerlund, Isak, Kaski, Samuel, Ilin, Alexander
The ability of an AI agent to assist other agents, such as humans, is an important and challenging goal, which requires the assisting agent to reason about the behavior and infer the goals of the assisted agent. Training such an ability by using reinforcement learning usually requires large amounts of online training, which is difficult and costly. On the other hand, offline data about the behavior of the assisted agent might be available, but is non-trivial to take advantage of by methods such as offline reinforcement learning. We introduce methods where the capability to create a representation of the behavior is first pre-trained with offline data, after which only a small amount of interaction data is needed to learn an assisting policy. We test the setting in a gridworld where the helper agent has the capability to manipulate the environment of the assisted artificial agents, and introduce three different scenarios where the assistance considerably improves the performance of the assisted agents.
Collective eXplainable AI: Explaining Cooperative Strategies and Agent Contribution in Multiagent Reinforcement Learning with Shapley Values
Heuillet, Alexandre, Couthouis, Fabien, Dรญaz-Rodrรญguez, Natalia
While Explainable Artificial Intelligence (XAI) is increasingly expanding more areas of application, little has been applied to make deep Reinforcement Learning (RL) more comprehensible. As RL becomes ubiquitous and used in critical and general public applications, it is essential to develop methods that make it better understood and more interpretable. This study proposes a novel approach to explain cooperative strategies in multiagent RL using Shapley values, a game theory concept used in XAI that successfully explains the rationale behind decisions taken by Machine Learning algorithms. Through testing common assumptions of this technique in two cooperation-centered socially challenging multi-agent environments environments, this article argues that Shapley values are a pertinent way to evaluate the contribution of players in a cooperative multi-agent RL context. To palliate the high overhead of this method, Shapley values are approximated using Monte Carlo sampling. Experimental results on Multiagent Particle and Sequential Social Dilemmas show that Shapley values succeed at estimating the contribution of each agent. These results could have implications that go beyond games in economics, (e.g., for non-discriminatory decision making, ethical and responsible AI-derived decisions or policy making under fairness constraints). They also expose how Shapley values only give general explanations about a model and cannot explain a single run, episode nor justify precise actions taken by agents. Future work should focus on addressing these critical aspects.
An Unsupervised Video Game Playstyle Metric via State Discretization
Lin, Chiu-Chou, Chiu, Wei-Chen, Wu, I-Chen
On playing video games, different players usually have their own playstyles. Recently, there have been great improvements for the video game AIs on the playing strength. However, past researches for analyzing the behaviors of players still used heuristic rules or the behavior features with the game-environment support, thus being exhausted for the developers to define the features of discriminating various playstyles. In this paper, we propose the first metric for video game playstyles directly from the game observations and actions, without any prior specification on the playstyle in the target game. Our proposed method is built upon a novel scheme of learning discrete representations that can map game observations into latent discrete states, such that playstyles can be exhibited from these discrete states. Namely, we measure the playstyle distance based on game observations aligned to the same states. We demonstrate high playstyle accuracy of our metric in experiments on some video game platforms, including TORCS, RGSK, and seven Atari games, and for different agents including rule-based AI bots, learning-based AI bots, and human players.
DESTRESS: Computation-Optimal and Communication-Efficient Decentralized Nonconvex Finite-Sum Optimization
Li, Boyue, Li, Zhize, Chi, Yuejie
Emerging applications in multi-agent environments such as internet-of-things, networked sensing, autonomous systems and federated learning, call for decentralized algorithms for finite-sum optimizations that are resource-efficient in terms of both computation and communication. In this paper, we consider the prototypical setting where the agents work collaboratively to minimize the sum of local loss functions by only communicating with their neighbors over a predetermined network topology. We develop a new algorithm, called DEcentralized STochastic REcurSive gradient methodS (DESTRESS) for nonconvex finite-sum optimization, which matches the optimal incremental first-order oracle (IFO) complexity of centralized algorithms for finding first-order stationary points, while maintaining communication efficiency. Detailed theoretical and numerical comparisons corroborate that the resource efficiencies of DESTRESS improve upon prior decentralized algorithms over a wide range of parameter regimes. DESTRESS leverages several key algorithm design ideas including stochastic recursive gradient updates with mini-batches for local computation, gradient tracking with extra mixing (i.e., multiple gossiping rounds) for per-iteration communication, together with careful choices of hyper-parameters and new analysis frameworks to provably achieve a desirable computation-communication trade-off.
Partner-Aware Algorithms in Decentralized Cooperative Bandit Teams
Bฤฑyฤฑk, Erdem, Lalitha, Anusha, Saha, Rajarshi, Goldsmith, Andrea, Sadigh, Dorsa
When humans collaborate with each other, they often make decisions by observing others and considering the consequences that their actions may have on the entire team, instead of greedily doing what is best for just themselves. We would like our AI agents to effectively collaborate in a similar way by capturing a model of their partners. In this work, we propose and analyze a decentralized Multi-Armed Bandit (MAB) problem with coupled rewards as an abstraction of more general multi-agent collaboration. We demonstrate that na\"ive extensions of single-agent optimal MAB algorithms fail when applied for decentralized bandit teams. Instead, we propose a Partner-Aware strategy for joint sequential decision-making that extends the well-known single-agent Upper Confidence Bound algorithm. We analytically show that our proposed strategy achieves logarithmic regret, and provide extensive experiments involving human-AI and human-robot collaboration to validate our theoretical findings. Our results show that the proposed partner-aware strategy outperforms other known methods, and our human subject studies suggest humans prefer to collaborate with AI agents implementing our partner-aware strategy.
Pure Nash Equilibria in Resource Graph Games
Harks, Tobias (Augsburg University) | Klimm, Max | Matuschke, Jannik (KU Leuven)
This paper studies the existence of pure Nash equilibria in resource graph games, a general class of strategic games succinctly representing the playersโ private costs. These games are defined relative to a finite set of resources and the strategy set of each player corresponds to a set of subsets of resources. The cost of a resource is an arbitrary function of the load vector of a certain subset of resources. As our main result, we give complete characterizations of the cost functions guaranteeing the existence of pure Nash equilibria for weighted and unweighted players, respectively. For unweighted players, pure Nash equilibria are guaranteed to exist for any choice of the playersโ strategy space if and only if the cost of each resource is an arbitrary function of the load of the resource itself and linear in the load of all other resources where the linear coefficients of mutual influence of different resources are symmetric. This implies in particular that for any other cost structure there is a resource graph game that does not have a pure Nash equilibrium. For weighted games where players have intrinsic weights and the cost of each resource depends on the aggregated weight of its users, pure Nash equilibria are guaranteed to exist if and only if the cost of a resource is linear in all resource loads, and the linear factors of mutual influence are symmetric, or there is no interaction among resources and the cost is an exponential function of the local resource load. We further discuss the computational complexity of pure Nash equilibria in resource graph games showing that for unweighted games where pure Nash equilibria are guaranteed to exist, it is coNP-complete to decide for a given strategy profile whether it is a pure Nash equilibrium. For general resource graph games, we prove that the decision whether a pure Nash equilibrium exists is ฮฃ p 2 -complete.
Divergence-Regularized Multi-Agent Actor-Critic
Entropy regularization is a popular method in reinforcement learning (RL). Although it has many advantages, it alters the RL objective and makes the converged policy deviate from the optimal policy of the original Markov Decision Process. Though divergence regularization has been proposed to settle this problem, it cannot be trivially applied to cooperative multi-agent reinforcement learning (MARL). In this paper, we investigate divergence regularization in cooperative MARL and propose a novel off-policy cooperative MARL framework, divergence-regularized multi-agent actor-critic (DMAC). Mathematically, we derive the update rule of DMAC which is naturally off-policy, guarantees a monotonic policy improvement and is not biased by the regularization. DMAC is a flexible framework and can be combined with many existing MARL algorithms. We evaluate DMAC in a didactic stochastic game and StarCraft Multi-Agent Challenge and empirically show that DMAC substantially improves the performance of existing MARL algorithms.
Gradient Update #9: Bias Bounties and Hierarchical Architectures for Computer Vision
Welcome to the ninth update from the Gradient! If you were referred by a friend, subscribe and follow us on Twitter! This news edition's story is Sharing learnings from the first algorithmic bias bounty challenge. Summary Twitter's algorithmic bias bounty challenge, the first of its kind, recently concluded. While users had previously found the algorithm had a racial bias, the bounty uncovered a number of other biases and potential harms.
A Privacy-preserving Distributed Training Framework for Cooperative Multi-agent Deep Reinforcement Learning
Deep Reinforcement Learning (DRL) sometimes needs a large amount of data to converge in the training procedure and in some cases, each action of the agent may produce regret. This barrier naturally motivates different data sets or environment owners to cooperate to share their knowledge and train their agents more efficiently. However, it raises privacy concerns if we directly merge the raw data from different owners. To solve this problem, we proposed a new Deep Neural Network (DNN) architecture with both global NN and local NN, and a distributed training framework. We allow the global weights to be updated by all the collaborator agents while the local weights are only updated by the agent they belong to. In this way, we hope the global weighs can share the common knowledge among these collaborators while the local NN can keep the specialized properties and ensure the agent to be compatible with its specific environment. Experiments show that the framework can efficiently help agents in the same or similar environments to collaborate in their training process and gain a higher convergence rate and better performance.
Implementation of Parallel Simplified Swarm Optimization in CUDA
Yeh, Wei-Chang, Liu, Zhenyao, Tan, Shi-Yi, Huang, Shang-Ke
As the acquisition cost of the graphics processing unit (GPU) has decreased, personal computers (PC) can handle optimization problems nowadays. In optimization computing, intelligent swarm algorithms (SIAs) method is suitable for parallelization. However, a GPU-based Simplified Swarm Optimization Algorithm has never been proposed. Accordingly, this paper proposed Parallel Simplified Swarm Optimization (PSSO) based on the CUDA platform considering computational ability and versatility. In PSSO, the theoretical value of time complexity of fitness function is O (tNm). There are t iterations and N fitness functions, each of which required pair comparisons m times. pBests and gBest have the resource preemption when updating in previous studies. As the experiment results showed, the time complexity has successfully reduced by an order of magnitude of N, and the problem of resource preemption was avoided entirely.