Agents
A Parameterized Perspective on Protecting Elections
Dey, Palash, Misra, Neeldhara, Nath, Swaprava, Shakya, Garima
We study the parameterized complexity of the optimal defense and optimal attack problems in voting. In both the problems, the input is a set of voter groups (every voter group is a set of votes) and two integers $k_a$ and $k_d$ corresponding to respectively the number of voter groups the attacker can attack and the number of voter groups the defender can defend. A voter group gets removed from the election if it is attacked but not defended. In the optimal defense problem, we want to know if it is possible for the defender to commit to a strategy of defending at most $k_d$ voter groups such that, no matter which $k_a$ voter groups the attacker attacks, the outcome of the election does not change. In the optimal attack problem, we want to know if it is possible for the attacker to commit to a strategy of attacking $k_a$ voter groups such that, no matter which $k_d$ voter groups the defender defends, the outcome of the election is always different from the original (without any attack) one.
Maximum Entropy-Regularized Multi-Goal Reinforcement Learning
Zhao, Rui, Sun, Xudong, Tresp, Volker
In Multi-Goal Reinforcement Learning, an agent learns to achieve multiple goals with a goal-conditioned policy. During learning, the agent first collects the trajectories into a replay buffer, and later these trajectories are selected randomly for replay. However, the achieved goals in the replay buffer are often biased towards the behavior policies. From a Bayesian perspective, when there is no prior knowledge about the target goal distribution, the agent should learn uniformly from diverse achieved goals. Therefore, we first propose a novel multi-goal RL objective based on weighted entropy. This objective encourages the agent to maximize the expected return, as well as to achieve more diverse goals. Secondly, we developed a maximum entropy-based prioritization framework to optimize the proposed objective. For evaluation of this framework, we combine it with Deep Deterministic Policy Gradient, both with or without Hindsight Experience Replay. On a set of multi-goal robotic tasks of OpenAI Gym, we compare our method with other baselines and show promising improvements in both performance and sample-efficiency.
Coordinated Exploration via Intrinsic Rewards for Multi-Agent Reinforcement Learning
Sparse rewards are one of the most important challenges in reinforcement learning. In the single-agent setting, these challenges have been addressed by introducing intrinsic rewards that motivate agents to explore unseen regions of their state spaces. Applying these techniques naively to the multi-agent setting results in individual agents exploring independently, without any coordination among themselves. We argue that learning in cooperative multi-agent settings can be accelerated and improved if agents coordinate with respect to what they have explored. In this paper we propose an approach for learning how to dynamically select between different types of intrinsic rewards which consider not just what an individual agent has explored, but all agents, such that the agents can coordinate their exploration and maximize extrinsic returns. Concretely, we formulate the approach as a hierarchical policy where a high-level controller selects among sets of policies trained on different types of intrinsic rewards and the low-level controllers learn the action policies of all agents under these specific rewards. We demonstrate the effectiveness of the proposed approach in a multi-agent learning domain with sparse rewards.
Distributed Artificial Intelligence: A primer on Multi-Agent Systems, Agent-Based Modeling, and Swarm Intelligence
Almost two years ago, I paused thinking about the future of AI and drew down some "predictions" about where I thought the field was going. One of those forecasts concerned reaching a general intelligence in several years, not through a super powerful 100-layers deep learning algorithm, but rather through something called collective intelligence. However, except for very obvious applications (e.g., drones), I have not read or seen any big development in the field and I thus thought to dig a bit into that to check what is currently going on. As part of the AI Knowledge Map then, I will have a look here not only at Swarm Intelligence (SI) but more generally at Distributed AI, which also includes Agent-Based Modeling (ABM) and Multi-Agent Systems (MAS). Let's start from the broader classification.
Actor-Attention-Critic for Multi-Agent Reinforcement Learning
Reinforcement learning in multi-agent scenarios is important for real-world applications but presents challenges beyond those seen in single-agent settings. We present an actor-critic algorithm that trains decentralized policies in multi-agent settings, using centrally computed critics that share an attention mechanism which selects relevant information for each agent at every timestep. This attention mechanism enables more effective and scalable learning in complex multi-agent environments, when compared to recent approaches. Our approach is applicable not only to cooperative settings with shared rewards, but also individualized reward settings, including adversarial settings, as well as settings that do not provide global states, and it makes no assumptions about the action spaces of the agents. As such, it is flexible enough to be applied to most multi-agent learning problems.
AI-GAs: AI-generating algorithms, an alternate paradigm for producing general artificial intelligence
Perhaps the most ambitious scientific quest in human history is the creation of general artificial intelligence, which roughly means AI that is as smart or smarter than humans. The dominant approach in the machine learning community is to attempt to discover each of the pieces required for intelligence, with the implicit assumption that some future group will complete the Herculean task of figuring out how to combine all of those pieces into a complex thinking machine. I call this the ``manual AI approach.'' This paper describes another exciting path that ultimately may be more successful at producing general AI. It is based on the clear trend in machine learning that hand-designed solutions eventually are replaced by more effective, learned solutions. The idea is to create an AI-generating algorithm (AI-GA), which automatically learns how to produce general AI. Three Pillars are essential for the approach: (1) meta-learning architectures, (2) meta-learning the learning algorithms themselves, and (3) generating effective learning environments. I argue that either approach could produce general AI first, and both are scientifically worthwhile irrespective of which is the fastest path. Because both are promising, yet the ML community is currently committed to the manual approach, I argue that our community should increase its research investment in the AI-GA approach. To encourage such research, I describe promising work in each of the Three Pillars. I also discuss AI-GA-specific safety and ethical considerations. Because it it may be the fastest path to general AI and because it is inherently scientifically interesting to understand the conditions in which a simple algorithm can produce general AI (as happened on Earth where Darwinian evolution produced human intelligence), I argue that the pursuit of AI-GAs should be considered a new grand challenge of computer science research.
Evolving Self-supervised Neural Networks: Autonomous Intelligence from Evolved Self-teaching
This paper presents a technique called evolving self-supervised neural networks - neural networks that can teach themselves, intrinsically motivated, without external supervision or reward. The proposed method presents some sort-of paradigm shift, and differs greatly from both traditional gradient-based learning and evolutionary algorithms in that it combines the metaphor of evolution and learning, more specifically self-learning, together, rather than treating these phenomena alternatively. I simulate a multi-agent system in which neural networks are used to control autonomous foraging agents with little domain knowledge. Experimental results show that only evolved self-supervised agents can demonstrate some sort of intelligent behaviour, but not evolution or self-learning alone. Indications for future work on evolving intelligence are also presented.
Learning latent state representation for speeding up exploration
Vezzani, Giulia, Gupta, Abhishek, Natale, Lorenzo, Abbeel, Pieter
Exploration is an extremely challenging problem in reinforcement learning, especially in high dimensional state and action spaces and when only sparse rewards are available. Effective representations can indicate which components of the state are task relevant and thus reduce the dimensionality of the space to explore. In this work, we take a representation learning viewpoint on exploration, utilizing prior experience to learn effective latent representations, which can subsequently indicate which regions to explore. Prior experience on separate but related tasks help learn representations of the state which are effective at predicting instantaneous rewards. These learned representations can then be used with an entropy-based exploration method to effectively perform exploration in high dimensional spaces by effectively lowering the dimensionality of the search space. We show the benefits of this representation for meta-exploration in a simulated object pushing environment.
Explainable Reinforcement Learning Through a Causal Lens
Madumal, Prashan, Miller, Tim, Sonenberg, Liz, Vetere, Frank
Prevalent theories in cognitive science propose that humans understand and represent the knowledge of the world through causal relationships. In making sense of the world, we build causal models in our mind to encode cause-effect relations of events and use these to explain why new events happen. In this paper, we use causal models to derive causal explanations of behaviour of reinforcement learning agents. We present an approach that learns a structural causal model during reinforcement learning and encodes causal relationships between variables of interest. This model is then used to generate explanations of behaviour based on counterfactual analysis of the causal model. We report on a study with 120 participants who observe agents playing a real-time strategy game (Starcraft II) and then receive explanations of the agents' behaviour. We investigated: 1) participants' understanding gained by explanations through task prediction; 2) explanation satisfaction and 3) trust. Our results show that causal model explanations perform better on these measures compared to two other baseline explanation models.
A Hybrid Algorithm for Metaheuristic Optimization
Khanna, Sujit Pramod, Ororbia, Alexander II
We propose a novel, flexible algorithm for combining together metaheuristic optimizers for non-convex optimization problems. Our approach treats the constituent optimizers as a team of complex agents that communicate information amongst each other at various intervals during the simulation process. The information produced by each individual agent can be combined in various ways via higher-level operators. In our experiments on key benchmark functions, we investigate how the performance of our algorithm varies with respect to several of its key modifiable properties. Finally, we apply our proposed algorithm to classification problems involving the optimization of support-vector machine classifiers.