Agents
An Optimization Framework for Task Sequencing in Curriculum Learning
Foglino, Francesco, Leonetti, Matteo
Curriculum learning is gaining popularity in (deep) reinforcement learning. It can alleviate the burden on data collection and provide better exploration policies through transfer and generalization from less complex tasks. Current methods for automatic task sequencing for curriculum learning in reinforcement learning have provided initial heuristic solutions, with little to no guarantee on their quality. We introduce an optimization framework for task sequencing composed of the problem definition, several candidate performance metrics for optimization, and three benchmark algorithms. We experimentally show that the two most commonly used baselines (learning with no curriculum, and with a random curriculum) perform worse than a simple greedy algorithm. Furthermore, we show theoretically and demonstrate experimentally that the three proposed algorithms provide increasing solution quality at moderately increasing computational complexity, and show that they constitute better baselines for curriculum learning in reinforcement learning.
Reinforcement Learning (RL) has recently been successfully applied to a number of tasks whose complexity would have appeared overwhelming only a few years ago [1], [2]. In such large and complex environments, classical exploration strategies designed for Markov Decision Processes (MDPs), which aim to visit every state as efficiently as possible, are inadequate. One approach actively investigated is the use of transfer learning [3] to generalize from previous similar tasks, and more recently the application of transfer learning to sequences of tasks of increasing complexity forming a curriculum. Curriculum Learning is often employed in (Deep) RL [4], [5] to let the agent progress more quickly towards better behaviors, but curricula are mostly designed by hand. Curriculum learning has the potential to greatly increase the quality of the behavior discovered by the agent. However, at the moment, creating an appropriate curriculum requires significant human intuition.
Learning Independently-Obtainable Reward Functions
Grimm, Christopher, Singh, Satinder
We present a novel method for learning a set of disentangled reward functions that sum to the original environment reward and are constrained to be independently obtainable. We define independent obtainability in terms of value functions with respect to obtaining one learned reward while pursuing another learned reward. Empirically, we illustrate that our method can learn meaningful reward decompositions in a variety of domains and that these decompositions exhibit some form of generalization performance when the environment's reward is modified. Theoretically, we derive results about the effect of maximizing our method's objective on the resulting reward functions and their corresponding optimal policies.
Probabilistic Pursuits on Graphs
Amir, Michael, Bruckstein, Alfred M.
We consider discrete dynamical systems of "ant-like" agents engaged in a sequence of pursuits on a graph environment. The agents emerge one by one at equal time intervals from a source vertex $s$ and pursue each other by greedily attempting to close the distance to their immediate predecessor, the agent that emerged just before them from $s$, until they arrive at the destination point $t$. Such pursuits have been investigated before in the continuous setting and in discrete time when the underlying environment is a regular grid. In both these settings the agents' walks provably converge to a shortest path from $s$ to $t$. Furthermore, assuming a certain natural probability distribution over the move choices of the agents on the grid (in case there are multiple shortest paths between an agent and its predecessor), the walks converge to the uniform distribution over all shortest paths from $s$ to $t$. We study the evolution of agent walks over a general finite graph environment $G$. Our model is a natural generalization of the pursuit rule proposed for the case of the grid. The main results are as follows. We show that "convergence" to the shortest paths in the sense of previous work extends to all pseudo-modular graphs (i.e. graphs in which every three pairwise intersecting disks have a nonempty intersection), and also to environments obtained by taking graph products, generalizing previous results in two different ways. We show that convergence to the shortest paths is also obtained by chordal graphs, and discuss some further positive and negative results for planar graphs. In the most general case, convergence to the shortest paths is not guaranteed, and the agents may get stuck on sets of recurrent, non-optimal walks from $s$ to $t$. However, we show that the limiting distributions of the agents' walks will always be uniform distributions over some set of walks of equal length.
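The pursuit rule described above can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the authors' exact model: the graph is undirected and connected, each follower moves to a neighbour that minimizes its hop distance to the predecessor's current position, ties are broken uniformly at random, a follower that shares a vertex with its predecessor waits one step, and an agent that has reached $t$ stays there. The names `bfs_distances`, `follow`, `simulate_pursuit`, `grid_graph`, and the `delay` parameter are hypothetical.

```python
import random
from collections import deque


def bfs_distances(graph, source):
    """Hop distance from `source` to every reachable vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def follow(graph, s, t, leader_walk, delay, rng):
    """Walk of one agent that emerges at s `delay` steps after its leader."""
    pos, walk, step = s, [s], 0
    while pos != t:
        # Leader's position at the same absolute time; it rests at t on arrival.
        idx = min(step + delay, len(leader_walk) - 1)
        dist = bfs_distances(graph, leader_walk[idx])
        if dist[pos] > 0:
            # Greedy move: pick a neighbour closest to the leader (assumed
            # uniform tie-breaking among equally good neighbours).
            best = min(dist[v] for v in graph[pos])
            pos = rng.choice([v for v in graph[pos] if dist[v] == best])
        # If dist[pos] == 0 the agent waits in place (an assumption).
        walk.append(pos)
        step += 1
    return walk


def simulate_pursuit(graph, s, t, first_walk, num_agents, delay, seed=0):
    """Return the walks of `num_agents` agents; the first walk is given."""
    if first_walk[0] != s or first_walk[-1] != t:
        raise ValueError("first_walk must start at s and end at t")
    rng = random.Random(seed)
    walks = [list(first_walk)]
    while len(walks) < num_agents:
        walks.append(follow(graph, s, t, walks[-1], delay, rng))
    return walks


def grid_graph(n):
    """Adjacency lists of an n-by-n grid with 4-neighbour connectivity."""
    moves = ((1, 0), (-1, 0), (0, 1), (0, -1))
    return {
        (x, y): [(x + dx, y + dy) for dx, dy in moves
                 if 0 <= x + dx < n and 0 <= y + dy < n]
        for x in range(n) for y in range(n)
    }
```

Calling `simulate_pursuit(grid_graph(4), (0, 0), (3, 3), first_walk, num_agents=20, delay=2)` with a long, winding `first_walk` and printing `len(w) - 1` for each returned walk shows how walk lengths evolve from one agent to the next. Under this particular rule a follower's walk is never longer than its leader's, but nothing here reproduces the paper's convergence proofs.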
Facebook and Google built a framework to study how AI agents talk to each other
The intricacies of evolutionary linguistics are myriad and underexplored, but research involving artificial intelligence (AI) might unlock the door to new theories about how dialects develop among users. This work isn't the first to investigate language with machine learning algorithms -- a paper published by Facebook researchers in June 2017 detailed how two agents learned to "negotiate" with each other in chat messages. But they claim it's the first to use "latest-generation deep neural agents" capable of dealing with "rich perceptual input," and they say it convincingly demonstrates that language can evolve from simple exchanges. The team began by deploying groups -- communities -- of agents equipped with the ability to communicate in a simulated environment, with complexities ranging from simple (a set of equations) to relatively complicated (a deep neural network). The "games" the agents were tasked with playing had several key properties: They were symmetric, enabling the agents to act as both "speakers" and "listeners"; they allowed the agents to communicate about something "external" to themselves, such as the sensory experience of something in their environment; and they took place in a world the agents could at least partially observe.
Towards a Characterization of Explainable Systems
Bohlender, Dimitri, Köhl, Maximilian A.
Building software-driven systems that are easily understood is becoming a challenge as their complexity and autonomy keep increasing. Accordingly, recent research efforts strive to aid in designing explainable systems. Nevertheless, a common notion of what it takes for a system to be explainable is still missing. To address this problem, we propose a characterization of explainable systems that consolidates existing research. By providing a unified terminology, we lay a basis for the classification of both existing and future research, and for the formulation of precise requirements for such systems.
Multi Agent Reinforcement Learning with Multi-Step Generative Models
Krupnik, Orr, Mordatch, Igor, Tamar, Aviv
The dynamics between agents and the environment are an important component of multi-agent Reinforcement Learning (RL), and learning them provides a basis for decision making. However, a major challenge in optimizing a learned dynamics model is the accumulation of error when predicting multiple steps into the future. Recent advances in variational inference provide model based solutions that predict complete trajectory segments, and optimize over a latent representation of trajectories. For single-agent scenarios, several recent studies have explored this idea, and showed its benefits over conventional methods. In this work, we extend this approach to the multi-agent case, and effectively optimize over a latent space that encodes multi-agent strategies. We discuss the challenges in optimizing over a latent variable model for multiple agents, both in the optimization algorithm and in the model representation, and propose a method for both cooperative and competitive settings based on risk-sensitive optimization. We evaluate our method on tasks in the multi-agent particle environment and on a simulated RoboCup domain.
New Approximations for Coalitional Manipulation in Scoring Rules
Keller, Orgad, Hassidim, Avinatan, Hazon, Noam
We study the problem of coalitional manipulation, in which k manipulators try to manipulate an election on m candidates, for any scoring rule, with focus on the Borda protocol. We do so in both the weighted and unweighted settings. For these problems, recent approximation approaches have tried to minimize k, the number of manipulators needed to make some preferred candidate p win (thus assuming that the number of manipulators is not limited in advance). In contrast, we focus on minimizing the score margin of p, which is the difference between the maximum score of a candidate and the score of p. We provide algorithms that approximate the optimum score margin, which are applicable to any scoring rule. For the specific case of the Borda protocol in the unweighted setting, our algorithm provides a superior approximation factor for lower values of k. Our methods are novel and adapt techniques from multiprocessor scheduling by carefully rounding an exponentially-large configuration linear program that is solved by using the ellipsoid method with an efficient separation oracle. We believe that such methods could be beneficial in other social choice settings as well.
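To make the score margin concrete, the sketch below computes Borda scores and the margin of a preferred candidate p for a fixed set of ballots. It only illustrates the quantity being minimized, not the approximation algorithms of the paper. It assumes the standard Borda rule (with m candidates, a ballot gives m - 1 points to its top choice down to 0 for its last) and complete rankings; the function names and the optional `weights` argument are hypothetical.

```python
def borda_scores(ballots, candidates, weights=None):
    """Total Borda score per candidate.

    Each ballot is a complete ranking, best candidate first. `weights`
    (assumed, for the weighted setting) gives one weight per ballot and
    defaults to 1 for every ballot.
    """
    m = len(candidates)
    if weights is None:
        weights = [1] * len(ballots)
    scores = {c: 0 for c in candidates}
    for ranking, w in zip(ballots, weights):
        for position, c in enumerate(ranking):
            # Top position earns m - 1 points, last position earns 0.
            scores[c] += w * (m - 1 - position)
    return scores


def score_margin(scores, p):
    """Maximum score of any candidate minus the score of p."""
    return max(scores.values()) - scores[p]


candidates = ["a", "b", "p"]
ballots = [["a", "b", "p"], ["b", "a", "p"], ["p", "a", "b"]]
scores = borda_scores(ballots, candidates)
margin = score_margin(scores, "p")
```

A margin of 0 means that p already attains the maximum score. Manipulators would add ballots so as to drive this value down.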
Unity and Google Cloud Platform launch challenge to push limits of game AI
Unity Technologies has teamed up with Google Cloud Platform to create the Obstacle Tower Challenge, which will test the limits of artificial intelligence in games. In the first-of-its-kind contest, Google will offer a prize of cash, travel vouchers, and Google Cloud Platform credits, valued at more than $100,000. Unity, the maker of the Unity game engine, is creating the contest to test the capabilities of intelligent agents and accelerate the research and development of AI. (Unity recently got in a spat with Improbable over a licensing dispute.) The Obstacle Tower Challenge will be a new benchmark aimed at testing the vision, control, planning, and generalization abilities of AI systems -- capabilities that no other benchmark has tested together before.
An algorithm that mimics our tribal instincts could help AI learn to socialize
Humans are instinctively tribal creatures. When we observe the interactions of people around us, we can intuitively infer whom we should get along with and whom we shouldn't. This might sound like a negative instinct, but it's actually what makes teamwork possible. Researchers at MIT believe this skill may be an important prerequisite for creating sociable AI systems that can cooperate with us in our day-to-day lives. Game-playing AI agents also require an understanding of the relationship landscape to know whom to cooperate and compete with.
Correcting and identifying the blind spots in Artificial Intelligence - Express Computer
A novel model developed by MIT and Microsoft researchers identifies instances in which autonomous systems have "learned" from training examples that don't match what's actually happening in the real world. Engineers could use this model to improve the safety of artificial intelligence systems, such as driverless vehicles and autonomous robots. The AI systems powering driverless cars, for example, are trained extensively in virtual simulations to prepare the vehicle for nearly every event on the road. But sometimes the car makes an unexpected error in the real world because an event occurs that should, but doesn't, alter the car's behavior. Consider a driverless car that wasn't trained, and more importantly doesn't have the sensors necessary, to differentiate between distinctly different scenarios, such as large, white cars and ambulances with red, flashing lights on the road.