AITopics | Agents

Agents

An Optimization Framework for Task Sequencing in Curriculum Learning

arXiv.org Machine Learning | Jan-31-2019

Abstract: Curriculum learning is gaining popularity in (deep) reinforcement learning. It can alleviate the burden on data collection and provide better exploration policies through transfer and generalization from less complex tasks. Current methods for automatic task sequencing for curriculum learning in reinforcement learning have provided initial heuristic solutions, with little to no guarantee on their quality. We introduce an optimization framework for task sequencing composed of the problem definition, several candidate performance metrics for optimization, and three benchmark algorithms. We experimentally show that the two most commonly used baselines (learning with no curriculum, and with a random curriculum) perform worse than a simple greedy algorithm. Furthermore, we show theoretically and demonstrate experimentally that the three proposed algorithms provide increasing solution quality at moderately increasing computational complexity, and show that they constitute better baselines for curriculum learning in reinforcement learning.

Reinforcement Learning (RL) has recently been successfully applied to a number of tasks whose complexity would have appeared overwhelming only a few years ago [1], [2]. In such large and complex environments, classical exploration strategies designed for Markov Decision Processes (MDPs), which aim to visit every state as efficiently as possible, are inadequate. One approach under active investigation is the use of transfer learning [3] to generalize from previous similar tasks, and more recently the application of transfer learning to sequences of tasks of increasing complexity forming a curriculum. Curriculum learning is often employed in (Deep) RL [4], [5] to let the agent progress more quickly towards better behaviors, but curricula are mostly designed by hand. Curriculum learning has the potential to greatly increase the quality of the behavior discovered by the agent. However, at the moment, creating an appropriate curriculum requires significant human intuition.

algorithm, curriculum, final task, (14 more...)

arXiv.org Machine Learning

1901.11478

Country: Europe > United Kingdom > England > West Yorkshire > Leeds (0.04)

Genre: Research Report (0.64)

Industry:

Leisure & Entertainment > Games > Computer Games (1.00)
Education (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.84)

Learning Independently-Obtainable Reward Functions

Grimm, Christopher, Singh, Satinder

arXiv.org Machine Learning | Jan-31-2019

We present a novel method for learning a set of disentangled reward functions that sum to the original environment reward and are constrained to be independently obtainable. We define independent obtainability in terms of value functions with respect to obtaining one learned reward while pursuing another learned reward. Empirically, we illustrate that our method can learn meaningful reward decompositions in a variety of domains and that these decompositions exhibit some form of generalization performance when the environment's reward is modified. Theoretically, we derive results about the effect of maximizing our method's objective on the resulting reward functions and their corresponding optimal policies.

decomposition, reward function, (12 more...)

arXiv.org Machine Learning

1901.08649

Country: North America > United States > Michigan > Washtenaw County > Ann Arbor (0.14)

Genre: Research Report (0.84)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.46)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.46)

Probabilistic Pursuits on Graphs

Amir, Michael, Bruckstein, Alfred M.

arXiv.org Artificial Intelligence | Jan-31-2019

We consider discrete dynamical systems of "ant-like" agents engaged in a sequence of pursuits on a graph environment. The agents emerge one by one at equal time intervals from a source vertex $s$ and pursue each other by greedily attempting to close the distance to their immediate predecessor, the agent that emerged just before them from $s$, until they arrive at the destination point $t$. Such pursuits have been investigated before in the continuous setting and in discrete time when the underlying environment is a regular grid. In both these settings the agents' walks provably converge to a shortest path from $s$ to $t$. Furthermore, assuming a certain natural probability distribution over the move choices of the agents on the grid (in case there are multiple shortest paths between an agent and its predecessor), the walks converge to the uniform distribution over all shortest paths from $s$ to $t$. We study the evolution of agent walks over a general finite graph environment $G$. Our model is a natural generalization of the pursuit rule proposed for the case of the grid. The main results are as follows. We show that "convergence" to the shortest paths in the sense of previous work extends to all pseudo-modular graphs (i.e. graphs in which every three pairwise intersecting disks have a nonempty intersection), and also to environments obtained by taking graph products, generalizing previous results in two different ways. We show that convergence to the shortest paths is also obtained by chordal graphs, and discuss some further positive and negative results for planar graphs. In the most general case, convergence to the shortest paths is not guaranteed, and the agents may get stuck on sets of recurrent, non-optimal walks from $s$ to $t$. However, we show that the limiting distributions of the agents' walks will always be uniform distributions over some set of walks of equal length.
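
The pursuit dynamics described in this abstract can be illustrated with a small simulation. The sketch below is not the authors' model: it assumes a uniform random choice among the neighbours that are strictly closer to the predecessor's current position, treats each leader/follower pair independently, and lets a follower wait when it already stands on its predecessor's vertex. The names `follow`, `simulate`, `grid`, and the parameter `delta` (the emergence interval) are hypothetical.

```python
import random
from collections import deque


def bfs_dist(adj, src):
    """Hop distances from src to every reachable vertex."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def follow(adj, leader_walk, delta, rng):
    """Walk of an agent that leaves s `delta` steps after its predecessor.

    Assumption: at each time step the follower moves to a uniformly random
    neighbour that is strictly closer to the leader's current position, and
    it stops as soon as it reaches the destination t.
    """
    t = leader_walk[-1]
    last = len(leader_walk) - 1
    pos, time, walk = leader_walk[0], delta, [leader_walk[0]]
    while pos != t:
        target = leader_walk[min(time, last)]  # leader waits at t once there
        dist = bfs_dist(adj, target)
        closer = [v for v in adj[pos] if dist[v] < dist[pos]]
        if closer:  # empty only if the follower stands on the leader's vertex
            pos = rng.choice(closer)
            walk.append(pos)
        time += 1
    return walk


def simulate(adj, first_walk, delta, agents, seed=0):
    """Chain of pursuits: each agent chases the walk of the previous one."""
    rng = random.Random(seed)
    walks = [list(first_walk)]
    for _ in range(agents - 1):
        walks.append(follow(adj, walks[-1], delta, rng))
    return walks


def grid(n):
    """Adjacency lists of an n-by-n grid graph (4-neighbourhood)."""
    steps = ((1, 0), (-1, 0), (0, 1), (0, -1))
    return {
        (r, c): [(r + dr, c + dc) for dr, dc in steps
                 if 0 <= r + dr < n and 0 <= c + dc < n]
        for r in range(n) for c in range(n)
    }
```

Under these assumptions a follower's walk is never longer than its predecessor's, so calling `simulate(grid(3), first_walk, delta=2, agents=10)` with a deliberately winding `first_walk` from `(0, 0)` to `(2, 2)` yields a sequence of walks whose lengths never increase.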

graph, shortest path, vertex, (15 more...)

arXiv.org Artificial Intelligence

1710.08107

Country:

North America > United States > New York (0.04)
Asia > Middle East > Israel > Haifa District > Haifa (0.04)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.67)

Facebook and Google built a framework to study how AI agents talk to each other

#artificialintelligence | Jan-30-2019, 01:36:35 GMT

The intricacies of evolutionary linguistics are myriad and underexplored, but research involving artificial intelligence (AI) might unlock the door to new theories about how dialects develop among users. This work isn't the first to investigate language with machine learning algorithms -- a paper published by Facebook researchers in June 2017 detailed how two agents learned to "negotiate" with each other in chat messages. But they claim it's the first to use "latest-generation deep neural agents" capable of dealing with "rich perceptual input," and they say it convincingly demonstrates that language can evolve from simple exchanges. The team began by deploying groups -- communities -- of agents equipped with the ability to communicate in a simulated environment, with complexities ranging from simple (a set of equations) to relatively complicated (a deep neural network). The "games" the agents were tasked with playing had several key properties: They were symmetric, enabling the agents to act as both "speakers" and "listeners"; they allowed the agents to communicate about something "external" to themselves, such as the sensory experience of something in their environment; and they took place in a world the agents could at least partially observe.

agent, artificial intelligence, machine learning, (7 more...)

#artificialintelligence

Genre: Research Report > New Finding (0.31)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Towards a Characterization of Explainable Systems

Bohlender, Dimitri, Köhl, Maximilian A.

arXiv.org Artificial Intelligence | Jan-30-2019

Building software-driven systems that are easily understood becomes a challenge, with their ever-increasing complexity and autonomy. Accordingly, recent research efforts strive to aid in designing explainable systems. Nevertheless, a common notion of what it takes for a system to be explainable is still missing. To address this problem, we propose a characterization of explainable systems that consolidates existing research. By providing a unified terminology, we lay a basis for the classification of both existing and future research, and the formulation of precise requirements towards such systems.

explanation, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

1902.03096

Country:

North America > United States > Florida > Broward County > Fort Lauderdale (0.04)
North America > United States > California > Monterey County > Monterey (0.04)
North America > United States > California > Los Angeles County > Long Beach (0.04)
(4 more...)

Genre: Research Report (0.50)

Industry: Transportation (0.69)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.93)

Multi Agent Reinforcement Learning with Multi-Step Generative Models

Krupnik, Orr, Mordatch, Igor, Tamar, Aviv

arXiv.org Machine Learning | Jan-29-2019

The dynamics between agents and the environment are an important component of multi-agent Reinforcement Learning (RL), and learning them provides a basis for decision making. However, a major challenge in optimizing a learned dynamics model is the accumulation of error when predicting multiple steps into the future. Recent advances in variational inference provide model based solutions that predict complete trajectory segments, and optimize over a latent representation of trajectories. For single-agent scenarios, several recent studies have explored this idea, and showed its benefits over conventional methods. In this work, we extend this approach to the multi-agent case, and effectively optimize over a latent space that encodes multi-agent strategies. We discuss the challenges in optimizing over a latent variable model for multiple agents, both in the optimization algorithm and in the model representation, and propose a method for both cooperative and competitive settings based on risk-sensitive optimization. We evaluate our method on tasks in the multi-agent particle environment and on a simulated RoboCup domain.

agent, multi agent reinforcement learning, trajectory, (13 more...)

arXiv.org Machine Learning

1901.10251

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Middle East > Israel > Haifa District > Haifa (0.04)

Genre: Research Report > New Finding (0.46)

Industry: Leisure & Entertainment > Sports > Soccer (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

New Approximations for Coalitional Manipulation in Scoring Rules

Keller, Orgad, Hassidim, Avinatan, Hazon, Noam

Journal of Artificial Intelligence Research | Jan-29-2019

We study the problem of coalitional manipulation, where k manipulators try to manipulate an election on m candidates, for any scoring rule, with a focus on the Borda protocol. We do so in both the weighted and unweighted settings. For these problems, recent approximation approaches have tried to minimize k, the number of manipulators needed to make some preferred candidate p win (thus assuming that the number of manipulators is not limited in advance). In contrast, we focus on minimizing the score margin of p, which is the difference between the maximum score of a candidate and the score of p. We provide algorithms that approximate the optimum score margin, which are applicable to any scoring rule. For the specific case of the Borda protocol in the unweighted setting, our algorithm provides a superior approximation factor for lower values of k. Our methods are novel and adapt techniques from multiprocessor scheduling by carefully rounding an exponentially large configuration linear program that is solved by using the ellipsoid method with an efficient separation oracle. We believe that such methods could be beneficial in other social choice settings as well.
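
To make the score margin concrete, here is a minimal sketch that computes unweighted Borda scores and the margin of a preferred candidate. It does not implement the paper's approximation algorithm. It assumes the standard Borda rule (with m candidates, a ballot gives m-1 points to its first choice down to 0 for its last) and takes the margin literally as the maximum score minus the score of p; the function names and the example ballots are hypothetical.

```python
def borda_scores(ballots, candidates):
    """Unweighted Borda scores: position i on a ballot earns m - 1 - i points."""
    m = len(candidates)
    scores = {c: 0 for c in candidates}
    for ballot in ballots:
        for rank, candidate in enumerate(ballot):
            scores[candidate] += m - 1 - rank
    return scores


def score_margin(scores, p):
    """Maximum score of any candidate minus the score of p (0 if p leads)."""
    return max(scores.values()) - scores[p]


candidates = ["a", "b", "p"]
honest = [["a", "b", "p"], ["a", "p", "b"], ["b", "a", "p"]]

# Margin of p before any manipulation: a=5, b=3, p=1.
before = score_margin(borda_scores(honest, candidates), "p")

# Two manipulators (k = 2) both rank p first and the current leader last.
manipulated = honest + [["p", "b", "a"], ["p", "b", "a"]]
after = score_margin(borda_scores(manipulated, candidates), "p")

print(before, after)  # prints "4 0"
```

In this toy election two manipulator ballots are enough to close the margin to zero, leaving p tied for the highest score; the paper studies how small the margin can be made for a fixed k.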

algorithm, constraint, manipulator, (12 more...)

Journal of Artificial Intelligence Research

doi: 10.1613/jair.1.11335

AI Access Foundation

11335

Country:

Asia > Middle East > Israel (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > United States > Illinois > Cook County > Chicago (0.04)
(2 more...)

Genre: Research Report (0.46)

Industry: Leisure & Entertainment > Games (0.92)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.67)

Unity and Google Cloud Platform launch challenge to push limits of game AI

#artificialintelligence | Jan-28-2019, 13:07:58 GMT

Unity Technologies has teamed up with Google Cloud Platform to create the Obstacle Tower Challenge, which will test the limits of artificial intelligence in games. In the first-of-its-kind contest, Google will offer a prize of cash, travel vouchers, and Google Cloud Platform credits, valued at more than $100,000. Unity, the maker of the Unity game engine, is creating the contest to test the capabilities of intelligent agents and accelerate the research and development of AI. (Unity recently got in a spat with Improbable over a licensing dispute.) The Obstacle Tower Challenge will be a new benchmark aimed at testing the vision, control, planning, and generalization abilities of AI systems -- capabilities that no other benchmark has tested together before.

artificial intelligence, cloud computing, obstacle tower challenge, (8 more...)

#artificialintelligence

Industry: Information Technology > Services (1.00)

Technology:

Information Technology > Cloud Computing (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.62)

An algorithm that mimics our tribal instincts could help AI learn to socialize

#artificialintelligence | Jan-28-2019, 09:25:21 GMT

Humans are instinctively tribal creatures. When we observe the interactions of people around us, we can intuitively infer whom we should get along with and whom we shouldn't. This might sound like a negative instinct, but it's actually what makes teamwork possible. Researchers at MIT believe this skill may be an important prerequisite for creating sociable AI systems that can cooperate with us in our day-to-day lives. Game-playing AI agents also require an understanding of the relationship landscape to know whom to cooperate and compete with.

algorithm, artificial intelligence, machine learning, (6 more...)

#artificialintelligence

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.92)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.41)

Correcting and identifying the blind spots in Artificial Intelligence - Express Computer

#artificialintelligence | Jan-28-2019, 09:24:58 GMT

A novel model developed by MIT and Microsoft researchers identifies instances in which autonomous systems have "learned" from training examples that don't match what's actually happening in the real world. Engineers could use this model to improve the safety of artificial intelligence systems, such as driverless vehicles and autonomous robots. The AI systems powering driverless cars, for example, are trained extensively in virtual simulations to prepare the vehicle for nearly every event on the road. But sometimes the car makes an unexpected error in the real world because an event occurs that should, but doesn't, alter the car's behavior. Consider a driverless car that wasn't trained, and more importantly doesn't have the sensors necessary, to differentiate between distinctly different scenarios, such as large, white cars and ambulances with red, flashing lights on the road.

artificial intelligence, blind spot, machine learning, (15 more...)

#artificialintelligence

Industry:

Transportation > Passenger (0.74)
Transportation > Ground > Road (0.74)
Information Technology > Robotics & Automation (0.74)
Automobiles & Trucks (0.74)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (0.95)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.69)
