AITopics

We present an adversarial exploration strategy, a simple yet effective imitation learning scheme that incentivizes exploration of an environment without any extrinsic reward or human demonstration. Our framework consists of a deep reinforcement learning (DRL) agent and an inverse dynamics model competing with each other. The former collects training samples for the latter, and its objective is to maximize the error of the latter. The latter is trained with samples collected by the former, and generates rewards for the former when it fails to predict the actual action taken by the former. In such a competitive setting, the DRL agent learns to generate samples that the inverse dynamics model fails to predict correctly, and the inverse dynamics model learns to adapt to the challenging samples. We further propose a reward structure that ensures the DRL agent collects only moderately hard samples and not overly hard ones that prevent the inverse model from imitating effectively. We evaluate the effectiveness of our method on several OpenAI gym robotic arm and hand manipulation tasks against a number of baseline models. Experimental results show that our method performs comparably to a model trained directly with expert demonstrations, and outperforms the other baselines even without any human priors.
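The "moderately hard samples" idea in the abstract above can be illustrated with a small sketch. The abstract does not give the reward formula, so the shaping below is an assumption made for illustration: the names `raw_reward`, `shaped_reward`, `beta`, and `delta` are hypothetical, and the shaped reward simply peaks when the scaled inverse-model loss equals a target difficulty `delta`.

```python
def raw_reward(inverse_loss):
    """Unshaped adversarial reward: the agent is paid the inverse model's error."""
    return inverse_loss


def shaped_reward(inverse_loss, beta=1.0, delta=0.5):
    """Hypothetical shaping (an assumption, not taken from the abstract).

    The reward is highest (zero) when beta * inverse_loss equals delta and
    decreases linearly as samples become either too easy or too hard.
    """
    return -abs(beta * inverse_loss - delta)


# Compare an easy, a moderately hard, and a very hard sample.
for loss in (0.0, 0.5, 2.0):
    print(loss, raw_reward(loss), shaped_reward(loss))
```

Under the unshaped reward the agent is pushed toward ever harder samples, while under this assumed shaping the moderately hard sample scores best.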

inverse dynamic model, neural network, upstream oil & gas, (19 more...)

1806.10019

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.71)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.66)

Lin, Xiaomin, Adams, Stephen C., Beling, Peter A.

Multi-agent Inverse Reinforcement Learning for General-sum Stochastic Games

arXiv.org Machine Learning Jun-26-2018

This paper addresses the problem of multi-agent inverse reinforcement learning (MIRL) in a two-player general-sum stochastic game framework. Five variants of MIRL are considered: uCS-MIRL, advE-MIRL, cooE-MIRL, uCE-MIRL, and uNE-MIRL, each distinguished by its solution concept. Problem uCS-MIRL is a cooperative game in which the agents employ cooperative strategies that aim to maximize the total game value. In problem uCE-MIRL, agents are assumed to follow strategies that constitute a correlated equilibrium while maximizing total game value. Problem uNE-MIRL is similar to uCE-MIRL in total game value maximization, but it is assumed that the agents are playing a Nash equilibrium. Problems advE-MIRL and cooE-MIRL assume agents are playing an adversarial equilibrium and a coordination equilibrium, respectively. We propose novel approaches to address these five problems under the assumption that the game observer either knows or is able to accurately estimate the policies and solution concepts for players. For uCS-MIRL, we first develop a characteristic set of solutions ensuring that the observed bi-policy is a uCS and then apply a Bayesian inverse learning method. For uCE-MIRL, we develop a linear programming problem subject to constraints that define necessary and sufficient conditions for the observed policies to be correlated equilibria. The objective is to choose a solution that not only minimizes the total game value difference between the observed bi-policy and a local uCS, but also maximizes the scale of the solution. We apply a similar treatment to the problem of uNE-MIRL. The remaining two problems can be solved efficiently by taking advantage of solution uniqueness and setting up a convex optimization problem. Results are validated on various benchmark grid-world games.

equilibrium, machine learning, reinforcement learning, (17 more...)

1806.09795

Country:

North America > United States > Virginia > Albemarle County > Charlottesville (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.04)
(4 more...)

Genre: Research Report (0.84)

Industry:

Banking & Finance > Trading (1.00)
Leisure & Entertainment > Games (0.68)
Leisure & Entertainment > Sports (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Gamrian, Shani, Goldberg, Yoav

Transfer Learning for Related Reinforcement Learning Tasks via Image-to-Image Translation

Deep Reinforcement Learning has managed to achieve state-of-the-art results in learning control policies directly from raw pixels. However, despite its remarkable success, it fails to generalize, and generalization is a fundamental component required in a stable Artificial Intelligence system. Using the Atari game Breakout, we demonstrate the difficulty a trained agent has in adjusting to simple modifications in the raw image, ones that a human could adapt to trivially. In transfer learning, the goal is to use the knowledge gained from the source task to make the training of the target task faster and better. We show that using various forms of fine-tuning, a common method for transfer learning, is not effective for adapting to such small visual changes. In fact, it is often easier to re-train the agent from scratch than to fine-tune a trained agent. We suggest that in some cases transfer learning can be improved by adding a dedicated component whose goal is to learn to visually map between the known domain and the new one. Concretely, we use Generative Adversarial Networks (GANs) to create a mapping function to translate images in the target task to corresponding images in the source task, allowing us to transform between the different tasks. We show that learning this mapping is substantially more efficient than re-training. A visualization of a trained agent playing in a modified condition, with and without the GAN transfer, can be seen at https://youtu.be/e2TwjduPT8g

machine learning, reinforcement learning, target task, (20 more...)

1806.07377

Country:

North America > United States > Washington > King County > Bellevue (0.04)
Asia > Middle East > Israel (0.04)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)

Genre: Research Report > New Finding (0.47)

Industry: Leisure & Entertainment > Games > Computer Games (0.69)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Lerer, Adam, Peysakhovich, Alexander

Learning Social Conventions in Markov Games

Social conventions - arbitrary ways to organize group behavior - are an important part of social life. Any agent that wants to enter an existing society must be able to learn its conventions (e.g. which side of the road to drive on, which language to speak) from relatively few observations or risk being unable to coordinate with everyone else. We consider the game-theoretic framework of David Lewis, which views the selection of a social convention as the selection of an equilibrium in a coordination game. We ask how to construct reinforcement learning based agents that can solve the convention learning task in the self-play paradigm: at training time the agent has access to a good model of the environment and a small number of observations about how individuals in society act. The agent then has to construct a policy that is compatible with the test-time social convention. We study three environments from the literature which have multiple conventions: traffic, communication, and risky coordination. In each of these we observe that adding a small amount of imitation learning during self-play training greatly increases the probability that the strategy found by self-play fits well with the social convention the agent will face at test time. We show that this works even in an environment where standard independent multi-agent RL very rarely finds the correct test-time equilibrium.

artificial intelligence, machine learning, reinforcement learning, (13 more...)

1806.10071

Genre: Research Report (0.64)

Industry: Leisure & Entertainment > Games > Computer Games (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.94)

Learning Complex Dexterous Manipulation with Deep Reinforcement Learning and Demonstrations

Rajeswaran, Aravind, Kumar, Vikash, Gupta, Abhishek, Vezzani, Giulia, Schulman, John, Todorov, Emanuel, Levine, Sergey

Multi-fingered dexterous manipulators are crucial for robots to function in human-centric environments, due to their versatility and potential to enable a large variety of contact-rich tasks, such as in-hand manipulation, complex grasping, and tool use. However, this versatility comes at the price of high-dimensional observation and action spaces, complex and discontinuous contact patterns, and under-actuation during nonprehensile manipulation. This makes dexterous manipulation with multi-fingered hands a challenging problem. Dexterous manipulation behaviors with multi-fingered hands have previously been obtained using model-based trajectory optimization methods [31], [24]. However, these methods typically rely on accurate dynamics models and state estimates, which are often difficult to obtain for contact-rich manipulation tasks, especially in the real world. Reinforcement learning provides a model-agnostic approach that circumvents these issues. Indeed, model-free methods have been used for acquiring manipulation skills [52], [13], but so far have been limited to simpler behaviors with 2-3 finger hands or whole-arm manipulators, which do not capture the challenges of high-dimensional multi-fingered hands.

demonstration, machine learning, reinforcement learning, (15 more...)

1709.10087

Country:

North America > United States > Washington > King County > Seattle (0.04)
North America > United States > California > Alameda County > Berkeley (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.64)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots > Manipulation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

#artificialintelligence Jun-25-2018, 23:21:58 GMT

Careers

We are on a mission to build human-like intelligence in machines, enabling a future of abundance for all. We thoughtfully and realistically pave the way for a world filled with more powerful and helpful AI systems. Our research spans AGI, deep learning, reinforcement learning, multi-sensory machine perception, dynamic motion planning, AR VR, distributed multi-agent control systems, and much more. We love to create, to build, to play with ideas. We get excited by the change we get to create in the world, regardless of how hard it is and how many say it can't be done.

deep learning, machine learning, reinforcement learning, (1 more...)

#artificialintelligence

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.34)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.34)

arXiv.org Machine Learning Jun-25-2018

Deep Generative Models with Learnable Knowledge Constraints

Hu, Zhiting, Yang, Zichao, Salakhutdinov, Ruslan, Liang, Xiaodan, Qin, Lianhui, Dong, Haoye, Xing, Eric

The broad set of deep generative models (DGMs) has achieved remarkable advances. However, it is often difficult to incorporate rich structured domain knowledge with the end-to-end DGMs. Posterior regularization (PR) offers a principled framework to impose structured constraints on probabilistic models, but has limited applicability to the diverse DGMs that can lack a Bayesian formulation or even explicit density evaluation. PR also requires constraints to be fully specified a priori, which is impractical or suboptimal for complex knowledge with learnable uncertain parts. In this paper, we establish a mathematical correspondence between PR and reinforcement learning (RL), and, based on the connection, expand PR to learn constraints as the extrinsic reward in RL. The resulting algorithm is model-agnostic and applies to any DGM, and is flexible enough to adapt arbitrary constraints jointly with the model. Experiments on human image generation and templated sentence generation show that models with knowledge constraints learned by our algorithm greatly improve over base generative models.

constraint, machine learning, reinforcement learning, (17 more...)

1806.09764

Country:

Asia > Middle East > Jordan (0.04)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
North America > United States > Illinois > Cook County > Chicago (0.04)
Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.62)

arXiv.org Machine Learning Jun-25-2018

A Tour of Reinforcement Learning: The View from Continuous Control

Recht, Benjamin

This manuscript surveys the general formulation, terminology, and typical experimental implementations of reinforcement learning and reviews competing solution paradigms. In order to compare the relative merits of various techniques, this survey presents a case study of the Linear Quadratic Regulator (LQR) with unknown dynamics, perhaps the simplest and best studied problem in optimal control. The manuscript describes how merging techniques from learning theory and control can provide non-asymptotic characterizations of LQR performance and shows that these characterizations tend to match experimental behavior. In turn, when revisiting more complex applications, many of the observed phenomena in LQR persist. In particular, theory and experiment demonstrate the role and importance of models and the cost of generality in reinforcement learning algorithms. This survey concludes with a discussion of some of the challenges in designing learning systems that safely and reliably interact with complex and uncertain environments and how tools from reinforcement learning and controls might be combined to approach these challenges.
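For readers unfamiliar with LQR, the following is a minimal sketch of the known-dynamics baseline for a scalar system. It is not the survey's unknown-dynamics case study; the function name `scalar_lqr` and the numeric values are illustrative assumptions. It iterates the standard discrete-time Riccati recursion to a fixed point and returns the resulting linear feedback gain.

```python
def scalar_lqr(a, b, q, r, iterations=500):
    """Solve the scalar discrete-time LQR problem by Riccati iteration.

    Dynamics: x[t+1] = a * x[t] + b * u[t]
    Cost:     sum over t of q * x[t]**2 + r * u[t]**2, with q >= 0 and r > 0
    Returns (p, k): the cost-to-go coefficient and the gain in u = -k * x.
    """
    p = q
    for _ in range(iterations):
        # Scalar form of P = Q + A'PA - A'PB (R + B'PB)^-1 B'PA.
        p = q + a * a * p - (a * b * p) ** 2 / (r + b * b * p)
    k = a * b * p / (r + b * b * p)
    return p, k


# An open-loop unstable system (|a| > 1), chosen only for illustration.
a, b = 1.2, 1.0
p, k = scalar_lqr(a, b, q=1.0, r=1.0)
closed_loop = a - b * k  # |closed_loop| < 1 means the feedback stabilizes x
print(p, k, closed_loop)
```

With unknown dynamics, `a` and `b` would first have to be estimated from data or bypassed by a model-free method, which is the comparison the survey's case study is built around.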

artificial intelligence, machine learning, reinforcement learning, (15 more...)

1806.0946

Country:

Asia > Middle East > Jordan (0.04)
North America > United States > New York (0.04)
North America > United States > New Hampshire > Hillsborough County > Nashua (0.04)
(4 more...)

Genre:

Research Report (1.00)
Overview (1.00)

Industry: Information Technology (0.93)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Zhang, Amy, Ballas, Nicolas, Pineau, Joelle

A Dissection of Overfitting and Generalization in Continuous Reinforcement Learning

arXiv.org Artificial Intelligence Jun-25-2018

The risks and perils of overfitting in machine learning are well known. However, most of the treatment of this, including diagnostic tools and remedies, was developed for the supervised learning case. In this work, we aim to offer new perspectives on the characterization and prevention of overfitting in deep Reinforcement Learning (RL) methods, with a particular focus on continuous domains. We examine several aspects, such as how to define and diagnose overfitting in MDPs, and how to reduce risks by injecting sufficient training diversity. This work complements recent findings on the brittleness of deep RL methods and offers practical observations for RL researchers and practitioners.

artificial intelligence, machine learning, reinforcement learning, (18 more...)

1806.07937

Country:

North America > Canada > Quebec > Montreal (0.04)
North America > Canada > Ontario > Toronto (0.04)

Genre: Research Report (0.82)

Industry: Leisure & Entertainment (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Fournier, Pierre, Sigaud, Olivier, Chetouani, Mohamed, Oudeyer, Pierre-Yves

Accuracy-based Curriculum Learning in Deep Reinforcement Learning

arXiv.org Artificial Intelligence Jun-25-2018

In this paper, we investigate a new form of automated curriculum learning based on adaptive selection of accuracy requirements, called accuracy-based curriculum learning. Using a reinforcement learning agent based on the Deep Deterministic Policy Gradient algorithm and addressing the Reacher environment, we first show that an agent trained with various accuracy requirements sampled randomly learns more efficiently than when asked to be very accurate at all times. Then we show that adaptive selection of accuracy requirements, based on a local measure of competence progress, automatically generates a curriculum where difficulty progressively increases, resulting in a better learning efficiency than sampling randomly.
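The adaptive selection step described above might look roughly like the sketch below. The abstract does not specify how competence progress is measured or turned into sampling probabilities, so everything here is an assumption: the windowed difference in success rate, the exponent `beta`, and the names `competence_progress` and `sampling_probabilities` are all hypothetical.

```python
def competence_progress(successes, window):
    """Absolute change in success rate between the two most recent windows.

    `successes` is a list of 0/1 episode outcomes for one accuracy requirement.
    Returns 0.0 until at least 2 * window outcomes have been recorded.
    """
    recent = successes[-window:]
    older = successes[-2 * window:-window]
    if len(older) < window:
        return 0.0
    return abs(sum(recent) / window - sum(older) / window)


def sampling_probabilities(progress, beta=1.0):
    """Turn per-requirement progress into a sampling distribution.

    Requirements with more progress are sampled more often; if no
    requirement shows progress, fall back to uniform sampling.
    """
    weights = [p ** beta for p in progress]
    total = sum(weights)
    if total == 0:
        return [1.0 / len(progress)] * len(progress)
    return [w / total for w in weights]


# Hypothetical outcome histories for three accuracy requirements.
histories = {
    0.02: [0, 0, 1, 1],  # improving: high progress
    0.05: [1, 1, 1, 1],  # already mastered: no progress
    0.10: [0, 1, 0, 1],  # flat success rate: no progress
}
progress = [competence_progress(h, window=2) for h in histories.values()]
probs = sampling_probabilities(progress)
print(dict(zip(histories, probs)))
```

In this toy run all sampling mass goes to the requirement whose success rate is still changing, which is the mechanism by which difficulty would shift over training under these assumptions.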

curriculum, machine learning, reinforcement learning, (13 more...)

1806.09614

Country:

Europe > France > Île-de-France > Paris > Paris (0.04)
Europe > France > Nouvelle-Aquitaine > Gironde > Bordeaux (0.04)

Genre: Research Report (0.40)

Industry: Education (0.70)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)