Robust Reinforcement Learning for Continuous Control with Model Misspecification
Mankowitz, Daniel J., Levine, Nir, Jeong, Rae, Abdolmaleki, Abbas, Springenberg, Jost Tobias, Mann, Timothy, Hester, Todd, Riedmiller, Martin
We provide a framework for incorporating robustness -- to perturbations in the transition dynamics, which we refer to as model misspecification -- into continuous control Reinforcement Learning (RL) algorithms. We specifically focus on incorporating robustness into a state-of-the-art continuous control RL algorithm, Maximum a-posteriori Policy Optimization (MPO). We achieve this by learning a policy that optimizes a worst-case, entropy-regularized, expected-return objective, and we derive a corresponding robust entropy-regularized Bellman contraction operator. In addition, we introduce a less conservative, soft-robust, entropy-regularized objective with a corresponding Bellman operator. We show that both robust and soft-robust policies outperform their non-robust counterparts in nine MuJoCo domains with environment perturbations. Finally, we present multiple investigative experiments that provide deeper insight into the robustness framework, including an adaptation to another continuous control RL algorithm and a comparison of this approach to domain randomization. Performance videos can be found online at https://sites.google.com/view/robust-rl.
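As a rough illustration of the worst-case idea (a minimal tabular sketch, not the paper's algorithm, and omitting the entropy regularization for brevity): the robust backup evaluates the Bellman target under every transition model in an uncertainty set and keeps the least favourable one. The `models`, `rewards`, and tabular setting below are assumptions for illustration only.

```python
import numpy as np

def robust_backup(V, models, rewards, gamma=0.99):
    """One worst-case Bellman backup on a tabular MDP (sketch).

    V:       (S,) current state values
    models:  list of (S, A, S) transition tensors -- the uncertainty set
    rewards: (S, A) reward table
    Returns robust state values max_a [ r + gamma * min_p E_p[V] ].
    """
    # Expected next-state value under each candidate model: (num_models, S, A)
    expected_next = np.stack([P @ V for P in models])
    # Worst case over the uncertainty set, then greedy over actions.
    worst_case = expected_next.min(axis=0)             # (S, A)
    return (rewards + gamma * worst_case).max(axis=1)  # (S,)
```

The soft-robust variant described in the abstract would replace the hard minimum over models with an average under a fixed distribution over the uncertainty set.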
Simultaneously Learning Vision and Feature-based Control Policies for Real-world Ball-in-a-Cup
Schwab, Devin, Springenberg, Tobias, Martins, Murilo F., Lampe, Thomas, Neunert, Michael, Abdolmaleki, Abbas, Hertweck, Tim, Hafner, Roland, Nori, Francesco, Riedmiller, Martin
We present a method for fast training of vision-based control policies on real robots. The key idea behind our method is to perform multi-task Reinforcement Learning with auxiliary tasks that differ not only in the reward to be optimized but also in the state space in which they operate. In particular, we allow auxiliary task policies to use task features that are available only at training time. This allows for fast learning of the auxiliary policies, which in turn generate good data for training the main, vision-based control policies. The method can be seen as an extension of the Scheduled Auxiliary Control (SAC-X) framework. We demonstrate its efficacy on both a simulated and a real-world Ball-in-a-Cup game controlled by a robot arm. In simulation, our approach leads to significant learning speed-ups compared to standard SAC-X. On the real robot we show that the task can be learned from scratch, i.e., with no transfer from simulation and no imitation learning. Videos of our learned policies running on the real robot can be found at https://sites.google.com/view/rss-2019-sawyer-bic/.
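A minimal sketch of the asymmetric-observation setup (all names such as `env`, `feature_policy`, and `vision_policy` are hypothetical, not the paper's API): auxiliary policies act on privileged low-dimensional features available only at training time, while the vision-based main policy is trained off-policy from the shared replay buffer.

```python
import random

def collect_and_train(env, feature_policy, vision_policy, replay, episodes=100):
    """Sketch: feature-based auxiliary policies generate data quickly;
    the vision-based main policy learns off-policy from the same buffer."""
    for _ in range(episodes):
        obs = env.reset()  # obs contains both 'features' and 'pixels'
        done = False
        while not done:
            # Auxiliary policy uses privileged task features (training time only).
            action = feature_policy.act(obs["features"])
            next_obs, reward, done = env.step(action)
            replay.append((obs, action, reward, next_obs, done))
            obs = next_obs
        # Both policies train off-policy, but on different observation keys.
        batch = random.sample(replay, k=min(256, len(replay)))
        feature_policy.update(batch, obs_key="features")
        vision_policy.update(batch, obs_key="pixels")
```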
Self-supervised Learning of Image Embedding for Continuous Control
Florensa, Carlos, Degrave, Jonas, Heess, Nicolas, Springenberg, Jost Tobias, Riedmiller, Martin
Operating directly from raw high dimensional sensory inputs like images is still a challenge for robotic control. Recently, Reinforcement Learning methods have been proposed to solve specific tasks end-to-end, from pixels to torques. However, these approaches assume the access to a specified reward which may require specialized instrumentation of the environment. Furthermore, the obtained policy and representations tend to be task specific and may not transfer well. In this work we investigate completely self-supervised learning of a general image embedding and control primitives, based on finding the shortest time to reach any state. We also introduce a new structure for the state-action value function that builds a connection between model-free and model-based methods, and improves the performance of the learning algorithm. We experimentally demonstrate these findings in three simulated robotic tasks.
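The shortest-time-to-reach objective can be phrased as goal-conditioned value learning with a cost of -1 per step until the goal is reached. Below is a minimal tabular sketch under that reading (deterministic dynamics and the `transitions` table are assumptions for illustration; it is not the paper's image-embedding architecture).

```python
import numpy as np

def shortest_time_q(transitions, num_states, num_actions, iters=100, gamma=0.99):
    """Tabular sketch of the 'shortest time to reach any state' objective.

    transitions[s, a] = next state (deterministic, for illustration only).
    Q[s, a, g] approximates the negative (discounted) number of steps from s to goal g.
    """
    Q = np.zeros((num_states, num_actions, num_states))
    for _ in range(iters):
        V = Q.max(axis=1)  # (S, G): value of the best action for every goal
        for s in range(num_states):
            for a in range(num_actions):
                s_next = transitions[s, a]
                for g in range(num_states):
                    # Cost of -1 per step; the episode terminates at the goal.
                    terminal = (s_next == g)
                    Q[s, a, g] = -1.0 + (0.0 if terminal else gamma * V[s_next, g])
    return Q
```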
Relative Entropy Regularized Policy Iteration
Abdolmaleki, Abbas, Springenberg, Jost Tobias, Degrave, Jonas, Bohez, Steven, Tassa, Yuval, Belov, Dan, Heess, Nicolas, Riedmiller, Martin
We present an off-policy actor-critic algorithm for Reinforcement Learning (RL) that combines ideas from gradient-free optimization via stochastic search with a learned action-value function. The result is a simple procedure consisting of three steps: i) policy evaluation by estimating a parametric action-value function; ii) policy improvement via the estimation of a local non-parametric policy; and iii) generalization by fitting a parametric policy. Each step can be implemented in different ways, giving rise to several algorithm variants. Our algorithm draws on connections to the existing literature on black-box optimization and 'RL as inference' and can be seen either as an extension of the Maximum a Posteriori Policy Optimisation algorithm (MPO) [Abdolmaleki et al., 2018a] or as an extension of the Trust Region Covariance Matrix Adaptation Evolution Strategy (CMA-ES) [Abdolmaleki et al., 2017b; Hansen et al., 1997] to a policy iteration scheme. Our comparison on 31 continuous control tasks from the parkour suite [Heess et al., 2017], the DeepMind Control Suite [Tassa et al., 2018] and OpenAI Gym [Brockman et al., 2016], with diverse properties, a limited amount of compute and a single set of hyperparameters, demonstrates the effectiveness of our method and yields state-of-the-art results. Videos summarizing the results can be found at goo.gl/HtvJKR .
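The improvement and generalization steps can be summarized in a few lines. The sketch below (numpy, with a hypothetical fitted critic providing `Q_values` and a discrete set of sampled actions for readability) shows the non-parametric improvement step, which reweights sampled actions by exp(Q/temperature), followed by a weighted maximum-likelihood fit of the parametric policy mean.

```python
import numpy as np

def improvement_step(Q_values, temperature=1.0):
    """Step (ii): local non-parametric policy as softmax-reweighted action samples.

    Q_values: (N, K) critic values for K sampled actions at N states.
    Returns per-sample weights proportional to exp(Q / temperature).
    """
    logits = Q_values / temperature
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    weights = np.exp(logits)
    return weights / weights.sum(axis=1, keepdims=True)

def fit_step(actions, weights):
    """Step (iii): weighted maximum likelihood -- here simply the new Gaussian
    policy mean per state, fit to the reweighted action samples.

    actions: (N, K, D) sampled actions, weights: (N, K).
    """
    return (weights[..., None] * actions).sum(axis=1)  # (N, D) new policy means
```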
Leveraging Demonstrations for Deep Reinforcement Learning on Robotics Problems with Sparse Rewards
Vecerik, Mel, Hester, Todd, Scholz, Jonathan, Wang, Fumin, Pietquin, Olivier, Piot, Bilal, Heess, Nicolas, Rothörl, Thomas, Lampe, Thomas, Riedmiller, Martin
We propose a general and model-free approach for Reinforcement Learning (RL) on real robots with sparse rewards. We build upon the Deep Deterministic Policy Gradient (DDPG) algorithm to make use of demonstrations. Both demonstrations and actual interactions are used to fill a replay buffer, and the sampling ratio between demonstrations and transitions is automatically tuned via a prioritized replay mechanism. Typically, carefully engineered shaping rewards are required to enable agents to explore efficiently on high-dimensional control problems such as robotics. They are also required for model-based acceleration methods relying on local solvers such as iLQG (e.g. Guided Policy Search and Normalized Advantage Function). The demonstrations replace the need for carefully engineered rewards and reduce the exploration problem encountered by classical RL approaches in these domains. Demonstrations are collected by a robot kinesthetically force-controlled by a human demonstrator. Results on four simulated insertion tasks show that DDPG from demonstrations outperforms DDPG and does not require engineered rewards. Finally, we demonstrate the method on a real robotic task consisting of inserting a clip (a flexible object) into a rigid object.
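A minimal sketch of the replay mechanism described above (the class and priority constants are illustrative assumptions, not the paper's implementation): demonstrations are kept permanently in the buffer, and the effective demo-to-agent sampling ratio follows the transitions' priorities.

```python
import numpy as np

class MixedReplay:
    """Sketch of a prioritized buffer mixing demonstrations and agent data."""

    def __init__(self, demos, capacity=100_000, eps_demo=1.0, eps_agent=0.001):
        self.data = list(demos)
        self.prio = [1.0 + eps_demo] * len(demos)  # bonus keeps demos visible
        self.n_demo = len(demos)                   # demos are never evicted
        self.capacity = capacity
        self.eps_agent = eps_agent

    def add(self, transition, td_error=1.0):
        if len(self.data) - self.n_demo >= self.capacity:
            # Evict only the oldest agent transition, never a demonstration.
            del self.data[self.n_demo], self.prio[self.n_demo]
        self.data.append(transition)
        self.prio.append(abs(td_error) + self.eps_agent)

    def sample(self, batch_size):
        p = np.asarray(self.prio)
        p /= p.sum()
        idx = np.random.choice(len(self.data), size=batch_size, p=p)
        return [self.data[i] for i in idx], idx
```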
Maximum a Posteriori Policy Optimisation
Abdolmaleki, Abbas, Springenberg, Jost Tobias, Tassa, Yuval, Munos, Remi, Heess, Nicolas, Riedmiller, Martin
We introduce a new algorithm for reinforcement learning called Maximum a Posteriori Policy Optimisation (MPO), based on coordinate ascent on a relative entropy objective. We show that several existing methods can be directly related to our derivation. We develop two off-policy algorithms and demonstrate that they are competitive with the state of the art in deep reinforcement learning. In particular, for continuous control, our method outperforms existing methods with respect to sample efficiency, premature convergence and robustness to hyperparameter settings, while achieving similar or better final performance.
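In outline (a sketch of the coordinate-ascent view in standard MPO notation, reconstructed here rather than quoted from the paper): the E-step constructs an improved non-parametric policy by reweighting the current policy with the exponentiated action-value, and the M-step fits the parametric policy to it under a KL trust region.

```latex
% E-step: non-parametric improved policy (temperature \eta from the relative entropy bound)
q(a \mid s) \;\propto\; \pi_{\theta_{\text{old}}}(a \mid s)\,
  \exp\!\Big(\tfrac{Q_{\theta_{\text{old}}}(s,a)}{\eta}\Big)

% M-step: weighted maximum likelihood fit of the parametric policy
\theta_{\text{new}} \;=\; \arg\max_{\theta}\;
  \mathbb{E}_{s}\Big[\mathbb{E}_{a \sim q(\cdot \mid s)}\big[\log \pi_{\theta}(a \mid s)\big]\Big]
\quad \text{s.t.} \quad
  \mathbb{E}_{s}\big[\mathrm{KL}\big(\pi_{\theta_{\text{old}}}(\cdot \mid s)\,\|\,\pi_{\theta}(\cdot \mid s)\big)\big] \le \epsilon
```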
Graph networks as learnable physics engines for inference and control
Sanchez-Gonzalez, Alvaro, Heess, Nicolas, Springenberg, Jost Tobias, Merel, Josh, Riedmiller, Martin, Hadsell, Raia, Battaglia, Peter
Understanding and interacting with everyday physical scenes requires rich knowledge about the structure of the world, represented either implicitly in a value or policy function, or explicitly in a transition model. Here we introduce a new class of learnable models--based on graph networks--which implement an inductive bias for object- and relation-centric representations of complex, dynamical systems. Our results show that as a forward model, our approach supports accurate predictions from real and simulated data, and surprisingly strong and efficient generalization, across eight distinct physical systems which we varied parametrically and structurally. We also found that our inference model can perform system identification. Our models are also differentiable, and support online planning via gradient-based trajectory optimization, as well as offline policy optimization. Our framework offers new opportunities for harnessing and exploiting rich knowledge about the world, and takes a key step toward building machines with more human-like representations of the world.
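A minimal message-passing sketch of the forward-model idea (numpy, with stand-in weight matrices instead of the paper's learned MLPs; not the full architecture): bodies are nodes, joints are edges, and one round of edge and node updates predicts the next per-body state.

```python
import numpy as np

def gn_forward(nodes, edges, senders, receivers, params):
    """One graph-network block used as a forward model (sketch).

    nodes:     (N, Dn) per-body state (e.g. position, velocity)
    edges:     (E, De) per-joint features
    senders/receivers: (E,) integer indices of the connected nodes
    params:    dict of weight matrices W_e, W_n (stand-ins for learned MLPs)
    """
    # Edge update: combine edge features with both endpoint states.
    edge_in = np.concatenate([edges, nodes[senders], nodes[receivers]], axis=1)
    edge_msg = np.tanh(edge_in @ params["W_e"])                 # (E, Dm)

    # Aggregate incoming messages for each node.
    agg = np.zeros((nodes.shape[0], edge_msg.shape[1]))
    np.add.at(agg, receivers, edge_msg)

    # Node update: predict the next per-body state from state + messages.
    node_in = np.concatenate([nodes, agg], axis=1)
    return node_in @ params["W_n"]                              # (N, Dn) next state
```

Because every operation is differentiable, gradients can flow through such a model for the trajectory optimization and policy optimization uses mentioned in the abstract.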
Learning by Playing - Solving Sparse Reward Tasks from Scratch
Riedmiller, Martin, Hafner, Roland, Lampe, Thomas, Neunert, Michael, Degrave, Jonas, Van de Wiele, Tom, Mnih, Volodymyr, Heess, Nicolas, Springenberg, Jost Tobias
We propose Scheduled Auxiliary Control (SAC-X), a new learning paradigm in the context of Reinforcement Learning (RL). SAC-X enables learning of complex behaviors - from scratch - in the presence of multiple sparse reward signals. To this end, the agent is equipped with a set of general auxiliary tasks that it attempts to learn simultaneously via off-policy RL. The key idea behind our method is that active (learned) scheduling and execution of auxiliary policies allows the agent to explore its environment efficiently - enabling it to excel at sparse-reward RL. Our experiments in several challenging robotic manipulation settings demonstrate the power of our approach.
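A rough sketch of the scheduling idea (the paper learns the scheduler; here a simple epsilon-greedy bandit stands in, and all names are hypothetical): the scheduler picks which auxiliary intention to execute for the next slice of an episode, favouring intentions whose execution has recently produced main-task reward.

```python
import random

class SimpleScheduler:
    """Bandit-style stand-in for SAC-X's learned scheduler (sketch only)."""

    def __init__(self, num_intentions, eps=0.2):
        self.returns = [0.0] * num_intentions  # accumulated main-task return
        self.counts = [1] * num_intentions
        self.eps = eps

    def choose(self):
        # Explore occasionally, otherwise pick the most promising intention.
        if random.random() < self.eps:
            return random.randrange(len(self.returns))
        avg = [r / c for r, c in zip(self.returns, self.counts)]
        return max(range(len(avg)), key=avg.__getitem__)

    def update(self, intention, main_task_return):
        self.returns[intention] += main_task_return
        self.counts[intention] += 1
```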
Embed to Control: A Locally Linear Latent Dynamics Model for Control from Raw Images
Watter, Manuel, Springenberg, Jost Tobias, Boedecker, Joschka, Riedmiller, Martin
We introduce Embed to Control (E2C), a method for model learning and control of non-linear dynamical systems from raw pixel images. E2C consists of a deep generative model, belonging to the family of variational autoencoders, that learns to generate image trajectories from a latent space in which the dynamics is constrained to be locally linear. Our model is derived directly from an optimal control formulation in latent space, supports long-term prediction of image sequences and exhibits strong performance on a variety of complex control problems.
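The locally linear constraint has a compact form. The sketch below (numpy, with a hypothetical `predict_matrices` function standing in for the network that conditions on the latent code) shows the transition E2C enforces in latent space, z_next ≈ A(z) z + B(z) u + o(z).

```python
import numpy as np

def locally_linear_step(z, u, predict_matrices):
    """Sketch of E2C's latent transition: locally linear around the current code.

    z: (Dz,) latent state, u: (Du,) control input.
    predict_matrices(z) -> (A, B, o) with shapes (Dz, Dz), (Dz, Du), (Dz,);
    in E2C these are produced by a network conditioned on z (hypothetical here).
    """
    A, B, o = predict_matrices(z)
    return A @ z + B @ u + o
```

Keeping the dynamics locally linear in the latent space is what allows standard optimal control solvers to be applied for planning, as described in the abstract.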