Quantile QT-Opt for Risk-Aware Vision-Based Robotic Grasping

Bodnar, Cristian, Li, Adrian, Hausman, Karol, Pastor, Peter, Kalakrishnan, Mrinal

Oct-1-2019–arXiv.org Machine Learning

Abstract: The distributional perspective on reinforcement learning (RL) has given rise to a series of successful Q-learning algorithms, resulting in state-of-the-art performance in arcade game environments. However, it has not yet been analyzed how these findings from a discrete setting translate to complex practical applications characterized by noisy, high dimensional and continuous state-action spaces. In this work, we propose Quantile QT-Opt (Q2-Opt), a distributional variant of the recently introduced distributed Q-learning algorithm [11] for continuous domains, and examine its behaviour in a series of simulated and real vision-based robotic grasping tasks. The absence of an actor in Q2-Opt allows us to directly draw a parallel to the previous discrete experiments in the literature without the additional complexities induced by an actor-critic architecture. We demonstrate that Q2-Opt achieves a superior vision-based object grasping success rate, while also being more sample efficient. The distributional formulation also allows us to experiment with various risk-distortion metrics that give us an indication of how robots can concretely manage risk in practice using a Deep RL control policy. As an additional contribution, we perform experiments on offline datasets and compare them with the latest findings from discrete settings. Surprisingly, we find that there is a discrepancy between our results and the previous batch RL findings from the literature obtained on arcade game environments.

I. INTRODUCTION

The new distributional perspective on RL has produced a novel class of Deep Q-learning methods that learn a distribution over the state-action returns, instead of using the expectation given by the traditional value function.
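To make the contrast between a return distribution and its expectation concrete, the following is a minimal sketch rather than the authors' implementation. It assumes the return distribution of each candidate action is summarized by a small set of equally weighted quantile estimates, and it uses conditional value at risk (CVaR) as one example of a risk-distortion measure; the text above does not say which measures the paper evaluates. The candidate names, the numeric values, and the helpers `expected_value`, `cvar`, and `select_action` are all hypothetical.

```python
import math


def expected_value(quantiles):
    """Risk-neutral score: mean of equally weighted quantile estimates."""
    return sum(quantiles) / len(quantiles)


def cvar(quantiles, alpha):
    """Risk-averse score: mean of the lowest ceil(alpha * N) quantile estimates."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    ordered = sorted(quantiles)
    k = max(1, math.ceil(alpha * len(ordered)))
    return sum(ordered[:k]) / k


def select_action(candidates, score):
    """Return the name of the candidate whose quantile estimates score highest."""
    return max(candidates, key=lambda name: score(candidates[name]))


# Hypothetical quantile estimates of the return for two candidate grasps.
# These numbers are illustrative only and do not come from the paper.
candidates = {
    "cautious_grasp": [0.55, 0.60, 0.65, 0.70],
    "risky_grasp": [0.00, 0.50, 1.00, 1.10],
}

# Scoring by the mean ignores the spread of the distribution.
risk_neutral = select_action(candidates, expected_value)

# Scoring by CVaR looks only at the worst outcomes of each action.
risk_averse = select_action(candidates, lambda q: cvar(q, alpha=0.25))

print("risk-neutral choice:", risk_neutral)
print("risk-averse choice:", risk_averse)
```

In this toy setup the two scoring rules disagree: the mean favours the action with the higher average return, while CVaR favours the action whose worst outcomes are better. A policy that only learns the expectation cannot express the second preference, which is the kind of behaviour a distributional value function makes available.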
