Quantile QT-Opt for Risk-Aware Vision-Based Robotic Grasping
Bodnar, Cristian, Li, Adrian, Hausman, Karol, Pastor, Peter, Kalakrishnan, Mrinal
Abstract: The distributional perspective on reinforcement learning (RL) has given rise to a series of successful Q-learning algorithms, resulting in state-of-the-art performance in arcade game environments. However, it has not yet been analyzed how these findings from a discrete setting translate to complex practical applications characterized by noisy, high-dimensional and continuous state-action spaces. In this work, we propose Quantile QT-Opt (Q2-Opt), a distributional variant of the recently introduced distributed Q-learning algorithm [11] for continuous domains, and examine its behaviour in a series of simulated and real vision-based robotic grasping tasks. The absence of an actor in Q2-Opt allows us to directly draw a parallel to the previous discrete experiments in the literature without the additional complexities induced by an actor-critic architecture. We demonstrate that Q2-Opt achieves a superior vision-based object grasping success rate, while also being more sample efficient. The distributional formulation also allows us to experiment with various risk-distortion metrics that give us an indication of how robots can concretely manage risk in practice using a Deep RL control policy. As an additional contribution, we perform experiments on offline datasets and compare them with the latest findings from discrete settings. Surprisingly, we find that there is a discrepancy between our results and the previous batch RL findings from the literature obtained on arcade game environments.

I. INTRODUCTION

The new distributional perspective on RL has produced a novel class of Deep Q-learning methods that learn a distribution over the state-action returns, instead of using the expectation given by the traditional value function.
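The contrast between learning a distribution over returns and using only its expectation can be illustrated with a small sketch. It is not the Q2-Opt implementation: the quantile values, the action names, and the choice of CVaR (conditional value at risk) as the risk-distortion measure are all assumptions made for illustration. Given equally weighted quantile estimates of the return for each candidate action, a risk-neutral policy ranks actions by the mean, while a risk-averse policy ranks them by the mean of the worst fraction of quantiles.

```python
from statistics import fmean


def expected_value(quantiles):
    """Risk-neutral value: mean of equally weighted quantile estimates."""
    return fmean(quantiles)


def cvar(quantiles, alpha=0.5):
    """Risk-averse value: mean of the lowest alpha-fraction of quantiles."""
    ordered = sorted(quantiles)
    k = max(1, int(round(alpha * len(ordered))))  # keep at least one quantile
    return fmean(ordered[:k])


def best_action(candidates, value_fn):
    """Return the name of the action whose quantiles score highest."""
    return max(candidates, key=lambda name: value_fn(candidates[name]))


# Hypothetical return quantiles for two candidate grasp actions.
candidates = {
    "safe_grasp": [0.45, 0.50, 0.55, 0.60],   # narrow distribution
    "risky_grasp": [0.00, 0.30, 0.90, 1.00],  # higher mean, heavy lower tail
}

risk_neutral_choice = best_action(candidates, expected_value)
risk_averse_choice = best_action(candidates, cvar)
print(risk_neutral_choice, risk_averse_choice)
```

With these made-up numbers the two criteria disagree: the action with the higher mean return also has the worse lower tail, so a tail-focused measure prefers the other action. A scalar value function cannot express this distinction, since it retains only the mean.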
Oct-1-2019
- Country:
- North America > United States (0.28)
- Genre:
- Research Report > New Finding (0.66)
- Industry:
- Leisure & Entertainment > Games > Computer Games (0.95)