Playing hard exploration games by watching YouTube

Yusuf Aytar, Tobias Pfaff, David Budden, Thomas Paine, Ziyu Wang, Nando de Freitas

Neural Information Processing Systems 

Deep reinforcement learning methods traditionally struggle with tasks where environment rewards are particularly sparse. One successful method of guiding exploration in these domains is to imitate trajectories provided by a human demonstrator. However, these demonstrations are typically collected under artificial conditions, i.e. with access to the agent's exact environment setup and the demonstrator's action and reward trajectories. Here we propose a method that overcomes these limitations in two stages. First, we learn to map unaligned videos from multiple sources to a common representation using self-supervised objectives constructed over both time and modality (i.e. vision and sound). Second, we embed a single YouTube video in this representation to construct a reward function that encourages an agent to imitate human gameplay.
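As a rough illustration of the second stage only, the sketch below shows one way a single demonstration video could be turned into a sparse imitation reward: embed regularly spaced "checkpoint" frames of the demonstration and pay a small bonus whenever the agent's embedded observation comes close enough to the next checkpoint, in order. The embedding function `embed`, the checkpoint stride, the similarity threshold, and the bonus value are all stand-in assumptions here, not details stated in this abstract; this is a minimal sketch of the idea, not the paper's implementation.

```python
import numpy as np


def normalize(x):
    """Zero-center and l2-normalize an embedding vector."""
    x = x - x.mean()
    return x / (np.linalg.norm(x) + 1e-8)


def checkpoint_embeddings(demo_frames, embed, stride=16):
    """Embed every `stride`-th frame of a single demonstration video,
    yielding an ordered list of checkpoint embeddings."""
    return [normalize(embed(f)) for f in demo_frames[::stride]]


def imitation_reward(agent_frame, checkpoints, next_idx, embed,
                     threshold=0.5, bonus=0.5):
    """Pay `bonus` when the agent's embedded observation is close enough
    (cosine similarity above `threshold`) to the next unvisited checkpoint,
    enforcing that checkpoints are reached in order.

    Returns (reward, updated next_idx)."""
    if next_idx >= len(checkpoints):
        return 0.0, next_idx
    z = normalize(embed(agent_frame))
    if float(np.dot(z, checkpoints[next_idx])) > threshold:
        return bonus, next_idx + 1
    return 0.0, next_idx


# Toy usage with a placeholder embedding; a real system would plug in the
# representation learned in stage one (over both vision and sound).
rng = np.random.default_rng(0)
embed = lambda frame: frame.reshape(-1)          # hypothetical stand-in
demo = [rng.standard_normal((8, 16)) for _ in range(64)]
ckpts = checkpoint_embeddings(demo, embed)

next_idx, total = 0, 0.0
for obs in demo:                                 # agent retracing the demo
    r, next_idx = imitation_reward(obs, ckpts, next_idx, embed)
    total += r
```

Because the checkpoints must be hit in order, the single demonstration acts as a sequence of intermediate milestones, which is what lets a sparse-reward agent make steady progress by imitating the human video.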