Deep Exploration via Bootstrapped DQN

Neural Information Processing Systems

Efficient exploration remains a major challenge for reinforcement learning (RL). Common dithering strategies for exploration, such as epsilon-greedy, do not carry out temporally-extended (or deep) exploration; this can lead to exponentially larger data requirements. However, most algorithms for statistically efficient RL are not computationally tractable in complex environments. Randomized value functions offer a promising approach to efficient exploration with generalization, but existing algorithms are not compatible with nonlinearly parameterized value functions. As a first step towards addressing such contexts, we develop bootstrapped DQN.
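A minimal sketch of the core idea, assuming a PyTorch-style network with illustrative layer sizes and head count: several value heads share a torso, and one randomly chosen head is followed greedily for a whole episode, which gives temporally extended exploration.

```python
# Sketch only: K bootstrap heads over a shared torso; one head is sampled
# per episode and followed greedily. Sizes and head count are illustrative.
import random
import torch
import torch.nn as nn

class BootstrappedQNet(nn.Module):
    def __init__(self, obs_dim: int, n_actions: int, n_heads: int = 10):
        super().__init__()
        self.torso = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU())
        self.heads = nn.ModuleList(nn.Linear(64, n_actions) for _ in range(n_heads))

    def forward(self, obs: torch.Tensor, head: int) -> torch.Tensor:
        return self.heads[head](self.torso(obs))

def episode_policy(net: BootstrappedQNet):
    head = random.randrange(len(net.heads))  # sampled once, at episode start
    def act(obs: torch.Tensor) -> int:
        with torch.no_grad():
            return int(net(obs, head).argmax().item())
    return act
```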


Benchmarking Bonus-Based Exploration Methods on the Arcade Learning Environment

arXiv.org Machine Learning

This paper provides an empirical evaluation of recently developed exploration algorithms within the Arcade Learning Environment (ALE). We study the use of different reward bonuses that incentivize exploration in reinforcement learning. We do so by fixing the learning algorithm used and focusing only on the impact of the different exploration bonuses on the agent's performance. We use Rainbow, the state-of-the-art algorithm for value-based agents, and focus on some of the bonuses proposed in the last few years. We consider the impact these algorithms have on performance within the popular game Montezuma's Revenge, which has gathered a lot of interest from the exploration community, across the set of seven games identified by Bellemare et al. (2016) as challenging for exploration, and on easier games where exploration is not an issue. We find that, in our setting, recently developed bonuses do not provide significantly improved performance on Montezuma's Revenge or hard exploration games. We also find that existing bonus-based methods may negatively impact performance on games in which exploration is not an issue and may even perform worse than $\epsilon$-greedy exploration.
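As a concrete illustration of this setup, here is a minimal sketch of one family of bonuses in this line of work: a count-based bonus added to the extrinsic reward while the learning algorithm itself is held fixed. The observation hashing and the value of `beta` are illustrative assumptions, not the paper's specific configuration.

```python
# Sketch only: augment the extrinsic reward with a count-based bonus
# r + beta / sqrt(N(s)); the learner that consumes the reward is unchanged.
from collections import defaultdict
import math

class CountBonus:
    def __init__(self, beta: float = 0.1):
        self.beta = beta
        self.counts = defaultdict(int)

    def augment(self, obs_hash: int, extrinsic_reward: float) -> float:
        # Rarely seen observations receive a larger bonus.
        self.counts[obs_hash] += 1
        bonus = self.beta / math.sqrt(self.counts[obs_hash])
        return extrinsic_reward + bonus
```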


'Astroneer' brings space exploration to Xbox and PC on February 6th

Engadget

Space exploration game Astroneer has racked up a major following in its Early Access stage. After hinting at it, the game finally has an official release date. The title from System Era Softworks will be available for Xbox One and Windows 10 starting on February 6th, 2019. It will run you $29.99 at launch. If you aren't one of the two million players who have taken a crack at the Early Access version of Astroneer, here's what to expect: the interplanetary survival sandbox game is a sort of mix of Minecraft and fellow space exploration game No Man's Sky.


Searching for Optimal Off-Line Exploration Paths in Grid Environments for a Robot with Limited Visibility

AAAI Conferences

Robotic exploration is an on-line problem in which autonomous mobile robots incrementally discover and map the physical structure of initially unknown environments. Usually, the performance of exploration strategies used to decide where to go next is not compared against the optimal performance obtainable in the test environments, because the latter is generally unknown. In this paper, we present a method to calculate an approximation of the optimal (shortest) exploration path in an arbitrary environment. We consider a mobile robot with limited visibility, discretize a two-dimensional environment with a regular grid, and formulate a search problem for finding the optimal exploration path in the grid, which is solved using A*. Experimental results show the viability of our approach for realistically large environments and its potential for better assessing the performance of on-line exploration strategies.
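A minimal sketch of that kind of search formulation, under simplifying assumptions (4-connected grid, visibility limited to adjacent cells, an illustrative admissible heuristic): a search state pairs the robot's cell with the set of cells seen so far, and A* returns the length of a shortest path that sees every free cell.

```python
# Sketch only: A* over (robot cell, cells seen so far) on a 4-connected grid
# with visibility limited to adjacent cells. Heuristic: a move reveals at
# most 3 new cells, so at least ceil(unseen / 3) further moves are needed.
import heapq
import itertools

FREE = "."

def neighbors(cell, grid):
    r, c = cell
    for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nr, nc = r + dr, c + dc
        if 0 <= nr < len(grid) and 0 <= nc < len(grid[0]) and grid[nr][nc] == FREE:
            yield (nr, nc)

def visible(cell, grid):
    return {cell} | set(neighbors(cell, grid))

def optimal_exploration_length(grid, start):
    free = {(r, c) for r, row in enumerate(grid) for c, ch in enumerate(row) if ch == FREE}
    h = lambda seen: -(-(len(free) - len(seen)) // 3)  # ceiling division
    start_state = (start, frozenset(visible(start, grid)))
    tie = itertools.count()
    frontier = [(h(start_state[1]), next(tie), 0, start_state)]
    best = {start_state: 0}
    while frontier:
        _, _, g, (cell, seen) = heapq.heappop(frontier)
        if len(seen) == len(free):
            return g  # number of moves on an optimal exploration path
        for nxt in neighbors(cell, grid):
            state = (nxt, seen | visible(nxt, grid))
            if g + 1 < best.get(state, float("inf")):
                best[state] = g + 1
                heapq.heappush(frontier, (g + 1 + h(state[1]), next(tie), g + 1, state))
    return None  # some free cell is unreachable

# Example: a small room with one wall segment, starting in the top-left corner.
grid = ["....",
        ".#..",
        "...."]
print(optimal_exploration_length(grid, (0, 0)))
```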


Model-Based Active Exploration

arXiv.org Artificial Intelligence

Efficient exploration is an unsolved problem in Reinforcement Learning. We introduce Model-Based Active eXploration (MAX), an algorithm that actively explores the environment. It minimizes data required to comprehensively model the environment by planning to observe novel events, instead of merely reacting to novelty encountered by chance. Non-stationarity induced by traditional exploration bonus techniques is avoided by constructing fresh exploration policies only at time of action. In semi-random toy environments where directed exploration is critical to make progress, our algorithm is at least an order of magnitude more efficient than strong baselines.
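A minimal sketch of the underlying mechanism, with a variance-based ensemble disagreement score standing in for the paper's novelty utility and placeholder callables standing in for learned dynamics models: the agent plans toward whatever outcome its models disagree about most.

```python
# Sketch only: ensemble disagreement as an exploration utility, and a
# one-step planner built fresh at decision time from that utility.
import numpy as np

def disagreement(models, state: np.ndarray, action: np.ndarray) -> float:
    # Variance across the ensemble's next-state predictions, summed over
    # state dimensions; high variance marks a novel, informative event.
    preds = np.stack([m(state, action) for m in models])
    return float(preds.var(axis=0).sum())

def plan_exploratory_action(models, state: np.ndarray, candidate_actions):
    # Choose the candidate action whose outcome the ensemble is most
    # uncertain about, i.e. the event most worth going to observe.
    return max(candidate_actions, key=lambda a: disagreement(models, state, a))
```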