This paper provides an empirical evaluation of recently developed exploration algorithms within the Arcade Learning Environment (ALE). We study the use of different reward bonuses that incentives exploration in reinforcement learning. We do so by fixing the learning algorithm used and focusing only on the impact of the different exploration bonuses in the agent's performance. We use Rainbow, the state-of-the-art algorithm for value-based agents, and focus on some of the bonuses proposed in the last few years. We consider the impact these algorithms have on performance within the popular game Montezuma's Revenge which has gathered a lot of interest from the exploration community, across the the set of seven games identified by Bellemare et al. (2016) as challenging for exploration, and easier games where exploration is not an issue. We find that, in our setting, recently developed bonuses do not provide significantly improved performance on Montezuma's Revenge or hard exploration games. We also find that existing bonus-based methods may negatively impact performance on games in which exploration is not an issue and may even perform worse than $\epsilon$-greedy exploration.
Space exploration game Astroneer has racked up a major following in its Early Access stage. After hinting at it, the game finally has an official release date. The title from System Era Softworks will be available for Xbox One and Windows 10 starting on February 6th, 2019. It will run you $29.99 at launch. If you aren't one of the two million players to take a crack at the early stages version of Astroneer, here's what to expect: The interplanetary survival sandbox game is a sort of mix of Minecraft and fellow space exploration game No Man's Sky.
Robotic exploration is an on-line problem in which autonomous mobile robots incrementally discover and map the physical structure of initially unknown environments. Usually, the performance of exploration strategies used to decide where to go next is not compared against the optimal performance obtainable in the test environments, because the latter is generally unknown. In this paper, we present a method to calculate an approximation of the optimal (shortest) exploration path in an arbitrary environment. We consider a mobile robot with limited visibility, discretize a two-dimensional environment with a regular grid, and formulate a search problem for finding the optimal exploration path in the grid, which is solved using A*. Experimental results show the viability of our approach for realistically large environments and its potential for better assessing the performance of on-line exploration strategies.
Efficient exploration is an unsolved problem in Reinforcement Learning. We introduce Model-Based Active eXploration (MAX), an algorithm that actively explores the environment. It minimizes data required to comprehensively model the environment by planning to observe novel events, instead of merely reacting to novelty encountered by chance. Non-stationarity induced by traditional exploration bonus techniques is avoided by constructing fresh exploration policies only at time of action. In semi-random toy environments where directed exploration is critical to make progress, our algorithm is at least an order of magnitude more efficient than strong baselines.
For many important application domains, such as industrial inspection or search and rescue, this task is further challenged from the fact that such operations often have to take place in GPS-denied environments and possibly visually-degraded conditions. In this work, we move away from deterministic approaches on autonomous exploration and we propose a localization uncertainty-aware autonomous receding horizon exploration and mapping planner verified using aerial robots. This planner follows a two-step optimization paradigm. At first, in an online computed random tree the algorithm finds a finite-horizon branch that optimizes the amount of space expected to be explored. The first viewpoint configuration of this branch is selected, but the path towards it is decided through a second planning step.