Active learning strategies respond to the costly labelling task in a supervised classification by selecting the most useful unlabelled examples in training a predictive model. Many conventional active learning algorithms focus on refining the decision boundary, rather than exploring new regions that can be more informative. In this setting, we propose a sequential algorithm named EG-Active that can improve any Active learning algorithm by an optimal random exploration. Experimental results show a statistically significant and appreciable improvement in the performance of our new approach over the existing active feedback methods.
We show how an ensemble of $Q^*$-functions can be leveraged for more effective exploration in deep reinforcement learning. We build on well established algorithms from the bandit setting, and adapt them to the $Q$-learning setting. We propose an exploration strategy based on upper-confidence bounds (UCB). Our experiments show significant gains on the Atari benchmark.
We introduce an online active exploration algorithm for data-efficiently learning an abstract symbolic model of an environment. Our algorithm is divided into two parts: the first part quickly generates an intermediate Bayesian symbolic model from the data that the agent has collected so far, which the agent can then use along with the second part to guide its future exploration towards regions of the state space that the model is uncertain about. We show that our algorithm outperforms random and greedy exploration policies on two different computer game domains. The first domain is an Asteroids-inspired game with complex dynamics but basic logical structure. The second is the Treasure Game, with simpler dynamics but more complex logical structure.
Many real-valued connectionist approaches to learning control realize exploration by randomness inaction selection. This might be disadvantageous when costs are assigned to "negative experiences" . The basic idea presented in this paper is to make an agent explore unknown regions in a more directed manner. This is achieved by a so-called competence map, which is trained to predict the controller's accuracy, and is used for guiding exploration. Based on this, a bistable system enables smoothly switching attention between two behaviors - exploration and exploitation - depending on expected costsand knowledge gain. The appropriateness of this method is demonstrated by a simple robot navigation task.
This paper provides an empirical evaluation of recently developed exploration algorithms within the Arcade Learning Environment (ALE). We study the use of different reward bonuses that incentives exploration in reinforcement learning. We do so by fixing the learning algorithm used and focusing only on the impact of the different exploration bonuses in the agent's performance. We use Rainbow, the state-of-the-art algorithm for value-based agents, and focus on some of the bonuses proposed in the last few years. We consider the impact these algorithms have on performance within the popular game Montezuma's Revenge which has gathered a lot of interest from the exploration community, across the the set of seven games identified by Bellemare et al. (2016) as challenging for exploration, and easier games where exploration is not an issue. We find that, in our setting, recently developed bonuses do not provide significantly improved performance on Montezuma's Revenge or hard exploration games. We also find that existing bonus-based methods may negatively impact performance on games in which exploration is not an issue and may even perform worse than $\epsilon$-greedy exploration.