Depth and nonlinearity induce implicit exploration for RL
Dauparas, Justas; Tomioka, Ryota; Hofmann, Katja
arXiv.org Artificial Intelligence
Reinforcement learning (RL) is a systematic approach to learning in sequential decision problems, where a learner's future task performance depends on its past actions. In such settings, learners have to explore, meaning they have to take actions with uncertain outcomes in order to learn about the consequences of those actions. How best to explore is a key open question in RL. Here, we address this question from an empirical perspective and investigate how to explore in a way that leads to sample-efficient learning in deep RL, i.e., reinforcement learning with value function approximators that are parameterized as powerful neural networks. We present a surprising finding: in this setting, good approximate value functions can be learned without any explicit exploration. In fact, we find that on several standard benchmark tasks, learning without explicit exploration is as sample efficient as, or more sample efficient than, the most commonly used ε-greedy exploration scheme. We present additional results that suggest a likely role of model structure (network depth and nonlinearity) in inducing such implicit exploration. We believe that our insights have strong practical implications and open up a novel line of research towards understanding exploration in deep RL.
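For readers unfamiliar with the baseline named in the abstract, the following is a minimal sketch of textbook ε-greedy action selection. It is not the authors' code. The function name `epsilon_greedy_action` and the plain-list representation of action values are assumptions made for illustration. Setting `epsilon` to 0 gives the purely greedy case, which is one way to read "learning without any explicit exploration".

```python
import random


def epsilon_greedy_action(q_values, epsilon, rng=random):
    """Return an action index given estimated action values.

    q_values: list of floats, one estimate per action (hypothetical input;
              in deep RL these would come from a value network).
    epsilon:  probability of taking a uniformly random action.
    rng:      source of randomness exposing random() and randrange().
    """
    # With probability epsilon, explore: pick any action uniformly at random.
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    # Otherwise exploit: pick the action with the highest estimated value.
    return max(range(len(q_values)), key=lambda a: q_values[a])


# Illustrative usage with made-up value estimates.
q = [0.1, 0.9, 0.3]
greedy_choice = epsilon_greedy_action(q, epsilon=0.0)   # always the argmax
noisy_choice = epsilon_greedy_action(q, epsilon=0.1)    # occasionally random
```

With `epsilon=0.0` the random branch is never taken, so any exploration has to arise from how the value estimates themselves change during training, which is the effect the abstract attributes to network depth and nonlinearity.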
May-29-2018
- Country:
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
- Genre:
- Research Report (0.40)
- Technology: