Explicit Explore-Exploit Algorithms in Continuous State Spaces

Neural Information Processing Systems 

What is a good algorithm for systematically exploring an environment for the purpose of reinforcement learning?