Exploration in Structured Reinforcement Learning

Jungseul Ok, Alexandre Proutiere, Damianos Tranos

Neural Information Processing Systems 

Hence, with large state and action spaces, it is essential to identify and exploit any structure present in the system dynamics and reward function so as to minimize exploration phases and, in turn, reduce regret to reasonable values. Modern RL algorithms implicitly impose some structural properties either in the model parameters (transition probabilities and reward function, see e.g.