Exploration Bonus for Regret Minimization in Discrete and Continuous Average Reward MDPs

Jian Qian, Ronan Fruit, Matteo Pirotta, Alessandro Lazaric

Neural Information Processing Systems 

The exploration bonus is an effective approach to manage the exploration-exploitation trade-off in Markov Decision Processes (MDPs).
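To make the idea concrete, here is a minimal sketch of a generic count-based exploration bonus in a tabular MDP. It assumes a Hoeffding-style bonus of the form c·sqrt(log t / N(s,a)), which is added to the empirical mean reward so that rarely visited state-action pairs look optimistically attractive. The helper `bonus`, the class `OptimisticRewards`, and the constant `c` are hypothetical illustrations. This is not the specific bonus or algorithm analyzed in this paper.

```python
import math
from collections import defaultdict


def bonus(count: int, t: int, c: float = 1.0) -> float:
    """Hypothetical Hoeffding-style bonus: c * sqrt(log t / N(s, a)).

    Shrinks as 1/sqrt(count), so well-explored pairs receive a small bonus.
    max(1, count) gives unvisited pairs the largest bonus instead of dividing by zero;
    max(t, 2) keeps the logarithm strictly positive.
    """
    return c * math.sqrt(math.log(max(t, 2)) / max(1, count))


class OptimisticRewards:
    """Tracks empirical rewards per (state, action) and returns optimistic estimates.

    Assumption: a tabular setting with hashable states and actions.
    """

    def __init__(self, c: float = 1.0):
        self.c = c
        self.counts = defaultdict(int)          # N(s, a): visit counts
        self.reward_sums = defaultdict(float)   # sum of observed rewards per pair
        self.t = 0                              # total number of observed steps

    def update(self, s, a, r: float) -> None:
        """Record one observed transition reward r for the pair (s, a)."""
        self.t += 1
        self.counts[(s, a)] += 1
        self.reward_sums[(s, a)] += r

    def optimistic_reward(self, s, a) -> float:
        """Empirical mean reward plus the exploration bonus."""
        n = self.counts[(s, a)]
        mean = self.reward_sums[(s, a)] / n if n > 0 else 0.0
        return mean + bonus(n, self.t, self.c)
```

For example, after visiting pair `(0, 0)` one hundred times and pair `(0, 1)` only once, both with reward 0.5, the optimistic reward of `(0, 1)` is larger, so a planner that acts greedily with respect to these optimistic rewards is pushed toward the under-explored action. The bonus is added to the reward rather than encoded as a set of plausible models, which keeps the planning step as cheap as standard value iteration.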