Reviews: Explicit Planning for Efficient Exploration in Reinforcement Learning

Neural Information Processing Systems 

This paper introduces the interesting idea of demand matrices to more efficiently do pure exploration. Demand matrices simply specific the minimum number of times needed to visit every state-action pair. This is then treated as an additional part of the state in an augmented MDP, which can then be solved to derive the optimal exploration strategy to achieve the specified initial demand. While the idea is interesting and solid, there are downsides to the idea itself and some of the analysis in this paper that could be improved upon. There are no theoretical guarantees that using this algorithm with a learned model at the same time will work.