PC-PG: Policy Cover Directed Exploration for Provable Policy Gradient Learning

Neural Information Processing Systems

In other words, the assumptions in these works imply that the state space is already well-explored.