Reward-Free Model-Based Reinforcement Learning with Linear Function Approximation

Neural Information Processing Systems 

In the exploration phase, the agent interacts with the environment and collects samples without the reward.