Provably Efficient Model-free RL in Leader-Follower MDP with Linear Function Approximation
