$\sqrt{n}$-Regret for Learning in Markov Decision Processes with Function Approximation and Low Bellman Rank

Open in new window