$Q$-learning with Logarithmic Regret
