Q-learning with Uniformly Bounded Variance: Large Discounting is Not a Barrier to Fast Learning
