major questions raised by the reviewers. 1. Learning rates. To address the reviewers' comments on learning rates, we will add results with easy-to-implement

Neural Information Processing Systems 

We thank the reviewers for very helpful comments.

To address the reviewers' comments on learning rates, we will add results with

More specifically, this requires two changes: (1) the epoch length needs to keep increasing (i.e. at the end of every

Proof of Theorem 5. We sketch the proof for the piecewise choice (1), which follows easily from our Theorem 1.

We will clarify this in the revision.

Given that |S||A| is often enormous in practice, our theory potentially leads to a notable improvement.

": See the response above on "learning rates".

Q-update and (2) choosing δ to be sufficiently small. We will add this in the revision.

We will clarify this in the revision to avoid confusion.

": See the response above on "learning rates".