Review for NeurIPS paper: Sample Complexity of Asynchronous Q-Learning: Sharper Analysis and Variance Reduction

Neural Information Processing Systems 

Weaknesses: My main concern about the paper is whether this proposed algorithm is actually implementable due to the specific expression of the (constant) learning rate. I have two concerns: 1. The learning rate depends on t_{mix} in Theorem 1 and on the universal constants c_1 in both Theorem 1 and Theorem 2. How can we compute/approximate t_{mix} in advance? If we cannot, is it sufficient to employ a lower-bound on t_{mix}? Looking at the proofs c_1 is a function of constant c (Equation 55) that in turn derives from Bernstein's inequality (Equation 81) and subsequently \tilde{c} (Equation 84), but its value is never explicitly computed. I am aware that also in [33] the learning rate schedule (that is not constant) depends on \mu_{min} and t_{mix}, but I think the authors should elaborate more on this and explain how to deal with it in practice, if possible.