Neural Information Processing Systems 

Reviewer 1:
Q: Highlight the new challenges of analyzing TDC compared with the other GTD algorithms in [30,34].
A: Compared with conventional optimization under a stagewise stepsize, here we need to handle the bias induced by non-i.i.d. samples.
Q: How would the theoretical guarantees be affected in the non-asymptotic analysis of actor-critic and gradient Q-learning?
Q: How would the worst-case errors predicted by the bound compare with the errors observed empirically in experiments? It is a good idea to plot the theoretical error bounds and compare them with the empirical errors.
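
For readers less familiar with the setting in the first response to Reviewer 1, the following is a minimal sketch of a standard two time-scale TDC update with linear function approximation, together with one possible stagewise stepsize rule. It is illustrative only: the function names, the argument names, and the geometric decay between stages are assumptions made for this sketch and are not taken from the paper. In practice the transitions fed to `tdc_step` would come from a single Markovian trajectory, which is the source of the non-i.i.d. bias mentioned in the response.

```python
def dot(u, v):
    """Inner product of two equal-length lists of floats."""
    return sum(a * b for a, b in zip(u, v))


def tdc_step(theta, w, phi, phi_next, reward, rho, gamma, alpha, beta):
    """One TDC update on a single transition (hypothetical helper).

    theta: main (slow) parameter, w: auxiliary (fast) parameter,
    phi / phi_next: feature vectors of the current and next state,
    rho: importance-sampling ratio, alpha / beta: the two stepsizes.
    """
    # Temporal-difference error under the current value estimate.
    delta = reward + gamma * dot(theta, phi_next) - dot(theta, phi)
    # Correction term phi^T w used by both updates.
    corr = dot(phi, w)
    new_theta = [
        t + alpha * rho * (delta * p - gamma * corr * pn)
        for t, p, pn in zip(theta, phi, phi_next)
    ]
    new_w = [wi + beta * (rho * delta - corr) * p for wi, p in zip(w, phi)]
    return new_theta, new_w


def stagewise_stepsize(t, initial, stage_length, decay=0.5):
    """Stepsize held constant within a stage and shrunk between stages.

    The geometric decay factor is an assumption made for illustration.
    """
    return initial * decay ** (t // stage_length)
```

With this rule the stepsize stays fixed for `stage_length` iterations and is then multiplied by `decay`, so each stage behaves like a constant-stepsize run started from the previous stage's output.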