e7c573c14a09b84f6b7782ce3965f335-AuthorFeedback.pdf
Neural Information Processing Systems
Reviewer 1: Q: Compare and highlight the new challenges of analyzing TDC relative to the other GTD algorithms in [30,34]. A: Compared to conventional optimization with a stagewise stepsize, here we need to handle the bias induced by non-i.i.d. Q: How can the theoretical guarantees be affected in the non-asymptotic analysis of actor-critic and gradient Q-learning? Q: How would the worst-case errors predicted by the bound compare to the errors observed empirically in experiments? It is a good idea to plot theoretical errors and compare with empirical bounds.
Jun-1-2025, 21:54:03 GMT