Reviews: Two Time-scale Off-Policy TD Learning: Non-asymptotic Analysis over Markovian Samples

Neural Information Processing Systems 

All reviewers recommended acceptance. After consideration by the Senior AC and the Program Chairs, the final recommendation was Accept (Poster).