Review for NeurIPS paper: A new convergent variant of Q-learning with linear function approximation
Neural Information Processing Systems
Weaknesses: - While the theoretical results seem correct, the advantages of this approach over previous work, in particular gradient Q-learning (GQ), are not clear to me. Line 110 states that the assumptions are less stringent, but I am not convinced that this is the case. Could the authors clarify this point? If I am interpreting it correctly, the approach assumes a fixed replay buffer of data on which updates are performed, as in the offline batch RL setting. The paper does not specify which policy is used to collect this data, and I would expect certain assumptions on this behavior policy.
Feb-7-2025, 07:50:35 GMT