Review for NeurIPS paper: A new convergent variant of Q-learning with linear function approximation

Feb-7-2025, 07:50:35 GMT–Neural Information Processing Systems

Weaknesses:
- While the theoretical results seem correct, the advantages of this approach over previous work, in particular gradient Q-learning (GQ), are not clear to me. Line 110 states that the assumptions are less stringent, but I am not convinced that this is the case. Could the authors clarify this point? If I interpret the method correctly, it assumes a fixed replay buffer of data on which the updates are performed, as in the offline (batch) RL setting. The paper does not specify which policy is used to collect this data, and I would expect some assumptions on this behavior policy.
