Review for NeurIPS paper: An Improved Analysis of (Variance-Reduced) Policy Gradient and Natural Policy Gradient Methods

Jan-24-2025, 13:09:12 GMT–Neural Information Processing Systems

Additional Feedback: Detailed comments are given below, section by section.

Lines 10-13 of abstract: I found this passage quite vaguely worded and could not tell which contributions in the main paper it refers to.

Section 1, "Policy gradients are usually estimated via Monte-Carlo rollouts": To me, this suggests that critics are not used, or are used at most as baselines. However, as far as I am aware, most of the successful methods cited in the previous paragraph do use bootstrapping, and so are not pure Monte Carlo in the sense in which the term is used in RL.
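To make the Monte Carlo versus bootstrapping distinction concrete, here is a minimal sketch. It is not taken from the paper under review; the function names, the reward sequence, and the critic values are all hypothetical and chosen only for illustration. The first function builds its target from the full rollout alone, while the second replaces the tail of the rollout with a critic's estimate of the next state's value.

```python
def monte_carlo_returns(rewards, gamma):
    """Discounted returns G_t computed from the complete rollout (no critic)."""
    returns = []
    g = 0.0
    # Accumulate backwards: G_t = r_t + gamma * G_{t+1}
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    returns.reverse()
    return returns


def one_step_bootstrapped_targets(rewards, next_values, gamma):
    """One-step targets r_t + gamma * V(s_{t+1}), using a critic's estimates."""
    return [r + gamma * v for r, v in zip(rewards, next_values)]


# Hypothetical three-step episode; the last next-state value is 0 (terminal).
rewards = [1.0, 0.0, 2.0]
next_values = [0.5, 1.0, 0.0]  # assumed critic estimates, for illustration only
gamma = 0.5

mc = monte_carlo_returns(rewards, gamma)
td = one_step_bootstrapped_targets(rewards, next_values, gamma)
```

The two target lists generally differ whenever the critic's estimates do not equal the true remaining return, which is why methods that rely on such estimates are not usually described as pure Monte Carlo.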


