Review for NeurIPS paper: An Improved Analysis of (Variance-Reduced) Policy Gradient and Natural Policy Gradient Methods
Neural Information Processing Systems
Additional Feedback: Detailed comments are given below, section by section.

Abstract, lines 10-13: I found this quite vaguely worded, and could not tell which contributions in the main paper it refers to.

Section 1: "Policy gradients are usually estimated via Monte-Carlo rollouts". To me, this suggests that critics are not used, or are used at most as baselines. However, as far as I am aware, most of the successful methods cited in the previous paragraph do use bootstrapping, and so are not pure Monte Carlo in the sense in which the term is used in RL.
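To make the Monte Carlo versus bootstrapping distinction concrete, here is a minimal sketch. The function names, the reward sequence, the discount factor, and the critic's value estimate are all illustrative assumptions and are not taken from the paper under review.

```python
def monte_carlo_return(rewards, gamma):
    """Pure Monte Carlo target: discounted sum of all rewards in the rollout."""
    g = 0.0
    # Accumulate backwards so each step applies one factor of gamma.
    for r in reversed(rewards):
        g = r + gamma * g
    return g


def one_step_bootstrapped_target(reward, next_value, gamma):
    """Bootstrapped (TD(0)-style) target: one observed reward plus a
    critic's estimate of the value of the next state."""
    return reward + gamma * next_value


# Hypothetical rollout and critic estimate, chosen only for illustration.
rewards = [1.0, 0.0, 2.0]
gamma = 0.5
critic_next_value = 0.8  # assumed learned estimate of V(s_1)

mc_target = monte_carlo_return(rewards, gamma)
td_target = one_step_bootstrapped_target(rewards[0], critic_next_value, gamma)
```

The first target depends only on sampled rewards, which is what "Monte Carlo" usually means in RL. The second replaces the tail of the rollout with a learned value estimate, so a method built on it is not pure Monte Carlo even though it still samples trajectories.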
Jan-24-2025, 13:09:12 GMT