Reviews: Loaded DiCE: Trading off Bias and Variance in Any-Order Score Function Gradient Estimators for Reinforcement Learning
Neural Information Processing Systems
This paper presents a novel methodology that, combined with automatic differentiation, yields unbiased, low-variance estimators of derivatives of any order. It appears to be potentially widely useful, and the exposition is clear. The reviewers and I are in general agreement in liking the paper. Reviewer 1 wrote a thorough review touching on many aspects of the paper. The overall score was 7, and his bottom-line positives were: "This paper is well executed: it is well written, technically sound and potentially impactful."
Jan-24-2025, 15:42:45 GMT