Reviews: Loaded DiCE: Trading off Bias and Variance in Any-Order Score Function Gradient Estimators for Reinforcement Learning

Neural Information Processing Systems 

This paper presents a novel methodology that, in combination with automatic differentiation, yields unbiased, low-variance estimators of derivatives of any order. It appears potentially widely useful, and the exposition is clear. The reviewers and I seem to be in general agreement in liking the paper. Reviewer 1 wrote a thorough review touching on many aspects of the paper. The overall score was 7, and his bottom-line positives were: "This paper is well executed: it is well written, technically sound and potentially impactful."