Review for NeurIPS paper: Implicit Distributional Reinforcement Learning
Neural Information Processing Systems
Weaknesses: Some decisions in the paper are not well motivated, and despite the extensive set of ablations, the importance of some choices remains unclear. The paper really proposes two separate methodological improvements: the implicit distributional value function and the semi-implicit policy. These two components might have been better proposed separately, so that each could be studied in more detail. One paper could propose the implicit parameterization of the distributional value function and compare its results to C51 and QR-DQN. Another could pair a standard expected-value critic with the semi-implicit policy and evaluate in detail how this policy parameterization compares with Gaussian, mixture-of-Gaussians, and normalizing flow policies. Further complicating matters, the final method includes many bells and whistles (twin delayed critics, a learned temperature, etc.).
Jan-24-2025, 08:03:47 GMT