Review for NeurIPS paper: Non-Crossing Quantile Regression for Distributional Reinforcement Learning
–Neural Information Processing Systems
Weaknesses:
- Baseline algorithm: While all quantile-based distributional RL algorithms suffer from the crossing-quantile issue, QR-DQN is the least affected because its quantile fractions are fixed and uniformly spaced. IQN [1], which uses randomly sampled quantile fractions, and FQF [2], which optimizes the chosen fractions for a better distribution approximation, are both expected to suffer much more from crossing quantiles than QR-DQN. While it may be non-trivial to adapt the NC architecture to IQN because the fractions are randomly sampled, it shouldn't be hard to adapt it to FQF. Besides, IQN and FQF have both achieved much higher scores than QR-DQN, so I believe implementing the NC architecture on IQN and FQF would greatly strengthen the empirical validation. Can the authors explain why only 49 out of 57 games are used for evaluation?
- Number of quantiles: I believe that N 100 quantiles is a reasonable choice.
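To make the baseline concern concrete, the sketch below contrasts the fixed fractions of QR-DQN with randomly sampled fractions in the style of IQN, and counts crossings in a vector of quantile estimates. All function names are hypothetical, and `monotone_from_logits` is only one common way to enforce monotonicity (a cumulative sum of softmax weights). It is an assumption for illustration, not necessarily the NC architecture proposed in the paper.

```python
import math
import random


def qr_dqn_fractions(n):
    """Fixed, uniformly spaced quantile midpoints (2i + 1) / (2n), i = 0..n-1."""
    return [(2 * i + 1) / (2 * n) for i in range(n)]


def iqn_style_fractions(n, rng):
    """n fractions drawn uniformly at random from [0, 1), returned sorted."""
    return sorted(rng.random() for _ in range(n))


def count_crossings(values):
    """Count adjacent pairs where the higher fraction gets the lower value.

    `values` must be ordered by increasing quantile fraction.
    """
    return sum(1 for lo, hi in zip(values, values[1:]) if hi < lo)


def monotone_from_logits(logits, offset=0.0, scale=1.0):
    """Hypothetical non-crossing head: offset + scale * cumsum(softmax(logits)).

    Softmax weights are non-negative, so for scale > 0 the running sum
    cannot decrease and the outputs cannot cross.
    """
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    outputs, running = [], 0.0
    for e in exps:
        running += e / total
        outputs.append(offset + scale * running)
    return outputs


if __name__ == "__main__":
    rng = random.Random(0)
    print(qr_dqn_fractions(4))
    print(iqn_style_fractions(4, rng))
    unconstrained = [rng.gauss(0.0, 1.0) for _ in range(8)]
    print(count_crossings(unconstrained))
    print(count_crossings(monotone_from_logits(unconstrained)))
```

An unconstrained head can return values in any order, so crossings can occur whichever way the fractions are chosen, whereas the cumulative-sum construction rules them out by design. Whether such a head transfers cleanly to sampled or learned fractions is exactly the open question raised above.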
Jan-27-2025, 20:55:50 GMT