Review for NeurIPS paper: Self-Imitation Learning via Generalized Lower Bound Q-learning
Neural Information Processing Systems
Weaknesses: The performance improvement is incremental and needs further evaluation. For example, each experiment should be run over 5 random seeds instead of 3 for a more accurate comparison. In addition, the proposed method shows clear improvement in only 3 of the 8 environments in Figure 2. More baseline methods, such as SAC, should also be considered. Finally, how does generalized SIL compare to SIL on the Montezuma's Revenge task?
Jan-27-2025, 03:52:26 GMT