Review for NeurIPS paper: Self-Imitation Learning via Generalized Lower Bound Q-learning

Neural Information Processing Systems 

Weaknesses: The performance improvement is incremental and needs further evaluation. For example, each experiment should be run over 5 random seeds instead of 3 for a more accurate comparison. In addition, the proposed method shows a clear improvement in only 3 of the 8 environments in Figure 2. More baseline methods, such as SAC, should also be considered. Finally, how does generalized SIL compare to SIL on the Montezuma's Revenge task?