Review for NeurIPS paper: Self-Imitation Learning via Generalized Lower Bound Q-learning

Jan-27-2025, 03:52:26 GMT–Neural Information Processing Systems

Weaknesses: The performance improvement is incremental and needs further evaluation. For example, each experiment should be run over 5 random seeds rather than 3 for a more reliable comparison. Moreover, the proposed method shows a clear improvement in only 3 of the 8 environments in Figure 2. More baseline methods, such as SAC, should also be considered. Finally, how does generalized SIL compare to SIL on the Montezuma's Revenge task?
