Review for NeurIPS paper: Self-Imitation Learning via Generalized Lower Bound Q-learning

Neural Information Processing Systems 

The author response satisfactorily addressed the reviewers' concerns regarding the contraction/bias tradeoff, the disconnect between the experimental results and the theory, and the variance of the estimator. This led one reviewer to increase their score for this paper, which already had reasonably solid scores.