Self-Imitation Learning via Generalized Lower Bound Q-learning