Boosting Soft Q-Learning by Bounding