Towards Characterizing Divergence in Deep Q-Learning