Convergence and stability of Q-learning in Hierarchical Reinforcement Learning