On the Convergence Rate of Training Recurrent Neural Networks