Can SGD Learn Recurrent Neural Networks with Provable Generalization?