Understanding the Generalization of Stochastic Gradient Adam in Learning Neural Networks