A Survey of Optimization Methods for Training DL Models: Theoretical Perspective on Convergence and Generalization