Imitating Deep Learning Dynamics via Locally Elastic Stochastic Differential Equations

Neural Information Processing Systems 

Understanding the training dynamics of deep learning models is perhaps a necessary step toward demystifying the effectiveness of these models. In particular, how do training data from different classes gradually become separable in their feature spaces when training neural networks using stochastic gradient descent?