Towards Understanding the Optimization Mechanisms in Deep Learning