On the Computational Efficiency of Training Neural Networks