Why Learning of Large-Scale Neural Networks Behaves Like Convex Optimization