Why Learning of Large-Scale Neural Networks Behaves Like Convex Optimization

Open in new window