Globally Optimal Gradient Descent for a ConvNet with Gaussian Inputs

Brutzkus, Alon, Globerson, Amir

arXiv.org Machine Learning 

Deep neural networks have achieved state-of-the-art performance on many machine learning tasks in areas such as natural language processing (Wu et al., 2016), computer vision (Krizhevsky et al., 2012) and speech recognition (Hinton et al., 2012). Training of such networks is often successfully performed by minimizing a high-dimensional non-convex objective function, using simple first-order methods such as stochastic gradient descent. Nonetheless, the success of deep learning from an optimization perspective is poorly understood theoretically. Current results are mostly pessimistic, suggesting that even training a 3-node neural network is NPhard (Blum & Rivest, 1993), and that the objective function of a single neuron can admit exponentially many local minima (Auer et al., 1996; Safran & Shamir, 2016). There have been recent attempts to bridge this gap between theory and practice.

Duplicate Docs Excel Report

Title
None found

Similar Docs  Excel Report  more

TitleSimilaritySource
None found