Globally Optimal Gradient Descent for a ConvNet with Gaussian Inputs

Feb-25-2017–arXiv.org Machine Learning

Deep neural networks have achieved state-of-the-art performance on many machine learning tasks in areas such as natural language processing (Wu et al., 2016), computer vision (Krizhevsky et al., 2012) and speech recognition (Hinton et al., 2012). Training of such networks is often successfully performed by minimizing a high-dimensional non-convex objective function, using simple first-order methods such as stochastic gradient descent. Nonetheless, the success of deep learning from an optimization perspective is poorly understood theoretically. Current results are mostly pessimistic, suggesting that even training a 3-node neural network is NPhard (Blum & Rivest, 1993), and that the objective function of a single neuron can admit exponentially many local minima (Auer et al., 1996; Safran & Shamir, 2016). There have been recent attempts to bridge this gap between theory and practice.

artificial intelligence, gradient descent, machine learning, (16 more...)

arXiv.org Machine Learning

Feb-25-2017

arXiv.org PDF

Add feedback

Genre:
- Research Report > New Finding (0.46)

Technology:
- Information Technology > Artificial Intelligence
  - Representation & Reasoning > Optimization (1.00)
  - Machine Learning
    - Statistical Learning > Gradient Descent (1.00)
    - Neural Networks > Deep Learning (0.87)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found