[1606.00511] Large Scale Distributed Hessian-Free Optimization for Deep Neural Network • /r/MachineLearning

Jun-3-2016, 07:40:59 GMT–@machinelearnbot

Training a deep neural network is a high-dimensional and highly non-convex optimization problem. The stochastic gradient descent (SGD) algorithm and its variations are the current state-of-the-art solvers for this task. However, due to the non-convex nature of the problem, SGD has been observed to slow down near saddle points. Recent empirical work claims that detecting and escaping saddle points efficiently is likely to improve training performance. With this objective, we revisit the Hessian-free optimization method for deep networks.
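To make the saddle-point claim concrete, here is a minimal sketch on a toy problem. It is not the paper's algorithm: the objective f(x, y) = x^2 - y^2, the step size, the starting point, and all function names are assumptions chosen for illustration. It shows plain gradient descent stalling near the saddle at the origin, while a Hessian-vector product (computed from gradients only, which is the idea behind "Hessian-free" methods) exposes the negative-curvature direction that leads away from it.

```python
# Toy objective with a saddle point at the origin: f(x, y) = x^2 - y^2.
# (Illustrative assumption; not taken from the paper.)
def grad(p):
    x, y = p
    return (2.0 * x, -2.0 * y)


def hessian_vector_product(p, v, eps=1e-5):
    """Approximate H(p) v by a central difference of gradients.

    The Hessian itself is never formed; only gradient evaluations are used.
    """
    plus = grad((p[0] + eps * v[0], p[1] + eps * v[1]))
    minus = grad((p[0] - eps * v[0], p[1] - eps * v[1]))
    return tuple((a - b) / (2.0 * eps) for a, b in zip(plus, minus))


def curvature(p, v):
    """Rayleigh quotient v^T H v / v^T v; a negative value marks an escape direction."""
    hv = hessian_vector_product(p, v)
    return sum(a * b for a, b in zip(v, hv)) / sum(a * a for a in v)


def gradient_descent(p, lr=0.1, steps=20):
    """Plain (full-batch) gradient descent with a fixed step size."""
    for _ in range(steps):
        g = grad(p)
        p = (p[0] - lr * g[0], p[1] - lr * g[1])
    return p


# Start almost exactly on the attracting axis of the saddle.
start = (1.0, 1e-8)
end = gradient_descent(start)

# After 20 steps the iterate is still close to the saddle and the gradient is small.
print("iterate:", end, "gradient:", grad(end))

# Curvature probes: positive along x, negative along y.
print("curvature along x:", curvature(end, (1.0, 0.0)))
print("curvature along y:", curvature(end, (0.0, 1.0)))
```

In this sketch the gradient shrinks as the iterate approaches the origin, so first-order steps make little progress, whereas the curvature probe along y is negative and identifies the direction in which the objective decreases.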

deep learning, hessian-free optimization, machine learning

News Web Page

Technology:
- Information Technology > Artificial Intelligence > Machine Learning
  - Neural Networks > Deep Learning (0.64)
  - Statistical Learning > Gradient Descent (0.62)