Distributed Hessian-Free Optimization for Deep Neural Network

He, Xi (Lehigh University) | Mudigere, Dheevatsa (Intel Labs, India) | Smelyanskiy, Mikhail (Intel Labs, SC) | Takac, Martin (Lehigh University)

AAAI Conferences 

Training a deep neural network is a high-dimensional and highly non-convex optimization problem. In this paper, we revisit the Hessian-free optimization method for deep networks, augmented with negative curvature direction detection. We also develop its distributed variant and demonstrate superior scaling potential relative to SGD, which allows larger computing resources to be used more efficiently, enabling larger models and faster time to a desired solution. We show that these techniques accelerate the training process on both the standard MNIST dataset and the TIMIT speech recognition problem, demonstrating robust performance with batch sizes up to an order of magnitude larger. This increased scaling potential is illustrated with near-linear speed-up on up to 32 CPU nodes for a simple 4-layer network.
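At its core, Hessian-free optimization is a truncated Newton method: each update approximately solves the Newton system with conjugate gradient (CG), accessing curvature only through Hessian-vector products so that the Hessian is never formed. The following sketch illustrates that idea in plain NumPy and is not the paper's implementation: the helper names (hessian_vector_product, cg_solve), the finite-difference Hessian-vector product, the toy loss, and the fixed step size are all hypothetical, and the check p^T H p <= 0 inside CG is only an analogue of the negative curvature direction detection described in the abstract.

# Minimal sketch of one Hessian-free (truncated Newton) step; illustrative only,
# not the authors' method. Curvature enters solely through Hessian-vector
# products, here approximated by finite differences of the gradient.
import numpy as np

def hessian_vector_product(grad_fn, w, v, eps=1e-5):
    # Approximate H(w) @ v without forming H, via a central difference of the gradient.
    return (grad_fn(w + eps * v) - grad_fn(w - eps * v)) / (2.0 * eps)

def cg_solve(grad_fn, w, b, max_iters=50, tol=1e-8):
    # Solve H x = b with conjugate gradient, using only Hessian-vector products.
    # Returns the CG iterate and, if encountered, a direction of negative curvature.
    x = np.zeros_like(b)
    r = b.copy()            # residual b - H x (x = 0 initially)
    p = r.copy()
    rs = r @ r
    for _ in range(max_iters):
        Hp = hessian_vector_product(grad_fn, w, p)
        pHp = p @ Hp
        if pHp <= 0:        # non-positive curvature detected along p
            return x, p
        alpha = rs / pHp
        x += alpha * p
        r -= alpha * Hp
        rs_new = r @ r
        if rs_new < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x, None

# Toy usage on a small indefinite quadratic (hypothetical example).
def grad(w):
    # Gradient of 0.5*w0^2 - 0.25*w1^2 + 0.1*w0*w1, whose Hessian is indefinite.
    return np.array([w[0] + 0.1 * w[1], -0.5 * w[1] + 0.1 * w[0]])

w = np.array([1.0, 1.0])
step, neg_dir = cg_solve(grad, w, -grad(w))
direction = neg_dir if neg_dir is not None else step
w = w + 0.1 * direction    # fixed step here; a line search would be used in practice
print("updated w:", w, "negative curvature found:", neg_dir is not None)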
