Convergent Block Coordinate Descent for Training Tikhonov Regularized Deep Neural Networks

Nov-20-2017–arXiv.org Machine Learning

By lifting the ReLU function into a higher dimensional space, we develop a smooth multi-convex formulation for training feed-forward deep neural networks (DNNs). This allows us to develop a block coordinate descent (BCD) training algorithm consisting of a sequence of numerically well-behaved convex optimizations. Using ideas from proximal point methods in convex analysis, we prove that this BCD algorithm will converge globally to a stationary point with R-linear convergence rate of order one. In experiments with the MNIST database, DNNs trained with this BCD algorithm consistently yielded better test-set error rates than identical DNN architectures trained via all the stochastic gradient descent (SGD) variants in the Caffe toolbox.

algorithm, artificial intelligence, machine learning, (15 more...)

arXiv.org Machine Learning

Nov-20-2017

arXiv.org PDF

Add feedback

Country:
- North America > United States (0.28)

Genre:
- Research Report (0.64)
- Instructional Material (0.47)

Industry:
- Education (0.46)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning
  - Neural Networks > Deep Learning (0.86)
  - Statistical Learning > Gradient Descent (0.55)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found