
Convergent Block Coordinate Descent for Training Tikhonov Regularized Deep Neural Networks

Neural Information Processing Systems

By lifting the ReLU function into a higher-dimensional space, we develop a smooth multi-convex formulation for training feed-forward deep neural networks (DNNs). This allows us to develop a block coordinate descent (BCD) training algorithm consisting of a sequence of numerically well-behaved convex optimizations. Using ideas from proximal point methods in convex analysis, we prove that this BCD algorithm converges globally to a stationary point with an R-linear convergence rate of order one. In experiments with the MNIST database, DNNs trained with this BCD algorithm consistently yielded better test-set error rates than identical DNN architectures trained via all the stochastic gradient descent (SGD) variants in the Caffe toolbox.
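The lifting idea can be illustrated on a toy problem. The sketch below is not the paper's formulation or algorithm. It assumes a scalar network y ≈ w2 · ReLU(w1 · x), replaces each ReLU output with an auxiliary variable u_i ≥ 0 tied to w1 · x_i by a quadratic penalty (using the fact that ReLU(x) is the minimizer of (u − x)² over u ≥ 0), and cycles through three proximal block updates, each a convex problem with a closed-form solution. The names `gamma` (penalty weight), `rho` (proximal weight), `objective`, and `bcd` are assumptions made for this example.

```python
def relu(x):
    # ReLU(x) is also the minimizer of (u - x)^2 over u >= 0.
    return max(0.0, x)


def objective(w1, w2, u, xs, ys, gamma):
    # Lifted objective: data fit on u plus a penalty tying u to w1 * x.
    fit = sum((y - w2 * ui) ** 2 for y, ui in zip(ys, u))
    link = sum((ui - w1 * x) ** 2 for ui, x in zip(u, xs))
    return 0.5 * fit + 0.5 * gamma * link


def bcd(xs, ys, gamma=1.0, rho=0.1, iters=50):
    # Hypothetical toy solver: proximal block coordinate descent
    # over the blocks u, w1, w2 (in that order).
    w1, w2 = 1.0, 1.0
    u = [relu(w1 * x) for x in xs]
    history = [objective(w1, w2, u, xs, ys, gamma)]
    for _ in range(iters):
        # Block u: each u_i solves a 1-D convex quadratic subject to
        # u_i >= 0, so clipping the unconstrained minimizer is exact.
        u = [
            max(0.0, (w2 * y + gamma * w1 * x + rho * up)
                / (w2 * w2 + gamma + rho))
            for x, y, up in zip(xs, ys, u)
        ]
        # Block w1: proximal least squares on the penalty term.
        num = gamma * sum(ui * x for ui, x in zip(u, xs)) + rho * w1
        den = gamma * sum(x * x for x in xs) + rho
        w1 = num / den
        # Block w2: proximal least squares on the data-fit term.
        num = sum(y * ui for y, ui in zip(ys, u)) + rho * w2
        den = sum(ui * ui for ui in u) + rho
        w2 = num / den
        history.append(objective(w1, w2, u, xs, ys, gamma))
    return w1, w2, u, history


# Synthetic data generated by a scalar ReLU network (an assumption
# for this example, unrelated to the MNIST experiments).
xs = [-2.0, -1.0, 0.5, 1.0, 2.0]
ys = [3.0 * relu(0.5 * x) for x in xs]
w1, w2, u, history = bcd(xs, ys)
```

Because every block update minimizes the objective plus a proximal term anchored at the previous iterate, the values in `history` never increase. This monotone decrease is the basic property that proximal-point arguments build on; the toy does not demonstrate the convergence rate claimed in the abstract.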


Reviews: Convergent Block Coordinate Descent for Training Tikhonov Regularized Deep Neural Networks

Neural Information Processing Systems

This paper proposes a simple and efficient block coordinate descent (BCD) algorithm with a novel Tikhonov regularization for training both dense and sparse DNNs with ReLU activations. The authors show that the proposed BCD algorithm converges globally to a stationary point with an R-linear convergence rate of order one and performs better than all the SGD variants in their experiments. However, the motivation for using Tikhonov regularization and block coordinate descent is not very clear. The technical parts are hard to follow because many details are missing. The presented results are far from state-of-the-art. In this sense, I am not sure whether the proposed method can be applied to real "DNNs".

