 rda method


Modified Regularized Dual Averaging Method for Training Sparse Convolutional Neural Networks

arXiv.org Machine Learning

We propose a modified regularized dual averaging method for training sparse deep convolutional neural networks. The regularized dual averaging method has been proven effective at obtaining sparse solutions in convex optimization problems, but has not previously been applied to deep learning. We analyze the modified method under convexity assumptions and prove its convergence. The modified method can obtain sparser solutions than traditional sparse optimization methods such as proximal-SGD, while keeping almost the same accuracy as the stochastic gradient method with momentum on certain datasets.


Online Classification Using a Voted RDA Method

AAAI Conferences

We propose a voted dual averaging method for online classification problems with explicit regularization. This method employs the update rule of the regularized dual averaging (RDA) method proposed by Xiao, but only on the subsequence of training examples where a classification error is made. We derive a bound on the number of mistakes made by this method on the training set, as well as its generalization error rate. We also introduce the concept of relative strength of regularization, and show how it affects the mistake bound and generalization performance. We examine the method using l1-regularization on a large-scale natural language processing task, and obtain state-of-the-art classification performance with fairly sparse models.


Online Classification Using a Voted RDA Method

arXiv.org Machine Learning

We propose a voted dual averaging method for online classification problems with explicit regularization. This method employs the update rule of the regularized dual averaging (RDA) method, but only on the subsequence of training examples where a classification error is made. We derive a bound on the number of mistakes made by this method on the training set, as well as its generalization error rate. We also introduce the concept of relative strength of regularization, and show how it affects the mistake bound and generalization performance. We experimented with the method using $\ell_1$ regularization on a large-scale natural language processing task, and obtained state-of-the-art classification performance with fairly sparse models.
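
The mistake-driven idea in this abstract can be illustrated with a minimal Python sketch. It shows only the "update on errors" part; the voting step is not described in the abstract and is omitted here. The function name `mistake_driven_rda`, the hinge-style subgradient `-y * x`, the squared-norm auxiliary term with weight `gamma * sqrt(k)`, and the default parameters are all assumptions made for illustration, not details taken from the paper.

```python
import math


def mistake_driven_rda(examples, dim, lam=0.05, gamma=1.0):
    """Run an l1-RDA style update only on misclassified examples.

    examples: iterable of (x, y) with x a list of floats and y in {-1, +1}.
    Returns the final weight vector and the number of mistakes.
    """
    w = [0.0] * dim
    grad_sum = [0.0] * dim   # sum of subgradients over mistake rounds only
    mistakes = 0
    for x, y in examples:
        score = sum(wi * xi for wi, xi in zip(w, x))
        if y * score > 0:
            continue         # correct prediction: leave the model untouched
        mistakes += 1
        # Assumed subgradient of a hinge-type loss at a mistake: -y * x.
        grad_sum = [s - y * xi for s, xi in zip(grad_sum, x)]
        scale = math.sqrt(mistakes) / gamma
        new_w = []
        for s in grad_sum:
            avg = s / mistakes
            if abs(avg) <= lam:
                new_w.append(0.0)   # coordinate truncated to exactly zero
            else:
                new_w.append(-scale * (avg - lam * math.copysign(1.0, avg)))
        w = new_w
    return w, mistakes


if __name__ == "__main__":
    data = [([1.0, 0.0], 1), ([1.0, 0.0], 1), ([-1.0, 0.0], -1)]
    print(mistake_driven_rda(data, dim=2))
```

Because the averaging counter advances only on mistakes, correctly classified examples cost a single dot product and never change the sparsity pattern.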


Dual Averaging Method for Regularized Stochastic Learning and Online Optimization

Neural Information Processing Systems

We consider regularized stochastic learning and online optimization problems, where the objective function is the sum of two convex terms: one is the loss function of the learning task, and the other is a simple regularization term such as the L1-norm for sparsity. We develop a new online algorithm, the regularized dual averaging method, that can explicitly exploit the regularization structure in an online setting. In particular, at each iteration, the learning variables are adjusted by solving a simple optimization problem that involves the running average of all past subgradients of the loss functions and the whole regularization term, not just its subgradient. This method achieves the optimal convergence rate and often enjoys a low per-iteration complexity similar to that of the standard stochastic gradient method. Computational experiments are presented for the special case of sparse online learning using L1-regularization.
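
For the L1 special case, the per-iteration problem described above has a closed-form solution when the auxiliary term is a squared Euclidean norm weighted by `gamma * sqrt(t)`: coordinates whose averaged subgradient has magnitude at most `lam` are set exactly to zero, and the rest are shrunk and rescaled. The sketch below is a minimal illustration under that assumption; the names `rda_l1_step`, `rda_l1`, `grad_fn`, `lam`, and `gamma` are hypothetical and chosen only for this example.

```python
import math


def rda_l1_step(avg_grad, t, lam, gamma):
    """Minimize <avg_grad, w> + lam*||w||_1 + (gamma/sqrt(t)) * 0.5*||w||^2.

    The minimizer is separable, so each coordinate is handled on its own.
    """
    scale = math.sqrt(t) / gamma
    w = []
    for g in avg_grad:
        if abs(g) <= lam:
            w.append(0.0)   # averaged subgradient too small: exact zero
        else:
            w.append(-scale * (g - lam * math.copysign(1.0, g)))
    return w


def rda_l1(grad_fn, dim, steps, lam=0.1, gamma=1.0):
    """Run the dual averaging loop; grad_fn(w, t) returns a loss subgradient."""
    w = [0.0] * dim
    avg_grad = [0.0] * dim
    for t in range(1, steps + 1):
        g = grad_fn(w, t)
        # Running average of all subgradients seen so far.
        avg_grad = [a + (gi - a) / t for a, gi in zip(avg_grad, g)]
        w = rda_l1_step(avg_grad, t, lam, gamma)
    return w


if __name__ == "__main__":
    # Toy loss 0.5*||w - target||^2 (an assumption for the demo only).
    target = [1.0, 0.01]
    grad = lambda w, t: [wi - ti for wi, ti in zip(w, target)]
    print(rda_l1(grad, dim=2, steps=200))
```

Note that the threshold `lam` is applied to the running average rather than to a single subgradient scaled by a shrinking step size, which is why small coordinates such as the second one in the toy example stay at exactly zero.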