

Coordinate-wise Power Method

Neural Information Processing Systems

In this paper, we propose a coordinate-wise version of the power method from an optimization viewpoint. The vanilla power method simultaneously updates all coordinates of the iterate, which is essential for its convergence analysis; however, different coordinates converge to their optimal values at different speeds. Our proposed algorithm, which we call the coordinate-wise power method, selects and updates the k most important coordinates in O(kn) time per iteration, where n is the dimension of the matrix and k ≤ n is the size of the active set. Inspired by the "greedy" nature of our method, we further propose a greedy coordinate descent algorithm applied to a non-convex objective function specialized for symmetric matrices. We provide convergence analyses for both methods. Experimental results on both synthetic and real data show that our methods achieve up to a 20x speedup over the basic power method. Moreover, due to their coordinate-wise nature, our methods are well suited to the important case when the data cannot fit into memory. Finally, we describe how the coordinate-wise mechanism can be applied to other iterative methods used in machine learning.
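The abstract does not specify how the "most important" coordinates are chosen, so the sketch below uses an assumed selection rule (the k coordinates whose values would change the most under a full power-method step); the paper's actual criterion and its O(kn) bookkeeping are not reproduced here.

```python
import numpy as np

def coordinate_wise_power_method(A, k, iters=300, seed=0):
    """Hypothetical sketch: update only k coordinates per iteration,
    instead of all n as in the vanilla power method."""
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    x = rng.standard_normal(n)
    x /= np.linalg.norm(x)
    for _ in range(iters):
        y = A @ x                    # naive O(n^2) here; the paper maintains
        y /= np.linalg.norm(y)       # this product incrementally in O(kn)
        # assumed importance rule: k coordinates with the largest change
        idx = np.argsort(np.abs(y - x))[-k:]
        x[idx] = y[idx]
        x /= np.linalg.norm(x)
    return x

# usage: leading eigenvector of a small symmetric matrix
A = np.diag([3.0, 2.0, 1.0])
v = coordinate_wise_power_method(A, k=1)   # v is approximately +/-[1, 0, 0]
```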





Supplementary for: Momentum Centering and Asynchronous Update for Adaptive Gradient Methods

Neural Information Processing Systems

There exists an online convex optimization problem on which Adam (and RMSprop) has non-zero average regret; one such problem has the form

    f_t(x) = Px  if t mod P = 1,  f_t(x) = -x  otherwise,    x ∈ [-1, 1], P ∈ ℕ, P ≥ 3.    (1)

Proof. See [1], Thm. 1 for the proof. For the problem defined above, there is a threshold on β2 above which RMSprop converges. For the problem defined by Eq. (1), the ACProp algorithm converges for all β1, β2 ∈ (0, 1), P ∈ ℕ, P ≥ 3. Proof. We analyze the limit behavior of the ACProp algorithm.
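As a quick sanity check on why Eq. (1) is adversarial for Adam-type methods, the snippet below (an illustration, not from the paper) lists the subgradients of f_t over one period: the large gradient P appears only once every P rounds, yet the average gradient is positive, so the total loss is minimized at x = -1, the point the rare large gradient pushes the iterate away from.

```python
def grad(t, P):
    # Subgradient of f_t from Eq. (1): f_t(x) = P*x if t mod P == 1, else -x.
    return float(P) if t % P == 1 else -1.0

P = 3
period = [grad(t, P) for t in range(1, P + 1)]  # [3.0, -1.0, -1.0]
avg = sum(period) / P                           # 1/3 > 0, so x* = -1
```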




Neural Information Processing Systems

Deep neural networks have achieved impressive performance in many areas. Designing a fast and provable method for training neural networks is a fundamental question in machine learning. The classical training method incurs Ω(mnd) cost for both the forward and the backward computation, where m is the width of the neural network and we are given n training points in d-dimensional space.
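To make the Ω(mnd) count concrete, here is a minimal one-hidden-layer forward pass (shapes chosen for illustration): multiplying the n × d data matrix by a d × m weight matrix alone takes n·d·m scalar multiplications.

```python
import numpy as np

n, d, m = 128, 16, 64          # training points, input dim, network width
X = np.random.randn(n, d)      # data: n points in d-dimensional space
W = np.random.randn(d, m)      # first-layer weights of a width-m network
H = np.maximum(X @ W, 0.0)     # ReLU forward pass: n*d*m multiplications
print(H.shape, n * d * m)      # (128, 64) 131072
```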