Goto

Collaborating Authors

 hessf


Riemannian Hamiltonian methods for min-max optimization on manifolds

arXiv.org Artificial Intelligence

In this paper, we study min-max optimization problems on Riemannian manifolds. We introduce a Riemannian Hamiltonian function, minimization of which serves as a proxy for solving the original min-max problems. Under the Riemannian Polyak--{\L}ojasiewicz condition on the Hamiltonian function, its minimizer corresponds to the desired min-max saddle point. We also provide cases where this condition is satisfied. For geodesic-bilinear optimization in particular, solving the proxy problem leads to the correct search direction towards global optimality, which becomes challenging with the min-max formulation. To minimize the Hamiltonian function, we propose Riemannian Hamiltonian methods (RHM) and present their convergence analyses. We extend RHM to include consensus regularization and to the stochastic setting. We illustrate the efficacy of the proposed RHM in applications such as subspace robust Wasserstein distance, robust training of neural networks, and generative adversarial networks.


Riemannian optimization with a preconditioning scheme on the generalized Stiefel manifold

arXiv.org Artificial Intelligence

Optimization problems on the generalized Stiefel manifold (and products of it) are prevalent across science and engineering. For example, in computational science they arise in symmetric (generalized) eigenvalue problems, in nonlinear eigenvalue problems, and in electronic structures computations, to name a few problems. In statistics and machine learning, they arise, for example, in various dimensionality reduction techniques such as canonical correlation analysis. In deep learning, regularization and improved stability can be obtained by constraining some layers to have parameter matrices that belong to the Stiefel manifold. Solving problems on the generalized Stiefel manifold can be approached via the tools of Riemannian optimization. However, using the standard geometric components for the generalized Stiefel manifold has two possible shortcomings: computing some of the geometric components can be too expensive and convergence can be rather slow in certain cases. Both shortcomings can be addressed using a technique called Riemannian preconditioning, which amounts to using geometric components derived by a precoditioner that defines a Riemannian metric on the constraint manifold. In this paper we develop the geometric components required to perform Riemannian optimization on the generalized Stiefel manifold equipped with a non-standard metric, and illustrate theoretically and numerically the use of those components and the effect of Riemannian preconditioning for solving optimization problems on the generalized Stiefel manifold.


Riemannian accelerated gradient methods via extrapolation

arXiv.org Artificial Intelligence

Optimization on a Riemannian manifold naturally appears in various fields of applications, including principal component analysis [22, 61], matrix completion and factorization [35, 56, 13], dictionary learning [17, 27], optimal transport [49, 40, 26], to name a few. Riemannian optimization [2, 12] provides a universal and efficient framework for problem (1) that respects the intrinsic geometry of the constraint set. In addition, many non-convex problems turns out to be geodesic convex (a generalized notion of convexity) on the manifold, which yields better convergence guarantees for Riemannian optimization methods. One of the most fundamental solvers is the Riemannian gradient descent method [55, 62, 2, 12], which generalizes the classical gradient descent method in the Euclidean space with intrinsic updates on manifolds. There also exist various advanced algorithms for Riemannian optimization that include stochastic and variance reduced methods [11, 61, 34, 24, 25], adaptive gradient methods [8, 33] quasi-Newton methods [30, 43], trust region methods [1], and cubic regularized Newton methods [3], among others. Nevertheless, it remains unclear whether there exists a simple strategy to accelerate firstorder algorithms on Riemannian manifolds. Existing research on accelerated gradient methods focus primarily on generalizing Nesterov acceleration [42] to Riemannian manifolds, including [37, 4, 63, 6, 31, 36]. However, most of the algorithms are theoretic constructs and are usually less favourable in practice.


Towards Sharp Stochastic Zeroth Order Hessian Estimators over Riemannian Manifolds

arXiv.org Machine Learning

We study Hessian estimators for real-valued functions defined over an $n$-dimensional complete Riemannian manifold. We introduce new stochastic zeroth-order Hessian estimators using $O (1)$ function evaluations. We show that, for a smooth real-valued function $f$ with Lipschitz Hessian (with respect to the Rimannian metric), our estimator achieves a bias bound of order $ O \left( L_2 \delta + \gamma \delta^2 \right) $, where $ L_2 $ is the Lipschitz constant for the Hessian, $ \gamma $ depends on both the Levi-Civita connection and function $f$, and $\delta$ is the finite difference step size. To the best of our knowledge, our results provide the first bias bound for Hessian estimators that explicitly depends on the geometry of the underlying Riemannian manifold. Perhaps more importantly, our bias bound does not increase with dimension $n$. This improves best previously known bias bound for $O(1)$-evaluation Hessian estimators, which increases quadratically with $n$. We also study downstream computations based on our Hessian estimators. The supremacy of our method is evidenced by empirical evaluations.


Convergence rates for the stochastic gradient descent method for non-convex objective functions

arXiv.org Machine Learning

We prove the local convergence to minima and estimates on the rate of convergence for the stochastic gradient descent method in the case of not necessarily globally convex nor contracting objective functions. In particular, the results are applicable to simple objective functions arising in machine learning.


Solving SDPs for synchronization and MaxCut problems via the Grothendieck inequality

arXiv.org Machine Learning

A number of statistical estimation problems can be addressed by semidefinite programs (SDP). While SDPs are solvable in polynomial time using interior point methods, in practice generic SDP solvers do not scale well to high-dimensional problems. In order to cope with this problem, Burer and Monteiro proposed a non-convex rank-constrained formulation, which has good performance in practice but is still poorly understood theoretically. In this paper we study the rank-constrained version of SDPs arising in MaxCut and in synchronization problems. We establish a Grothendieck-type inequality that proves that all the local maxima and dangerous saddle points are within a small multiplicative gap from the global maximum. We use this structural information to prove that SDPs can be solved within a known accuracy, by applying the Riemannian trust-region method to this non-convex problem, while constraining the rank to be of order one. For the MaxCut problem, our inequality implies that any local maximizer of the rank-constrained SDP provides a $ (1 - 1/(k-1)) \times 0.878$ approximation of the MaxCut, when the rank is fixed to $k$. We then apply our results to data matrices generated according to the Gaussian ${\mathbb Z}_2$ synchronization problem, and the two-groups stochastic block model with large bounded degree. We prove that the error achieved by local maximizers undergoes a phase transition at the same threshold as for information-theoretically optimal methods.


Riemannian Submanifold Tracking on Low-Rank Algebraic Variety

AAAI Conferences

Matrix recovery aims to learn a low-rank structure from high dimensional data, which arises in numerous learning applications. As a popular heuristic to matrix recovery, convex relaxation involves iterative calling of singular value decomposition (SVD). Riemannian optimization based method can alleviate such expensive cost in SVD for improved scalability, which however is usually degraded by the unknown rank. This paper proposes a novel algorithm RIST that exploits the algebraic variety of low-rank manifold for matrix recovery. Particularly, RIST utilizes an efficient scheme that automatically estimate the potential rank on the real algebraic variety and tracks the favorable Riemannian submanifold. Moreover, RIST utilizes the second-order geometric characterization and achieves provable superlinear convergence, which is superior to the linear convergence of most existing methods. Extensive comparison experiments demonstrate the accuracy and ef- ficiency of RIST algorithm.