gradf
Acceleration via silver stepsize on Riemannian manifolds with applications to Wasserstein space
There is extensive literature on accelerating first-order optimization methods in an Euclidean setting. Under which conditions such acceleration is feasible in Riemannian optimization problems is an active area of research. Motivated by the recent success of silver stepsize methods in the Euclidean setting, we undertake a study of such algorithms in the Riemannian setting. We provide the new class of algorithms determined by the choice of vector transport that allows the silver stepsize acceleration on Riemannian manifolds for the function classes associated with the corresponding vector transport. As a core application, we show that our algorithm recovers the standard Wasserstein gradient descent on the 2-Wasserstein space and, as a result, provides the first provable accelerated gradient method for potential functional optimization problems in the Wasserstein space.
1680e9fa7b4dd5d62ece800239bb53bd-Supplemental.pdf
We analyze here briefly some basic notions of the geometry of the sphere that we use in our algorithm and convergence analysis. We refer the reader to [1, p. 73-76] for a more comprehensive presentation. Tangent Space: The tangent space of the r-dimensional sphere Sr at a point p is an r-dimensional vector space, which generalizes the notion of tangent plane in two dimensions. We denote it by TpSr and a vector v belongs in it, if and only if, it can be written as ฮฑ(0), where ฮฑ: ( ฮต,ฮต) Sr (for some ฮต > 0) is a smooth curve with ฮฑ(0) = p. The tangent space at pcan be given also in an explicit way, as the set of all vectors in Rr+1 orthogonal to p with respect to the usual inner product.
Minimization of Functions on Dually Flat Spaces Using Geodesic Descent Based on Dual Connections
We propose geodesic-based optimization methods on dually flat spaces, where the geometric structure of the parameter manifold is closely related to the form of the objective function. A primary application is maximum likelihood estimation in statistical models, especially exponential families, whose model manifolds are dually flat. We show that an m-geodesic update, which directly optimizes the log-likelihood, can theoretically reach the maximum likelihood estimator in a single step. In contrast, an e-geodesic update has a practical advantage in cases where the parameter space is geodesically complete, allowing optimization without explicitly handling parameter constraints. We establish the theoretical properties of the proposed methods and validate their effectiveness through numerical experiments.
Dual Riemannian Newton Method on Statistical Manifolds
Zhou, Derun, Yano, Keisuke, Sugiyama, Mahito
In probabilistic modeling, parameter estimation is commonly formulated as a minimization problem on a parameter manifold. Optimization in such spaces requires geometry-aware methods that respect the underlying information structure. While the natural gradient leverages the Fisher information metric as a form of Riemannian gradient descent, it remains a first-order method and often exhibits slow convergence near optimal solutions. Existing second-order manifold algorithms typically rely on the Levi-Civita connection, thus overlooking the dual-connection structure that is central to information geometry. We propose the dual Riemannian Newton method, a Newton-type optimization algorithm on manifolds endowed with a metric and a pair of dual affine connections. The dual Riemannian Newton method explicates how duality shapes second-order updates: when the retraction (a local surrogate of the exponential map) is defined by one connection, the associated Newton equation is posed with its dual. We establish local quadratic convergence and validate the theory with experiments on representative statistical models. Thus, the dual Riemannian Newton method thus delivers second-order efficiency while remaining compatible with the dual structures that underlie modern information-geometric learning and inference.
Finite-Time Analysis of Stochastic Nonconvex Nonsmooth Optimization on the Riemannian Manifolds
Sahinoglu, Emre, Sun, Youbang, Shahrampour, Shahin
This work addresses the finite-time analysis of nonsmooth nonconvex stochastic optimization under Riemannian manifold constraints. We adapt the notion of Goldstein stationarity to the Riemannian setting as a performance metric for nonsmooth optimization on manifolds. We then propose a Riemannian Online to NonConvex (RO2NC) algorithm, for which we establish the sample complexity of $O(ฮต^{-3}ฮด^{-1})$ in finding $(ฮด,ฮต)$-stationary points. This result is the first-ever finite-time guarantee for fully nonsmooth, nonconvex optimization on manifolds and matches the optimal complexity in the Euclidean setting. When gradient information is unavailable, we develop a zeroth order version of RO2NC algorithm (ZO-RO2NC), for which we establish the same sample complexity. The numerical results support the theory and demonstrate the practical effectiveness of the algorithms.
Federated Learning on Riemannian Manifolds: A Gradient-Free Projection-Based Approach
Wang, Hongye, Pan, Zhaoye, He, Chang, Li, Jiaxiang, Jiang, Bo
Federated learning (FL) has emerged as a powerful paradigm for collaborative model training across distributed clients while preserving data privacy. However, existing FL algorithms predominantly focus on unconstrained optimization problems with exact gradient information, limiting its applicability in scenarios where only noisy function evaluations are accessible or where model parameters are constrained. To address these challenges, we propose a novel zeroth-order projection-based algorithm on Riemannian manifolds for FL. By leveraging the projection operator, we introduce a computationally efficient zeroth-order Riemannian gradient estimator. Unlike existing estimators, ours requires only a simple Euclidean random perturbation, eliminating the need to sample random vectors in the tangent space, thus reducing computational cost. Theoretically, we first prove the approximation properties of the estimator and then establish the sublinear convergence of the proposed algorithm, matching the rate of its first-order counterpart. Numerically, we first assess the efficiency of our estimator using kernel principal component analysis. Furthermore, we apply the proposed algorithm to two real-world scenarios: zeroth-order attacks on deep neural networks and low-rank neural network training to validate the theoretical findings.