Gunasekar, Suriya, Woodworth, Blake, Srebro, Nathan

We present a direct (primal only) derivation of Mirror Descent as a "partial" discretization of gradient flow on a Riemannian manifold where the metric tensor is the Hessian of the Mirror Descent potential function. We argue that this discretization is more faithful to the geometry than Natural Gradient Descent, which is obtained by a "full" forward Euler discretization. This view helps shed light on the relationship between the methods and allows generalizing Mirror Descent to any Riemannian geometry, even when the metric tensor is not a Hessian, and thus there is no "dual".

Stochastic gradient descent is a simple approach to find the local minima of a cost function whose evaluations are corrupted by noise. In this paper, we develop a procedure extending stochastic gradient descent algorithms to the case where the function is defined on a Riemannian manifold. We prove that, as in the Euclidian case, the gradient descent algorithm converges to a critical point of the cost function. The algorithm has numerous potential applications, and is illustrated here by four examples. In particular a novel gossip algorithm on the set of covariance matrices is derived and tested numerically.

We develop Riemannian Stein Variational Gradient Descent (RSVGD), a Bayesian inference method that generalizes Stein Variational Gradient Descent (SVGD) to Riemann manifold. The benefits are two-folds: (i) for inference tasks in Euclidean spaces, RSVGD has the advantage over SVGD of utilizing information geometry, and (ii) for inference tasks on Riemann manifolds, RSVGD brings the unique advantages of SVGD to the Riemannian world. To appropriately transfer to Riemann manifolds, we conceive novel and non-trivial techniques for RSVGD, which are required by the intrinsically different characteristics of general Riemann manifolds from Euclidean spaces. We also discover Riemannian Stein's Identity and Riemannian Kernelized Stein Discrepancy. Experimental results show the advantages over SVGD of exploring distribution geometry and the advantages of particle-efficiency, iteration-effectiveness and approximation flexibility over other inference methods on Riemann manifolds.

Mallasto, Anton, Haije, Tom Dela, Feragen, Aasa

In optimization, the natural gradient method is well-known for likelihood maximization. The method uses the Kullback-Leibler divergence, corresponding infinitesimally to the Fisher-Rao metric, which is pulled back to the parameter space of a family of probability distributions. This way, gradients with respect to the parameters respect the Fisher-Rao geometry of the space of distributions, which might differ vastly from the standard Euclidean geometry of the parameter space, often leading to faster convergence. However, when minimizing an arbitrary similarity measure between distributions, it is generally unclear which metric to use. We provide a general framework that, given a similarity measure, derives a metric for the natural gradient. We then discuss connections between the natural gradient method and multiple other optimization techniques in the literature. Finally, we provide computations of the formal natural gradient to show overlap with well-known cases and to compute natural gradients in novel frameworks.

Kasai, Hiroyuki, Mishra, Bamdev

We propose a novel Riemannian manifold preconditioning approach for the tensor completion problem with rank constraint. A novel Riemannian metric or inner product is proposed that exploits the least-squares structure of the cost function and takes into account the structured symmetry that exists in Tucker decomposition. The specific metric allows to use the versatile framework of Riemannian optimization on quotient manifolds to develop preconditioned nonlinear conjugate gradient and stochastic gradient descent algorithms for batch and online setups, respectively. Concrete matrix representations of various optimization-related ingredients are listed. Numerical comparisons suggest that our proposed algorithms robustly outperform state-of-the-art algorithms across different synthetic and real-world datasets.