to

### Zeroth-order Optimization on Riemannian Manifolds

We propose and analyze zeroth-order algorithms for optimization over Riemannian manifolds, where we observe only potentially noisy evaluations of the objective function. Our approach is based on estimating the Riemannian gradient from the objective function evaluations. We consider three settings for the objective function: (i) deterministic and smooth, (ii) stochastic and smooth, and (iii) a composition of smooth and non-smooth parts. For each setting, we characterize the oracle complexity of our algorithm for obtaining appropriately defined notions of $\epsilon$-stationary points. Notably, our complexities are independent of the dimension of the ambient Euclidean space in which the manifold is embedded, and depend only on the intrinsic dimension of the manifold. As a proof of concept, we demonstrate the applicability of our method to black-box attacks on deep neural networks, with experimental results on both simulated and real-world image data.
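The idea of estimating a Riemannian gradient from function values alone can be illustrated with a small sketch. It assumes a Gaussian-smoothing forward-difference estimator, with directions drawn in the tangent space, on the unit sphere. Sampling in the tangent space rather than the ambient space is what ties the estimator to the manifold's intrinsic dimension. The sphere, the helper names (`sphere_proj`, `sphere_retr`, `zo_riemannian_grad`, `zo_rgd`), the smoothing radius `mu`, the batch size `m`, and the step size `eta` are all illustrative assumptions. The paper's exact estimator and constants may differ.

```python
import numpy as np

def sphere_proj(x, v):
    """Project v onto the tangent space of the unit sphere at x."""
    return v - (x @ v) * x

def sphere_retr(x, v):
    """Retraction: take the step in the ambient space, then renormalize."""
    y = x + v
    return y / np.linalg.norm(y)

def zo_riemannian_grad(f, x, rng, mu=1e-4, m=20):
    """Estimate the Riemannian gradient at x from m + 1 evaluations of f (no gradients)."""
    fx = f(x)
    g = np.zeros_like(x)
    for _ in range(m):
        u = sphere_proj(x, rng.standard_normal(x.shape))  # Gaussian direction in T_x
        g += (f(sphere_retr(x, mu * u)) - fx) / mu * u     # forward difference along u
    return g / m

def zo_rgd(f, x0, eta=0.1, iters=500, seed=0):
    """Riemannian gradient descent driven only by zeroth-order gradient estimates."""
    rng = np.random.default_rng(seed)
    x = x0 / np.linalg.norm(x0)
    for _ in range(iters):
        x = sphere_retr(x, -eta * zo_riemannian_grad(f, x, rng))
    return x

if __name__ == "__main__":
    # Rayleigh quotient on the sphere: the minimizer is an eigenvector of the smallest eigenvalue.
    A = np.diag(np.arange(1.0, 6.0))
    x = zo_rgd(lambda z: z @ A @ z, np.ones(5))
    print(np.round(x, 3))  # expected to approach +/- e_1
```

Because the tangent directions have covariance equal to the tangent-space projector, the averaged forward differences approximate the projected (Riemannian) gradient. No derivative of `f` is ever requested.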

### Riemannian SVRG: Fast Stochastic Optimization on Riemannian Manifolds

We study optimization of finite sums of *geodesically* smooth functions on Riemannian manifolds. Although variance reduction techniques for optimizing finite sums have received tremendous attention in recent years, existing work is limited to vector space problems. We introduce *Riemannian SVRG* (RSVRG), a new variance-reduced Riemannian optimization method. Our analysis reveals that RSVRG inherits the advantages of the usual SVRG method, but its convergence is influenced by factors that depend on the curvature of the manifold. To our knowledge, RSVRG is the first *provably fast* stochastic Riemannian method.
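The variance-reduced update can be sketched on the unit sphere for the finite-sum leading-eigenvector problem $f(x) = -\frac{1}{n}\sum_i (a_i^\top x)^2$. The sketch makes several assumptions. The sphere, this objective, the step size `eta`, and the epoch count are illustrative choices. The exponential map is the exact sphere geodesic. To keep the code short, tangent-space projection is used as a vector transport in place of parallel transport. The helper names are hypothetical.

```python
import numpy as np

def proj(x, v):
    """Tangent-space projection at x on the unit sphere (also used as vector transport)."""
    return v - (x @ v) * x

def exp_map(x, v):
    """Exponential map on the unit sphere for a tangent vector v at x."""
    nv = np.linalg.norm(v)
    if nv < 1e-16:
        return x
    y = np.cos(nv) * x + np.sin(nv) * (v / nv)
    return y / np.linalg.norm(y)  # guard against floating-point drift

def egrad_i(a, x):
    """Euclidean gradient of the component f_i(x) = -(a^T x)^2."""
    return -2.0 * (a @ x) * a

def rsvrg_pca(A, eta=0.05, epochs=20, seed=0):
    rng = np.random.default_rng(seed)
    n, d = A.shape
    x_tilde = rng.standard_normal(d)
    x_tilde /= np.linalg.norm(x_tilde)
    for _ in range(epochs):
        # Full Riemannian gradient at the snapshot point x_tilde.
        full = proj(x_tilde, -2.0 * A.T @ (A @ x_tilde) / n)
        x = x_tilde.copy()
        for _ in range(n):
            i = rng.integers(n)
            g_x = proj(x, egrad_i(A[i], x))
            # Snapshot correction lives in T_{x_tilde}; transport it to T_x before combining.
            g_snap = proj(x_tilde, egrad_i(A[i], x_tilde)) - full
            v = g_x - proj(x, g_snap)
            x = exp_map(x, -eta * v)
        x_tilde = x  # use the last inner iterate as the next snapshot
    return x_tilde
```

At a stationary point the correction term cancels the sampled gradient exactly, so the fixed step size does not leave a noise floor. This is the property that variance reduction buys over plain Riemannian SGD.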

### McTorch, a manifold optimization library for deep learning

In this paper, we introduce McTorch, a manifold optimization library for deep learning that extends PyTorch. It aims to lower the barrier for users wishing to use manifold constraints in deep learning applications, i.e., when the parameters are constrained to lie on a manifold. Such constraints include the popular orthogonality and rank constraints, and have recently been used in a number of deep learning applications. McTorch follows PyTorch's architecture and decouples manifold definitions from optimizers, i.e., once a new manifold is added it can be used with any existing optimizer, and vice versa. McTorch is available at https://github.com/mctorch.
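The decoupling of manifolds from optimizers can be illustrated with a small NumPy sketch. Here each manifold exposes only a tangent projection and a retraction, and the optimizer is written purely against those two methods. This is **not** McTorch's API. The classes and methods (`Sphere`, `Stiefel`, `proj`, `retr`, `RiemannianSGD`, `minimize`) are hypothetical, and McTorch itself extends PyTorch rather than NumPy.

```python
import numpy as np

class Sphere:
    """Hypothetical manifold: unit vectors in R^n."""
    def proj(self, x, g):
        return g - (x @ g) * x                 # remove the normal component
    def retr(self, x, v):
        y = x + v
        return y / np.linalg.norm(y)           # renormalize back onto the sphere

class Stiefel:
    """Hypothetical manifold: n x p matrices with orthonormal columns."""
    def proj(self, X, G):
        XtG = X.T @ G
        return G - X @ ((XtG + XtG.T) / 2)     # G - X sym(X^T G)
    def retr(self, X, V):
        Q, R = np.linalg.qr(X + V)             # QR-based retraction
        s = np.sign(np.diag(R))
        s[s == 0] = 1.0
        return Q * s                           # flip column signs so R has a positive diagonal

class RiemannianSGD:
    """Hypothetical optimizer that touches the manifold only through proj and retr."""
    def __init__(self, manifold, lr):
        self.manifold, self.lr = manifold, lr
    def step(self, X, egrad):
        rgrad = self.manifold.proj(X, egrad)   # Riemannian gradient under the embedded metric
        return self.manifold.retr(X, -self.lr * rgrad)

def minimize(manifold, X, egrad_fn, lr=0.05, iters=500):
    opt = RiemannianSGD(manifold, lr)
    for _ in range(iters):
        X = opt.step(X, egrad_fn(X))
    return X
```

As a usage example, the same `RiemannianSGD` can drive both manifolds when minimizing $-\mathrm{tr}(X^\top A X)$. On `Stiefel` this recovers a top-$p$ eigenspace, and on `Sphere` it recovers a leading eigenvector. Adding a new manifold means writing only `proj` and `retr`.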

### An Alternative to EM for Gaussian Mixture Models: Batch and Stochastic Riemannian Optimization

We consider maximum likelihood estimation for Gaussian Mixture Models (GMMs). This task is almost invariably solved (in theory and practice) via the Expectation Maximization (EM) algorithm. EM owes its success to various factors, of which its ability to fulfill positive definiteness constraints in closed form is of key importance. We propose an alternative to EM by appealing to the rich Riemannian geometry of positive definite matrices, which we use to cast GMM parameter estimation as a Riemannian optimization problem. Surprisingly, such an out-of-the-box Riemannian formulation completely fails and proves much inferior to EM. This motivates us to take a closer look at the problem geometry and derive a better formulation that is much more amenable to Riemannian optimization. We then develop (Riemannian) batch and stochastic gradient algorithms that outperform EM, often substantially. We provide a non-asymptotic convergence analysis for our stochastic method, which is also, to our knowledge, the first such global analysis for Riemannian stochastic gradient methods. Numerous empirical results are included to demonstrate the effectiveness of our methods.
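A Riemannian reformulation can be sketched for a single Gaussian component ($K = 1$) under the following assumption: each point $x_i$ is lifted to $y_i = [x_i; 1]$, and one optimizes a $(d+1)\times(d+1)$ SPD matrix $S$ under the objective $\frac{1}{n}\sum_i\left[-\tfrac12\log\det S - \tfrac12 y_i^\top S^{-1} y_i\right]$. Its maximizer has the block form $S = \begin{bmatrix}\Sigma + \mu\mu^\top & \mu\\ \mu^\top & 1\end{bmatrix}$. We state this lifting as our assumption about the kind of reformulation meant, not as the paper's exact derivation. The mixture weights and multiple components are omitted. The step size and iteration count are illustrative.

```python
import numpy as np

def sym_fn(S, fn):
    """Apply a scalar function to the eigenvalues of a symmetric matrix."""
    w, V = np.linalg.eigh((S + S.T) / 2)
    return (V * fn(w)) @ V.T

def spd_exp(S, xi):
    """Exponential map at S (affine-invariant metric) applied to a symmetric tangent xi."""
    S_half = sym_fn(S, np.sqrt)
    S_mhalf = sym_fn(S, lambda w: 1.0 / np.sqrt(w))
    return S_half @ sym_fn(S_mhalf @ xi @ S_mhalf, np.exp) @ S_half

def fit_gaussian_augmented(X, step=0.5, iters=300):
    """Riemannian gradient ascent over (d+1) x (d+1) SPD matrices for one component."""
    n, d = X.shape
    Y = np.hstack([X, np.ones((n, 1))])   # lift x_i to y_i = [x_i; 1]
    C = Y.T @ Y / n                        # (1/n) sum_i y_i y_i^T
    S = np.eye(d + 1)
    for _ in range(iters):
        # Euclidean gradient: G = 0.5 (S^{-1} C S^{-1} - S^{-1}).
        # Riemannian gradient under the affine-invariant metric: S G S = 0.5 (C - S).
        S = spd_exp(S, step * 0.5 * (C - S))
    mu = S[:d, d]                          # last column holds the mean
    Sigma = S[:d, :d] - np.outer(mu, mu)   # top-left block is Sigma + mu mu^T
    return mu, Sigma
```

Every iterate stays SPD by construction, because the exponential map never leaves the manifold. No projection or Cholesky parametrization is needed, which is the constraint-handling role the abstract attributes to EM's closed-form updates.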

### A Riemannian Network for SPD Matrix Learning

Symmetric Positive Definite (SPD) matrix learning methods have become popular in many image and video processing tasks, thanks to their ability to learn appropriate statistical representations while respecting the Riemannian geometry of the underlying SPD manifolds. In this paper we build a Riemannian network architecture to open up a new direction of non-linear SPD matrix learning in a deep model. In particular, we devise bilinear mapping layers to transform input SPD matrices into more desirable SPD matrices, exploit eigenvalue rectification layers to apply a non-linear activation function to the new SPD matrices, and design an eigenvalue logarithm layer to perform Riemannian computing on the resulting SPD matrices for regular output layers. To train the proposed deep network, we use a new form of backpropagation with a variant of stochastic gradient descent on Stiefel manifolds to update the structured connection weights and the involved SPD matrix data. We show through experiments that the proposed SPD matrix network can be trained simply and outperforms existing SPD matrix learning and state-of-the-art methods in three typical visual classification tasks.
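The three layer types described above can be sketched as a NumPy forward pass. The sketch assumes the following: the bilinear map is $W^\top X W$ with $W$ having orthonormal columns (a point on a Stiefel manifold); rectification clamps eigenvalues from below at a threshold `eps`; and the logarithm layer applies the matrix log through an eigendecomposition. The function names and the value of `eps` are hypothetical. Backpropagation and the Stiefel-manifold weight update are not shown.

```python
import numpy as np

def sym_fn(S, fn):
    """Apply a scalar function to the eigenvalues of a symmetric matrix."""
    w, V = np.linalg.eigh((S + S.T) / 2)
    return (V * fn(w)) @ V.T

def bimap(X, W):
    """Bilinear mapping layer: d_in x d_in SPD -> d_out x d_out SPD (W is d_in x d_out)."""
    return W.T @ X @ W

def reeig(X, eps=1e-4):
    """Eigenvalue rectification: max(lambda, eps) acts as the non-linearity."""
    return sym_fn(X, lambda w: np.maximum(w, eps))

def logeig(X):
    """Eigenvalue logarithm: maps an SPD matrix to a symmetric matrix in a flat space."""
    return sym_fn(X, np.log)

def spdnet_forward(X, Ws, eps=1e-4):
    """Stack of (bimap -> reeig) blocks followed by one logeig layer."""
    for W in Ws:
        X = reeig(bimap(X, W), eps)
    return logeig(X)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    B = rng.normal(size=(8, 8))
    X = B @ B.T / 8 + 0.1 * np.eye(8)                   # an input SPD matrix
    Ws = [np.linalg.qr(rng.normal(size=(8, 6)))[0],     # orthonormal-column weights
          np.linalg.qr(rng.normal(size=(6, 4)))[0]]
    out = spdnet_forward(X, Ws)
    print(out.shape)
```

The symmetric output of `logeig` lives in a vector space, so its entries (for example, the upper triangle) can be fed to ordinary fully connected and softmax layers. This is the role the abstract assigns to the eigenvalue logarithm layer.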