Covariance operator


PCA of probability measures: Sparse and Dense sampling regimes

Gachon, Erell, Bigot, Jérémie, Cazelles, Elsa

arXiv.org Machine Learning

A common approach to perform PCA on probability measures is to embed them into a Hilbert space where standard functional PCA techniques apply. While convergence rates for estimating the embedding of a single measure from $m$ samples are well understood, the literature has not addressed the setting involving multiple measures. In this paper, we study PCA in a double asymptotic regime where $n$ probability measures are observed, each through $m$ samples. We derive convergence rates of the form $n^{-1/2} + m^{-α}$ for the empirical covariance operator and the PCA excess risk, where $α>0$ depends on the chosen embedding. This characterizes the relationship between the number $n$ of measures and the number $m$ of samples per measure, revealing a sparse (small $m$) to dense (large $m$) transition in the convergence behavior. Moreover, we prove that the dense-regime rate is minimax optimal for the empirical covariance error. Our numerical experiments validate these theoretical rates and demonstrate that appropriate subsampling preserves PCA accuracy while reducing computational cost.
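As an illustration of the setup, the sketch below embeds $n$ measures, each observed through $m$ samples, using a random-Fourier-feature approximation of the kernel mean embedding (one possible choice of embedding; the paper's rate depends on this choice through $α$), then forms the empirical covariance operator across the embedded measures and extracts its principal components. The helpers rff_features and embed_measure are hypothetical names, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def rff_features(x, omegas, bias):
    # Random Fourier features: an explicit finite-dimensional stand-in
    # for the Hilbert-space embedding of a sample.
    return np.sqrt(2.0 / len(omegas)) * np.cos(np.outer(x, omegas) + bias)

def embed_measure(samples, omegas, bias):
    # Empirical kernel mean embedding of one measure from its m samples.
    return rff_features(samples, omegas, bias).mean(axis=0)

# n measures (Gaussians with random means), each seen through m samples.
n, m, D = 200, 50, 256            # D = dimension of the feature embedding
omegas = rng.normal(size=D)       # random frequencies of the feature map
bias = rng.uniform(0, 2 * np.pi, size=D)

means = rng.normal(size=n)
embeddings = np.stack([
    embed_measure(rng.normal(loc=mu, size=m), omegas, bias) for mu in means
])

# Empirical covariance operator of the embedded measures and its PCA.
centered = embeddings - embeddings.mean(axis=0)
cov = centered.T @ centered / n               # D x D empirical covariance
eigvals, eigvecs = np.linalg.eigh(cov)        # ascending eigenvalues
scores = centered @ eigvecs[:, ::-1][:, :2]   # top-2 principal scores
print("top eigenvalues:", eigvals[::-1][:3])
```

Re-running with a subsampled m (e.g. keeping only a fraction of each measure's samples) gives a direct empirical check of the $n^{-1/2} + m^{-α}$ trade-off the paper analyzes.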


Statistical and Geometrical properties of the Kernel Kullback-Leibler divergence

Neural Information Processing Systems

In this paper, we study the statistical and geometrical properties of the Kullback-Leibler divergence with kernel covariance operators (KKL) introduced by [Bach, 2022, Information Theory with Kernel Methods]. Unlike the classical Kullback-Leibler (KL) divergence, which involves density ratios, the KKL compares probability distributions through covariance operators (embeddings) in a reproducing kernel Hilbert space (RKHS), computing the quantum Kullback-Leibler divergence between them. This novel divergence hence shares parallel but distinct aspects with both the standard KL divergence between probability distributions and kernel embedding metrics such as the maximum mean discrepancy. A limitation of the original KKL divergence is that it is not defined for distributions with disjoint supports. To solve this problem, we propose a regularised variant that guarantees the divergence is well defined for all distributions. We derive bounds that quantify the deviation of the regularised KKL from the original one, as well as concentration bounds. In addition, we provide a closed-form expression for the regularised KKL, applicable when the distributions consist of finite sets of points, which makes it implementable. Furthermore, we derive a Wasserstein gradient descent scheme for the KKL divergence in the case of discrete distributions, and empirically study its ability to transport a set of points to a target distribution.
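A minimal sketch of the quantum-KL computation underlying the KKL, using an explicit finite-dimensional feature map (random Fourier features) in place of the RKHS embedding so that the covariance operators become matrices. The mixing regularisation (1−ε)Σ_q + εΣ_p shown here is one plausible variant that guarantees a finite value even for disjoint supports; it may differ from the paper's exact regulariser, and cov_op and quantum_kl are hypothetical helper names.

```python
import numpy as np

def feat(x, omegas, bias):
    # Random Fourier features: explicit finite-dim stand-in for the RKHS map.
    return np.sqrt(2.0 / len(omegas)) * np.cos(np.outer(x, omegas) + bias)

def cov_op(x, omegas, bias):
    # Empirical (uncentered) kernel covariance operator, trace-normalised
    # so it plays the role of a density operator.
    phi = feat(x, omegas, bias)
    c = phi.T @ phi / len(x)
    return c / np.trace(c)

def quantum_kl(a, b, floor=1e-12):
    # Quantum KL divergence tr[A(log A - log B)] via eigendecompositions,
    # with the convention 0 log 0 = 0.
    la, va = np.linalg.eigh(a)
    lb, vb = np.linalg.eigh(b)
    la = np.clip(la, 0.0, None)
    tr_a_log_a = np.sum(la[la > floor] * np.log(la[la > floor]))
    weights = np.einsum("ij,jk,ki->i", vb.T, a, vb)  # v_j^T A v_j
    tr_a_log_b = np.sum(weights * np.log(np.clip(lb, floor, None)))
    return tr_a_log_a - tr_a_log_b

rng = np.random.default_rng(0)
D = 128
omegas, bias = rng.normal(size=D), rng.uniform(0, 2 * np.pi, size=D)

p = rng.normal(loc=0.0, size=300)   # two discrete samples whose supports
q = rng.normal(loc=3.0, size=300)   # are (almost surely) disjoint
cp, cq = cov_op(p, omegas, bias), cov_op(q, omegas, bias)

eps = 0.1   # regularisation: mix the second operator toward the first
print("regularised KKL:", quantum_kl(cp, (1 - eps) * cq + eps * cp))
```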


Random Gradient-Free Optimization in Infinite Dimensional Spaces

Peixoto, Caio Lins, Csillag, Daniel, da Costa, Bernardo F. P., Saporito, Yuri F.

arXiv.org Machine Learning

In this paper, we propose a random gradient-free method for optimization in infinite-dimensional Hilbert spaces, applicable to functional optimization in diverse settings. Though such problems are often solved through finite-dimensional gradient descent over a parametrization of the functions, such as neural networks, an interesting alternative is to instead perform gradient descent directly in the function space by leveraging its Hilbert space structure, thus enabling provable guarantees and fast convergence. However, infinite-dimensional gradients are often hard to compute in practice, hindering the applicability of such methods. To overcome this limitation, our framework requires only the computation of directional derivatives and a pre-basis for the Hilbert space domain, i.e., a linearly independent set whose span is dense in the Hilbert space. This fully resolves the tractability issue, as pre-bases are much more easily obtained than full orthonormal bases or reproducing kernels -- which may not even exist -- and individual directional derivatives can be easily computed using forward-mode scalar automatic differentiation. We showcase the use of our method to solve partial differential equations à la physics-informed neural networks (PINNs), where it effectively enables provable convergence.
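A toy sketch of the ingredients under stated assumptions: minimise the discretised functional J(f) = ||f − g||² over L²[0,1] by stepping along randomly chosen pre-basis directions (monomials, which are linearly independent with dense span, but neither orthonormal nor a reproducing kernel), with directional derivatives estimated by forward finite differences standing in for forward-mode automatic differentiation. This naive scheme is an illustration, not the paper's algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy functional on L^2[0,1], discretised on a grid:
# J(f) = ||f - g||^2, minimised at f = g.
xs = np.linspace(0.0, 1.0, 200)
g = np.sin(2 * np.pi * xs)

def J(f_vals):
    return np.mean((f_vals - g) ** 2)

# Pre-basis: monomials 1, x, x^2, ... -- linearly independent with
# dense span in L^2[0,1]; no orthonormalisation is performed.
K = 8
basis = np.stack([xs ** k for k in range(K)])

coefs = np.zeros(K)
h, lr = 1e-5, 0.4
for _ in range(5000):
    f_vals = coefs @ basis
    k = rng.integers(K)               # random pre-basis direction
    # Directional derivative of J along basis[k], by forward difference.
    deriv = (J(f_vals + h * basis[k]) - J(f_vals)) / h
    coefs[k] -= lr * deriv
print("objective after descent:", J(coefs @ basis))
```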


Toward Scalable and Valid Conditional Independence Testing with Spectral Representations

Frohlich, Alek, Kostic, Vladimir, Lounici, Karim, Perazzo, Daniel, Pontil, Massimiliano

arXiv.org Machine Learning

Conditional independence (CI) is central to causal inference, feature selection, and graphical modeling, yet it is untestable in many settings without additional assumptions. Existing CI tests often rely on restrictive structural conditions, limiting their validity on real-world data. Kernel methods using the partial covariance operator offer a more principled approach but suffer from limited adaptivity, slow convergence, and poor scalability. In this work, we explore whether representation learning can help address these limitations. Specifically, we focus on representations derived from the singular value decomposition of the partial covariance operator and use them to construct a simple test statistic, reminiscent of the Hilbert-Schmidt Independence Criterion (HSIC). We also introduce a practical bi-level contrastive algorithm to learn these representations. Our theory links representation learning error to test performance and establishes asymptotic validity and power guarantees. Preliminary experiments suggest that this approach offers a practical and statistically grounded path toward scalable CI testing, bridging kernel-based theory with modern representation learning.
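A simplified sketch of the flavour of such a statistic: after crudely partialling Z out of fixed feature maps of X and Y by linear regression on features of Z, the squared Hilbert-Schmidt norm of the empirical cross-covariance between the residual representations serves as an HSIC-like statistic. The paper instead learns representations from the singular value decomposition of the partial covariance operator via a bi-level contrastive algorithm; feats, regress_out, and crosscov_stat below are hypothetical names.

```python
import numpy as np

rng = np.random.default_rng(0)

def feats(v):
    # Hypothetical fixed feature map; the paper learns its representations
    # rather than fixing them a priori.
    return np.column_stack([v, v ** 2, np.sin(v)])

def regress_out(f, z_design):
    # Crude surrogate for partialling out Z: subtract the best linear
    # predictor of each feature column from a feature map of Z.
    beta, *_ = np.linalg.lstsq(z_design, f, rcond=None)
    return f - z_design @ beta

def crosscov_stat(fx, fy):
    # Squared Hilbert-Schmidt norm of the empirical cross-covariance
    # between residual representations -- an HSIC-like statistic.
    fx = fx - fx.mean(axis=0)
    fy = fy - fy.mean(axis=0)
    c = fx.T @ fy / len(fx)
    return np.sum(c ** 2)

n = 2000
z = rng.normal(size=n)
x = z + 0.3 * rng.normal(size=n)   # X and Y independent given Z
y = z + 0.3 * rng.normal(size=n)

z_design = np.column_stack([np.ones(n), z, z ** 2, np.sin(z)])
rx = regress_out(feats(x), z_design)
ry = regress_out(feats(y), z_design)
print("CI statistic (small under H0):", crosscov_stat(rx, ry))
```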



Generalized infinite dimensional Alpha-Procrustes based geometries

Goomanee, Salvish, Han, Andi, Jawanpuria, Pratik, Mishra, Bamdev

arXiv.org Machine Learning

Symmetric positive definite (SPD) matrices and operators are central to a wide range of problems in data science, including covariance estimation, kernel methods, diffusion geometry, and generative modeling. While the geometry of SPD matrices has been extensively studied in the finite-dimensional setting with popular metrics such as the affine-invariant, Log-Euclidean, and Bures-Wasserstein (BW) distances, many real-world applications inherently involve infinite-dimensional SPD operators. These include covariance operators on functional spaces, integral kernels, and diffusion operators on manifolds. However, most existing geometric frameworks do not generalize coherently across finite and infinite dimensions, leading to inconsistencies in modeling, analysis, and computation. To address this, we propose a unifying family of Riemannian distances based on generalized alpha-Procrustes distances. This family includes the Log-Hilbert-Schmidt and infinite-dimensional generalized BW (GBW) metrics as special cases and enables a continuous interpolation between them. Crucially, it is designed to extend smoothly from finite-dimensional SPD matrices to infinite-dimensional positive definite Hilbert-Schmidt operators, offering a robust and flexible geometric foundation for both theoretical analysis and practical machine learning applications.
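A finite-dimensional sketch, assuming the alpha-Procrustes form d_α(A, B) = (1/α) · inf_U ||A^α − B^α U||_F over orthogonal U, which admits a closed form through the nuclear norm. The interpolation the abstract describes is visible numerically: α = 1/2 recovers the BW distance up to a factor of two, and α → 0 approaches the Log-Euclidean distance ||log A − log B||_F.

```python
import numpy as np
from scipy.linalg import fractional_matrix_power, logm

def alpha_procrustes(A, B, alpha):
    # d_alpha(A, B) = (1/alpha) * inf_U ||A^alpha - B^alpha U||_F, using
    # inf_U ||X - Y U||_F^2 = ||X||_F^2 + ||Y||_F^2 - 2 ||X^T Y||_*
    # where ||.||_* is the nuclear norm.
    Aa = np.real(fractional_matrix_power(A, alpha))  # discard numerical
    Ba = np.real(fractional_matrix_power(B, alpha))  # imaginary dust
    sq = (np.trace(Aa @ Aa) + np.trace(Ba @ Ba)
          - 2 * np.linalg.norm(Aa.T @ Ba, ord="nuc"))
    return np.sqrt(max(sq, 0.0)) / alpha

rng = np.random.default_rng(0)
M = rng.normal(size=(5, 5)); A = M @ M.T + np.eye(5)
M = rng.normal(size=(5, 5)); B = M @ M.T + np.eye(5)

for alpha in (0.5, 0.1, 0.01):   # interpolates from (twice) BW toward Log-Euclidean
    print(f"alpha={alpha}:", alpha_procrustes(A, B, alpha))
print("log-Euclidean:", np.linalg.norm(np.real(logm(A)) - np.real(logm(B))))
```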