Lounici, Karim
CoHiRF: A Scalable and Interpretable Clustering Framework for High-Dimensional Data
Belucci, Bruno, Lounici, Karim, Meziani, Katia
Clustering high-dimensional data poses significant challenges due to the curse of dimensionality, scalability issues, and the presence of noisy and irrelevant features. We propose Consensus Hierarchical Random Feature (CoHiRF), a novel clustering method designed to address these challenges effectively. CoHiRF leverages random feature selection to mitigate noise and dimensionality effects, repeatedly applies K-Means clustering in reduced feature spaces, and combines results through a unanimous consensus criterion. This iterative approach constructs a cluster assignment matrix, where each row records the cluster assignments of a sample across repetitions, enabling the identification of stable clusters by comparing identical rows.

High-dimensional datasets suffer from the well-known "curse of dimensionality." As the dimensionality p increases, the relevant information often lies in a low-dimensional subspace, with the remaining dimensions contributing predominantly to noise. Consequently, data points tend to become equidistant in high-dimensional space, rendering traditional distance-based clustering algorithms, such as K-Means, less effective (Beyer et al., 1999). Specifically, the Euclidean distance metric loses its discriminative power, resulting in poor clustering performance. Another critical challenge is scalability: traditional clustering methods, originally designed for low-dimensional or small datasets, often struggle with high computational and memory demands when applied to high-dimensional data settings (Steinbach et al., 2004; Assent, 2012; Zimek et al., 2012; Mahdi et al., 2021).
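A minimal sketch of one consensus round as described above, assuming a NumPy/scikit-learn setting; the subset size, number of repetitions, and function name are illustrative choices, not the authors' implementation:

    import numpy as np
    from sklearn.cluster import KMeans

    def consensus_round(X, n_clusters=5, n_repeats=10, n_features=20, seed=0):
        """One round: K-Means repeated on random feature subsets; samples are
        merged only if their assignments agree in every repetition."""
        rng = np.random.default_rng(seed)
        n, p = X.shape
        assignments = np.empty((n, n_repeats), dtype=int)
        for r in range(n_repeats):
            cols = rng.choice(p, size=min(n_features, p), replace=False)
            km = KMeans(n_clusters=n_clusters, n_init=10, random_state=r)
            assignments[:, r] = km.fit_predict(X[:, cols])
        # unanimous consensus: identical rows of the assignment matrix form one cluster
        _, labels = np.unique(assignments, axis=0, return_inverse=True)
        return labels

The full method iterates this consensus step, as the abstract indicates; only a single round is sketched here.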
Laplace Transform Based Low-Complexity Learning of Continuous Markov Semigroups
Kostic, Vladimir R., Lounici, Karim, Halconruy, Hélène, Devergne, Timothée, Novelli, Pietro, Pontil, Massimiliano
Markov processes serve as a universal model for many real-world random processes. This paper presents a data-driven approach for learning these models through the spectral decomposition of the infinitesimal generator (IG) of the Markov semigroup. The unbounded nature of IGs complicates traditional methods such as vector-valued regression and Hilbert-Schmidt operator analysis. Existing techniques, including physics-informed kernel regression, are computationally expensive and limited in scope, with no recovery guarantees for transfer operator methods when the time-lag is small. We propose a novel method that leverages the IG's resolvent, characterized by the Laplace transform of transfer operators. This approach is robust to time-lag variations, ensuring accurate eigenvalue learning even for small time-lags. Our statistical analysis applies to a broader class of Markov processes than current methods while reducing computational complexity from quadratic to linear in the state dimension. Finally, we illustrate the behaviour of our method in two experiments.
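For context, the resolvent mentioned above is the Laplace transform of the Markov semigroup. In standard notation (ours, not necessarily the paper's), for a generator $\mathcal{L}$ with semigroup $e^{t\mathcal{L}}$ and a shift $\eta > 0$,
$$ (\eta I - \mathcal{L})^{-1} \;=\; \int_0^{\infty} e^{-\eta t}\, e^{t\mathcal{L}}\, \mathrm{d}t, $$
so an eigenpair $\mathcal{L}\varphi = \lambda\varphi$ corresponds to the eigenvalue $\mu = (\eta - \lambda)^{-1}$ of the bounded resolvent, and $\lambda = \eta - 1/\mu$ can be read off once $\mu$ is estimated.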
Neural Conditional Probability for Inference
Kostic, Vladimir R., Lounici, Karim, Pacreau, Gregoire, Novelli, Pietro, Turri, Giacomo, Pontil, Massimiliano
We introduce NCP (Neural Conditional Probability), a novel operator-theoretic approach for learning conditional distributions with a particular focus on inference tasks. NCP can be used to build conditional confidence regions and extract important statistics like conditional quantiles, mean, and covariance. It offers streamlined learning through a single unconditional training phase, facilitating efficient inference without the need for retraining even when conditioning changes. By tapping into the powerful approximation capabilities of neural networks, our method efficiently handles a wide variety of complex probability distributions, effectively dealing with nonlinear relationships between input and output variables. Theoretical guarantees ensure both optimization consistency and statistical accuracy of the NCP method. Our experiments show that our approach matches or beats leading methods using a simple Multi-Layer Perceptron (MLP) with two hidden layers and GELU activations. This demonstrates that a minimalistic architecture paired with a theoretically grounded loss function can achieve competitive results, even against more complex architectures.
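A minimal sketch of the network described in the experiments (two hidden layers with GELU activations), written in PyTorch; the layer widths and output dimension are placeholders, not the paper's settings, and the paper's operator-theoretic loss is not reproduced here:

    import torch.nn as nn

    class TwoLayerGELU(nn.Module):
        """Simple MLP encoder: two hidden layers with GELU activations."""
        def __init__(self, in_dim, hidden=128, out_dim=64):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(in_dim, hidden), nn.GELU(),
                nn.Linear(hidden, hidden), nn.GELU(),
                nn.Linear(hidden, out_dim),
            )

        def forward(self, x):
            return self.net(x)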
Contextual Continuum Bandits: Static Versus Dynamic Regret
Akhavan, Arya, Lounici, Karim, Pontil, Massimiliano, Tsybakov, Alexandre B.
We study the contextual continuum bandits problem, where the learner sequentially receives a side information vector and has to choose an action in a convex set, minimizing a function associated with the context. The goal is to minimize all the underlying functions for the received contexts, leading to a dynamic (contextual) notion of regret, which is stronger than the standard static regret. Assuming that the objective functions are H\"older with respect to the contexts, we demonstrate that any algorithm achieving a sub-linear static regret can be extended to achieve a sub-linear dynamic regret. We further study the case of strongly convex and smooth functions when the observations are noisy. Inspired by the interior point method and employing self-concordant barriers, we propose an algorithm achieving a sub-linear dynamic regret. Lastly, we present a minimax lower bound, implying two key facts. First, no algorithm can achieve sub-linear dynamic regret over functions that are not continuous with respect to the context. Second, for strongly convex and smooth functions, the algorithm that we propose achieves, up to a logarithmic factor, the minimax optimal rate of dynamic regret as a function of the number of queries.
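In notation of our own choosing (context $c_t$, chosen action $x_t$, objective $f(\cdot, c_t)$ over the convex action set $\mathcal{X}$, horizon $N$), the two notions of regret contrasted above are
$$ R_N^{\mathrm{stat}} = \sum_{t=1}^{N} f(x_t, c_t) - \inf_{x \in \mathcal{X}} \sum_{t=1}^{N} f(x, c_t), \qquad R_N^{\mathrm{dyn}} = \sum_{t=1}^{N} f(x_t, c_t) - \sum_{t=1}^{N} \inf_{x \in \mathcal{X}} f(x, c_t), $$
so $R_N^{\mathrm{stat}} \le R_N^{\mathrm{dyn}}$: the dynamic benchmark re-optimizes for every received context rather than competing against a single fixed action.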
Learning the Infinitesimal Generator of Stochastic Diffusion Processes
Kostic, Vladimir R., Lounici, Karim, Halconruy, Helene, Devergne, Timothee, Pontil, Massimiliano
We address data-driven learning of the infinitesimal generator of stochastic diffusion processes, essential for understanding numerical simulations of natural and physical systems. The unbounded nature of the generator poses significant challenges, rendering conventional analysis techniques for Hilbert-Schmidt operators ineffective. To overcome this, we introduce a novel framework based on the energy functional for these stochastic processes. Our approach integrates physical priors through an energy-based risk metric in both full and partial knowledge settings. We evaluate the statistical performance of a reduced-rank estimator in reproducing kernel Hilbert spaces (RKHS) in the partial knowledge setting. Notably, our approach provides learning bounds independent of the state space dimension and ensures non-spurious spectral estimation. Additionally, we elucidate how the distortion between the intrinsic energy-induced metric of the stochastic diffusion and the RKHS metric used for generator estimation impacts the spectral learning bounds.
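As standard background (notation ours), for a diffusion $\mathrm{d}X_t = a(X_t)\,\mathrm{d}t + \sigma(X_t)\,\mathrm{d}W_t$ the infinitesimal generator acts on smooth observables as
$$ \mathcal{L} f \;=\; a \cdot \nabla f + \tfrac{1}{2}\,\mathrm{Tr}\!\left(\sigma\sigma^{\top}\nabla^{2} f\right), $$
and for reversible processes with invariant measure $\pi$ (Langevin dynamics being the canonical example) the associated energy, or Dirichlet form, is $\mathcal{E}(f) = -\langle f, \mathcal{L} f\rangle_{\pi} = \tfrac{1}{2}\int \|\sigma^{\top}\nabla f\|^{2}\, \mathrm{d}\pi$, the kind of energy functional on which the risk metric above is built.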
Consistent Long-Term Forecasting of Ergodic Dynamical Systems
Inzerilli, Prune, Kostic, Vladimir, Lounici, Karim, Novelli, Pietro, Pontil, Massimiliano
We study the evolution of distributions under the action of an ergodic dynamical system, which may be stochastic in nature. By employing tools from Koopman and transfer operator theory one can evolve any initial distribution of the state forward in time, and we investigate how estimators of these operators perform on long-term forecasting. Motivated by the observation that standard estimators may fail at this task, we introduce a learning paradigm that neatly combines classical techniques of eigenvalue deflation from operator theory and feature centering from statistics. This paradigm applies to any operator estimator based on empirical risk minimization, making it satisfy learning bounds that hold uniformly on the entire trajectory of future distributions and abide by the conservation of mass for each of the forecasted distributions. Numerical experiments illustrate the advantages of our approach in practice.
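To make the deflation-plus-centering idea concrete, in notation of our own: the transfer operator $P$ of an ergodic system fixes the invariant distribution, $P\pi = \pi$ (equivalently, the Koopman operator fixes constants). Splitting off this exactly known eigenpair and estimating only the remainder leads to forecasts of the form
$$ \nu_t \;=\; \pi \;+\; \widehat{P}_0^{\,t}\,(\nu_0 - \pi), $$
where $\widehat{P}_0$ estimates the deflated operator $P - \pi\,\langle \mathbf{1}, \cdot\,\rangle$ (on the Koopman side, this corresponds to centering the features). The eigenvalue-one component, the only part that must be reproduced exactly for mass conservation, is thus kept out of the estimation problem, which is what permits bounds holding uniformly over the forecast horizon.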
Sharp Spectral Rates for Koopman Operator Learning
Kostic, Vladimir, Lounici, Karim, Novelli, Pietro, Pontil, Massimiliano
Nonlinear dynamical systems can be handily described by the associated Koopman operator, whose action evolves every observable of the system forward in time. Learning the Koopman operator and its spectral decomposition from data is enabled by a number of algorithms. In this work we present for the first time non-asymptotic learning bounds for the Koopman eigenvalues and eigenfunctions. We focus on time-reversal-invariant stochastic dynamical systems, including the important example of Langevin dynamics. We analyze two popular estimators: Extended Dynamic Mode Decomposition (EDMD) and Reduced Rank Regression (RRR). Our results critically hinge on novel minimax estimation bounds for the operator norm error, which may be of independent interest. Our spectral learning bounds are driven by the simultaneous control of the operator norm error and a novel metric distortion functional of the estimated eigenfunctions. The bounds indicate that both EDMD and RRR have similar variance, but EDMD suffers from a larger bias which might be detrimental to its learning rate. Our results shed new light on the emergence of spurious eigenvalues, an issue which is well known empirically. Numerical experiments illustrate the implications of the bounds in practice.
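A compact sketch of the two estimators compared above, for feature matrices $\Phi_X, \Phi_Y \in \mathbb{R}^{n\times d}$ holding a dictionary evaluated at consecutive states; the ridge regularization and the ridge-then-project form of RRR are a common simplification, not necessarily the paper's exact formulation:

    import numpy as np

    def edmd(Phi_X, Phi_Y, reg=1e-6):
        """Ridge-regularized least-squares (EDMD) estimate of the Koopman matrix."""
        n, d = Phi_X.shape
        C_xx = Phi_X.T @ Phi_X / n + reg * np.eye(d)
        C_xy = Phi_X.T @ Phi_Y / n
        return np.linalg.solve(C_xx, C_xy)

    def rrr(Phi_X, Phi_Y, rank, reg=1e-6):
        """Reduced-rank regression: project the ridge fit onto its top-`rank`
        output directions (the classical RRR solution; approximate when reg > 0)."""
        G = edmd(Phi_X, Phi_Y, reg)
        _, _, Vt = np.linalg.svd(Phi_X @ G, full_matrices=False)
        P = Vt[:rank].T @ Vt[:rank]
        return G @ P

    # eigenvalues of either d x d matrix are the spectral estimates, e.g.
    # np.linalg.eigvals(rrr(Phi_X, Phi_Y, rank=5))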
Learning invariant representations of time-homogeneous stochastic dynamical systems
Kostic, Vladimir R., Novelli, Pietro, Grazzi, Riccardo, Lounici, Karim, Pontil, Massimiliano
We consider the general class of time-homogeneous stochastic dynamical systems, both discrete and continuous, and study the problem of learning a representation of the state that faithfully captures its dynamics. This is instrumental to learn the transfer operator of the system, that in turn can be used for numerous tasks, such as forecasting and interpreting the system dynamics. We show that the search for a good representation can be cast as an optimization problem over neural networks. Our approach is supported by recent results in statistical learning theory, highlighting the role of approximation error and metric distortion in the context of transfer operator regression. The objective function we propose is associated with projection operators from the representation space to the data space, overcomes metric distortion, and can be empirically estimated from data. In the discrete time setting, we further derive a relaxed objective function that is differentiable and numerically well-conditioned. We compare our method against state-of-the-art approaches on different datasets, showing better performance across the board.
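As a purely illustrative stand-in, and not the paper's objective, representation-learning scores of this flavor are typically built from empirical covariances of the learned features at consecutive time steps; the sketch below scores a pair of feature matrices by a normalized cross-covariance, with all names and normalizations being our own choices:

    import torch

    def crosscov_score(phi_x, phi_y, eps=1e-6):
        """Illustrative score for features of consecutive states (rows = samples):
        large cross-covariance across one time step, normalized by the spectral
        norms of the marginal covariances. NOT the objective of the paper."""
        phi_x = phi_x - phi_x.mean(0)   # feature centering
        phi_y = phi_y - phi_y.mean(0)
        n = phi_x.shape[0]
        C_x, C_y = phi_x.T @ phi_x / n, phi_y.T @ phi_y / n
        C_xy = phi_x.T @ phi_y / n
        return (C_xy ** 2).sum() / (torch.linalg.matrix_norm(C_x, 2)
                                    * torch.linalg.matrix_norm(C_y, 2) + eps)

A score of this type is differentiable in the network parameters producing phi_x and phi_y and can be maximized with standard optimizers.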
Multi-task Representation Learning with Stochastic Linear Bandits
Cella, Leonardo, Lounici, Karim, Pacreau, Grégoire, Pontil, Massimiliano
We study the problem of transfer-learning in the setting of stochastic linear bandit tasks. We consider that a low dimensional linear representation is shared across the tasks, and study the benefit of learning this representation in the multi-task learning setting. Following recent results on the design of stochastic bandit policies, we propose an efficient greedy policy based on trace norm regularization. It implicitly learns a low dimensional representation by encouraging the matrix formed by the task regression vectors to be of low rank. Unlike previous work in the literature, our policy does not need to know the rank of the underlying matrix. We derive an upper bound on the multi-task regret of our policy, which is, up to logarithmic factors, of order $\sqrt{NdT(T+d)r}$, where $T$ is the number of tasks, $r$ the rank, $d$ the number of variables and $N$ the number of rounds per task. We show the benefit of our strategy compared to the baseline $Td\sqrt{N}$ obtained by solving each task independently. We also provide a lower bound on the multi-task regret. Finally, we corroborate our theoretical findings with preliminary experiments on synthetic data.
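A minimal sketch of the trace-norm-regularized estimation step driving such a greedy policy; the proximal-gradient solver, step size, and the greedy rule over a finite action set are generic illustrative choices, not the paper's exact procedure:

    import numpy as np

    def svt(M, tau):
        """Proximal operator of tau * (nuclear norm): soft-threshold singular values."""
        U, s, Vt = np.linalg.svd(M, full_matrices=False)
        return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

    def trace_norm_estimate(X_list, y_list, lam, n_iter=200):
        """Trace-norm-regularized least squares for the d x T matrix of task
        parameters, via proximal gradient descent (a generic solver)."""
        d, T = X_list[0].shape[1], len(X_list)
        Theta = np.zeros((d, T))
        step = 1.0 / max(np.linalg.norm(X, 2) ** 2 for X in X_list)
        for _ in range(n_iter):
            grad = np.zeros_like(Theta)
            for t, (X, y) in enumerate(zip(X_list, y_list)):
                grad[:, t] = X.T @ (X @ Theta[:, t] - y)
            Theta = svt(Theta - step * grad, step * lam)
        return Theta

    # greedy step for task t over a finite action set A (one action per row):
    # a = A[np.argmax(A @ Theta[:, t])]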
Meta Representation Learning with Contextual Linear Bandits
Cella, Leonardo, Lounici, Karim, Pontil, Massimiliano
Meta-learning seeks to build algorithms that rapidly learn how to solve new learning problems based on previous experience. In this paper we investigate meta-learning in the setting of stochastic linear bandit tasks. We assume that the tasks share a low dimensional representation, which has been partially acquired from previous learning tasks. We aim to leverage this information in order to learn a new downstream bandit task, which shares the same representation. Our principal contribution is to show that if the learned representation estimates well the unknown one, then the downstream task can be efficiently learned by a greedy policy that we propose in this work. We derive an upper bound on the regret of this policy, which is, up to logarithmic factors, of order $r\sqrt{N}(1\vee \sqrt{d/T})$, where $N$ is the horizon of the downstream task, $T$ is the number of training tasks, $d$ the ambient dimension and $r \ll d$ the dimension of the representation. We highlight that our strategy does not need to know $r$. We note that if $T> d$ our bound achieves the same rate as optimal minimax bandit algorithms using the true underlying representation. Our analysis is inspired by and builds in part upon previous work on meta-learning in the i.i.d. full information setting \citep{tripuraneni2021provable,boursier2022trace}. As a separate contribution we show how to relax certain assumptions in those works, thereby improving their representation learning and risk analysis.
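A minimal sketch of how a greedy policy can exploit a learned low-dimensional representation on the downstream task; the names, the ridge solver, and the finite action set are illustrative assumptions, not the paper's exact policy:

    import numpy as np

    def greedy_lowdim_action(B_hat, X_hist, y_hist, actions, reg=1.0):
        """Greedy action selection in the learned r-dimensional representation.
        B_hat (d x r) is the representation estimated on the training tasks."""
        Z = X_hist @ B_hat                       # past actions projected to r dims
        r = B_hat.shape[1]
        alpha = np.linalg.solve(Z.T @ Z + reg * np.eye(r), Z.T @ y_hist)
        theta_hat = B_hat @ alpha                # lift back to the ambient dimension d
        return actions[np.argmax(actions @ theta_hat)]

Because the regression is performed in r rather than d dimensions, its sample cost scales with the representation dimension, which is the source of the $r\sqrt{N}$ factor in the bound above.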