Novelli, Pietro
Laplace Transform Based Low-Complexity Learning of Continuous Markov Semigroups
Kostic, Vladimir R., Lounici, Karim, Halconruy, Hélène, Devergne, Timothée, Novelli, Pietro, Pontil, Massimiliano
Markov processes serve as a universal model for many real-world random processes. This paper presents a data-driven approach for learning these models through the spectral decomposition of the infinitesimal generator (IG) of the Markov semigroup. The unbounded nature of IGs complicates traditional methods such as vector-valued regression and Hilbert-Schmidt operator analysis. Existing techniques, including physics-informed kernel regression, are computationally expensive and limited in scope, with no recovery guarantees for transfer operator methods when the time-lag is small. We propose a novel method that leverages the IG's resolvent, characterized by the Laplace transform of transfer operators. This approach is robust to time-lag variations, ensuring accurate eigenvalue learning even for small time-lags. Our statistical analysis applies to a broader class of Markov processes than current methods while reducing computational complexity from quadratic to linear in the state dimension. Finally, we illustrate the behaviour of our method in two experiments.
Neural Conditional Probability for Inference
Kostic, Vladimir R., Lounici, Karim, Pacreau, Gregoire, Novelli, Pietro, Turri, Giacomo, Pontil, Massimiliano
We introduce NCP (Neural Conditional Probability), a novel operator-theoretic approach for learning conditional distributions with a particular focus on inference tasks. NCP can be used to build conditional confidence regions and extract important statistics like conditional quantiles, mean, and covariance. It offers streamlined learning through a single unconditional training phase, facilitating efficient inference without the need for retraining even when conditioning changes. By tapping into the powerful approximation capabilities of neural networks, our method efficiently handles a wide variety of complex probability distributions, effectively dealing with nonlinear relationships between input and output variables. Theoretical guarantees ensure both optimization consistency and statistical accuracy of the NCP method. Our experiments show that our approach matches or beats leading methods using a simple Multi-Layer Perceptron (MLP) with two hidden layers and GELU activations. This demonstrates that a minimalistic architecture with a theoretically grounded loss function can achieve competitive results without sacrificing performance, even in the face of more complex architectures.
Operator World Models for Reinforcement Learning
Novelli, Pietro, Pratticò, Marco, Pontil, Massimiliano, Ciliberto, Carlo
Policy Mirror Descent (PMD) is a powerful and theoretically sound methodology for sequential decision-making. However, it is not directly applicable to Reinforcement Learning (RL) due to the inaccessibility of explicit action-value functions. We address this challenge by introducing a novel approach based on learning a world model of the environment using conditional mean embeddings. We then leverage the operatorial formulation of RL to express the action-value function in terms of this quantity in closed form via matrix operations. Combining these estimators with PMD leads to POWR, a new RL algorithm for which we prove convergence rates to the global optimum. Preliminary experiments in finite and infinite state settings support the effectiveness of our method.
Transfer learning for atomistic simulations using GNNs and kernel mean embeddings
Falk, John, Bonati, Luigi, Novelli, Pietro, Parrinello, Michele, Pontil, Massimiliano
Interatomic potentials learned using machine learning methods have been successfully applied to atomistic simulations. However, accurate models require large training datasets, while generating reference calculations is computationally demanding. To bypass this difficulty, we propose a transfer learning algorithm that leverages the ability of graph neural networks (GNNs) to represent chemical environments together with kernel mean embeddings. We extract a feature map from GNNs pre-trained on the OC20 dataset and use it to learn the potential energy surface from system-specific datasets of catalytic processes. Our method is further enhanced by incorporating into the kernel the chemical species information, resulting in improved performance and interpretability. We test our approach on a series of realistic datasets of increasing complexity, showing excellent generalization and transferability performance, and improving on methods that rely on GNNs or ridge regression alone, as well as similar fine-tuning approaches.
A randomized algorithm to solve reduced rank operator regression
Turri, Giacomo, Kostic, Vladimir, Novelli, Pietro, Pontil, Massimiliano
We present and analyze an algorithm designed for addressing vector-valued regression problems involving possibly infinite-dimensional input and output spaces. The algorithm is a randomized adaptation of reduced rank regression, a technique to optimally learn a low-rank vector-valued function (i.e. an operator) between sampled data via regularized empirical risk minimization with rank constraints. We propose Gaussian sketching techniques both for the primal and dual optimization objectives, yielding Randomized Reduced Rank Regression (R4) estimators that are efficient and accurate. For each of our R4 algorithms we prove that the resulting regularized empirical risk is, in expectation w.r.t. randomness of a sketch, arbitrarily close to the optimal value when hyper-parameteres are properly tuned. Numerical expreriments illustrate the tightness of our bounds and show advantages in two distinct scenarios: (i) solving a vector-valued regression problem using synthetic and large-scale neuroscience datasets, and (ii) regressing the Koopman operator of a nonlinear stochastic dynamical system.
Consistent Long-Term Forecasting of Ergodic Dynamical Systems
Inzerilli, Prune, Kostic, Vladimir, Lounici, Karim, Novelli, Pietro, Pontil, Massimiliano
We study the evolution of distributions under the action of an ergodic dynamical system, which may be stochastic in nature. By employing tools from Koopman and transfer operator theory one can evolve any initial distribution of the state forward in time, and we investigate how estimators of these operators perform on long-term forecasting. Motivated by the observation that standard estimators may fail at this task, we introduce a learning paradigm that neatly combines classical techniques of eigenvalue deflation from operator theory and feature centering from statistics. This paradigm applies to any operator estimator based on empirical risk minimization, making them satisfy learning bounds which hold uniformly on the entire trajectory of future distributions, and abide to the conservation of mass for each of the forecasted distributions. Numerical experiments illustrates the advantages of our approach in practice.
Dynamics Harmonic Analysis of Robotic Systems: Application in Data-Driven Koopman Modelling
Ordoñez-Apraez, Daniel, Kostic, Vladimir, Turrisi, Giulio, Novelli, Pietro, Mastalli, Carlos, Semini, Claudio, Pontil, Massimiliano
We introduce the use of harmonic analysis to decompose the state space of symmetric robotic systems into orthogonal isotypic subspaces. These are lower-dimensional spaces that capture distinct, symmetric, and synergistic motions. For linear dynamics, we characterize how this decomposition leads to a subdivision of the dynamics into independent linear systems on each subspace, a property we term dynamics harmonic analysis (DHA). To exploit this property, we use Koopman operator theory to propose an equivariant deep-learning architecture that leverages the properties of DHA to learn a global linear model of system dynamics. Our architecture, validated on synthetic systems and the dynamics of locomotion of a quadrupedal robot, demonstrates enhanced generalization, sample efficiency, and interpretability, with less trainable parameters and computational costs.
Sharp Spectral Rates for Koopman Operator Learning
Kostic, Vladimir, Lounici, Karim, Novelli, Pietro, Pontil, Massimiliano
Nonlinear dynamical systems can be handily described by the associated Koopman operator, whose action evolves every observable of the system forward in time. Learning the Koopman operator and its spectral decomposition from data is enabled by a number of algorithms. In this work we present for the first time non-asymptotic learning bounds for the Koopman eigenvalues and eigenfunctions. We focus on time-reversal-invariant stochastic dynamical systems, including the important example of Langevin dynamics. We analyze two popular estimators: Extended Dynamic Mode Decomposition (EDMD) and Reduced Rank Regression (RRR). Our results critically hinge on novel {minimax} estimation bounds for the operator norm error, that may be of independent interest. Our spectral learning bounds are driven by the simultaneous control of the operator norm error and a novel metric distortion functional of the estimated eigenfunctions. The bounds indicates that both EDMD and RRR have similar variance, but EDMD suffers from a larger bias which might be detrimental to its learning rate. Our results shed new light on the emergence of spurious eigenvalues, an issue which is well known empirically. Numerical experiments illustrate the implications of the bounds in practice.
Learning invariant representations of time-homogeneous stochastic dynamical systems
Kostic, Vladimir R., Novelli, Pietro, Grazzi, Riccardo, Lounici, Karim, Pontil, Massimiliano
We consider the general class of time-homogeneous stochastic dynamical systems, both discrete and continuous, and study the problem of learning a representation of the state that faithfully captures its dynamics. This is instrumental to learn the transfer operator of the system, that in turn can be used for numerous tasks, such as forecasting and interpreting the system dynamics. We show that the search for a good representation can be cast as an optimization problem over neural networks. Our approach is supported by recent results in statistical learning theory, highlighting the role of approximation error and metric distortion in the context of transfer operator regression. The objective function we propose is associated with projection operators from the representation space to the data space, overcomes metric distortion, and can be empirically estimated from data. In the discrete time setting, we further derive a relaxed objective function that is differentiable and numerically well-conditioned. We compare our method against state-of-the-art approaches on different datasets, showing better performance across the board.
Estimating Koopman operators with sketching to provably learn large scale dynamical systems
Meanti, Giacomo, Chatalic, Antoine, Kostic, Vladimir R., Novelli, Pietro, Pontil, Massimiliano, Rosasco, Lorenzo
The theory of Koopman operators allows to deploy non-parametric machine learning algorithms to predict and analyze complex dynamical systems. Estimators such as principal component regression (PCR) or reduced rank regression (RRR) in kernel spaces can be shown to provably learn Koopman operators from finite empirical observations of the system's time evolution. Scaling these approaches to very long trajectories is a challenge and requires introducing suitable approximations to make computations feasible. In this paper, we boost the efficiency of different kernel-based Koopman operator estimators using random projections (sketching). We derive, implement and test the new "sketched" estimators with extensive experiments on synthetic and large-scale molecular dynamics datasets. Further, we establish non asymptotic error bounds giving a sharp characterization of the trade-offs between statistical learning rates and computational efficiency. Our empirical and theoretical analysis shows that the proposed estimators provide a sound and efficient way to learn large scale dynamical systems. In particular our experiments indicate that the proposed estimators retain the same accuracy of PCR or RRR, while being much faster.