Drumetz, Lucas
Augmented Invertible Koopman Autoencoder for long-term time series forecasting
Frion, Anthony, Drumetz, Lucas, Dalla Mura, Mauro, Tochon, Guillaume, Aïssa-El-Bey, Abdeldjalil
Following the introduction of Dynamic Mode Decomposition and its numerous extensions, many neural autoencoder-based implementations of the Koopman operator have recently been proposed. This class of methods appears to be of interest for modeling dynamical systems, either through direct long-term prediction of the evolution of the state or as a powerful embedding for downstream methods. In particular, a recent line of work has developed invertible Koopman autoencoders (IKAEs), which provide an exact reconstruction of the input state thanks to their analytically invertible encoder, based on coupling-layer normalizing flow models. We identify the dimension preservation imposed by the normalizing flows as a limitation of IKAE models, and we therefore propose to augment the latent state with a second, non-invertible encoder network. This results in our new model: the Augmented Invertible Koopman AutoEncoder (AIKAE). We demonstrate the relevance of the AIKAE through a series of long-term time series forecasting experiments, on satellite image time series as well as on a benchmark involving predictions based on a large lookback window of observations.
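As an illustration of the augmentation idea, here is a minimal PyTorch sketch under assumed dimensions and module names (not the authors' exact architecture): the coupling-flow encoder preserves the state dimension and is exactly invertible, while a second, non-invertible encoder supplies extra latent coordinates on which the linear Koopman operator also acts.

```python
import torch
import torch.nn as nn

class AffineCoupling(nn.Module):
    """One affine coupling layer: invertible and dimension-preserving."""
    def __init__(self, dim):
        super().__init__()
        self.half = dim // 2
        self.net = nn.Sequential(
            nn.Linear(self.half, 64), nn.ReLU(),
            nn.Linear(64, 2 * (dim - self.half)))

    def forward(self, x):
        x1, x2 = x[:, :self.half], x[:, self.half:]
        s, t = self.net(x1).chunk(2, dim=-1)
        return torch.cat([x1, x2 * torch.exp(s) + t], dim=-1)

    def inverse(self, z):
        z1, z2 = z[:, :self.half], z[:, self.half:]
        s, t = self.net(z1).chunk(2, dim=-1)
        return torch.cat([z1, (z2 - t) * torch.exp(-s)], dim=-1)

class AIKAE(nn.Module):
    """Sketch: invertible flow encoder + non-invertible augmentation encoder,
    with linear (Koopman) dynamics on the augmented latent state."""
    def __init__(self, state_dim, aug_dim, n_coupling=4):
        super().__init__()
        self.state_dim = state_dim
        self.flow = nn.ModuleList(
            [AffineCoupling(state_dim) for _ in range(n_coupling)])
        self.aug_encoder = nn.Sequential(  # non-invertible second encoder
            nn.Linear(state_dim, 64), nn.ReLU(), nn.Linear(64, aug_dim))
        self.K = nn.Linear(state_dim + aug_dim, state_dim + aug_dim, bias=False)

    def encode(self, x):
        z = x
        for layer in self.flow:
            z = layer(z)
        return torch.cat([z, self.aug_encoder(x)], dim=-1)

    def decode(self, z_aug):
        z = z_aug[:, :self.state_dim]  # drop the augmented coordinates
        for layer in reversed(self.flow):
            z = layer.inverse(z)
        return z

    def predict(self, x, n_steps):
        z = self.encode(x)
        for _ in range(n_steps):
            z = self.K(z)  # linear latent advancement
        return self.decode(z)
```

Note that `decode(encode(x))` recovers `x` exactly, since the augmented coordinates are discarded and the coupling flow is analytically inverted, which is the exact-reconstruction property the abstract refers to.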
FlowKac: An Efficient Neural Fokker-Planck Solver using Temporal Normalizing Flows and the Feynman-Kac Formula
El Bekri, Naoufal, Drumetz, Lucas, Vermet, Franck
Solving the Fokker-Planck equation for high-dimensional complex dynamical systems remains a pivotal yet challenging task due to the intractability of analytical solutions and the limitations of traditional numerical methods. In this work, we present FlowKac, a novel approach that reformulates the Fokker-Planck equation using the Feynman-Kac formula, allowing the solution to be queried at a given point via expected values over stochastic paths. A key innovation of FlowKac lies in its adaptive stochastic sampling scheme, which significantly reduces the computational complexity while maintaining high accuracy. This sampling technique, coupled with a time-indexed normalizing flow designed to capture time-evolving probability densities, enables robust sampling of collocation points, resulting in a flexible and mesh-free solver. This formulation mitigates the curse of dimensionality and enhances computational efficiency and accuracy, which is particularly crucial for applications that inherently require dimensions beyond the conventional three. We validate the robustness and scalability of our method through various experiments on a range of stochastic differential equations, demonstrating significant improvements over existing techniques.
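For context, the reformulation referred to above is the textbook Feynman-Kac representation; written here for a constant diffusion matrix (the general, state-dependent case adds further terms), it expresses the density at a single query point as an expectation over stochastic paths.

```latex
% Fokker-Planck equation for dX_t = \mu(X_t) dt + \sigma dW_t, constant \sigma:
\partial_t p = -\nabla \cdot (\mu\, p)
  + \tfrac{1}{2}\,\mathrm{Tr}\!\left(\sigma \sigma^\top \nabla^2 p\right).
% Expanding the divergence turns this into a backward-type equation with
% potential c = -\nabla \cdot \mu, so Feynman-Kac gives the pointwise formula
p(x, t) = \mathbb{E}\!\left[\, p_0(\hat{X}_t)\,
  \exp\!\left(-\int_0^t \nabla \cdot \mu(\hat{X}_s)\, \mathrm{d}s\right) \right],
\qquad \mathrm{d}\hat{X}_s = -\mu(\hat{X}_s)\, \mathrm{d}s
  + \sigma\, \mathrm{d}\hat{W}_s, \quad \hat{X}_0 = x.
```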
Land Surface Temperature Super-Resolution with a Scale-Invariance-Free Neural Approach: Application to MODIS
Ait-Bachir, Romuald, Granero-Belinchon, Carlos, Michel, Aurélie, Michel, Julien, Briottet, Xavier, Drumetz, Lucas
Due to the trade-off between the temporal and spatial resolution of thermal spaceborne sensors, super-resolution methods have been developed to provide fine-scale Land Surface Temperature (LST) maps. Most of them are trained at low resolution but applied at fine resolution, and so they require a scale-invariance hypothesis that does not always hold. The main contribution of this work is the introduction of a Scale-Invariance-Free approach for training Neural Network (NN) models, and the implementation of two NN models, called Scale-Invariance-Free Convolutional Neural Networks for Super-Resolution (SIF-CNN-SR), for the super-resolution of MODIS LST products. The Scale-Invariance-Free approach consists in training the models to provide LST maps at high spatial resolution that recover the initial LST when degraded to low resolution and that contain fine-scale textures informed by the high-resolution NDVI. The second contribution of this work is the release of a test database of ASTER LST images concomitant with MODIS ones, which can be used for the evaluation of super-resolution algorithms. We compare the two proposed models, SIF-CNN-SR1 and SIF-CNN-SR2, with four state-of-the-art methods (bicubic interpolation, DMS, ATPRK, TsHARP) and with a CNN sharing the same architecture as SIF-CNN-SR but trained under the scale-invariance hypothesis. We show that SIF-CNN-SR1 outperforms the state-of-the-art methods and the other two CNN models, as evaluated with LPIPS and Fourier-space metrics focusing on the analysis of textures. These results, together with the available ASTER-MODIS database for evaluation, are promising for future studies on the super-resolution of LST.
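A minimal sketch of what a scale-invariance-free objective could look like, with an illustrative degradation operator, an illustrative texture term, and a hypothetical model signature (not the paper's exact losses or weights):

```python
import torch.nn.functional as F

def sif_losses(lst_lr, ndvi_hr, model, scale=4, w_tex=0.1):
    """Sketch of a scale-invariance-free objective: the terms and weights
    here are illustrative, not the paper's exact formulation."""
    lst_sr = model(lst_lr, ndvi_hr)  # hypothetical model(LR LST, HR NDVI) API
    # Consistency: degrading the super-resolved LST must recover the LR input
    # (average pooling is a crude stand-in for the sensor degradation model).
    lst_back = F.avg_pool2d(lst_sr, kernel_size=scale)
    loss_consistency = F.mse_loss(lst_back, lst_lr)
    # Texture: fine-scale (high-pass) content should follow the HR NDVI.
    def high_pass(x):
        return x - F.avg_pool2d(x, 3, stride=1, padding=1)
    loss_texture = F.mse_loss(high_pass(lst_sr), high_pass(ndvi_hr))
    return loss_consistency + w_tex * loss_texture
```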
Koopman Ensembles for Probabilistic Time Series Forecasting
Frion, Anthony, Drumetz, Lucas, Tochon, Guillaume, Dalla Mura, Mauro, Aïssa-El-Bey, Abdeldjalil
With the increasing popularity of data-driven models for representing dynamical systems, many machine learning-based implementations of the Koopman operator have recently been proposed. However, the vast majority of these works are limited to deterministic predictions, whereas knowledge of the uncertainty is critical in fields like meteorology and climatology. In this work, we investigate the training of ensembles of models to produce stochastic outputs. We show, through experiments on real remote sensing image time series, that ensembles of independently trained models are highly overconfident, and that using a training criterion that explicitly encourages the members to produce predictions with high inter-model variance greatly improves the uncertainty quantification of the ensembles.
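One natural instantiation of such a criterion (a sketch, not necessarily the paper's exact loss) is a Gaussian negative log-likelihood computed from the ensemble mean and variance: unlike a per-member MSE, it rewards the members for spreading out where they are jointly wrong.

```python
import torch

def ensemble_nll(predictions, target, eps=1e-6):
    """predictions: (M, B, D) stacked forecasts of M ensemble members.
    Gaussian NLL of the target under the ensemble's mean and variance;
    gradients flow to all members, coupling their training."""
    mean = predictions.mean(dim=0)
    var = predictions.var(dim=0, unbiased=False) + eps
    nll = 0.5 * (torch.log(var) + (target - mean) ** 2 / var)
    return nll.mean()
```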
Sliced-Wasserstein Distances and Flows on Cartan-Hadamard Manifolds
Bonet, Clément, Drumetz, Lucas, Courty, Nicolas
While many machine learning methods have been developed or transposed to Riemannian manifolds to tackle data with known non-Euclidean geometry, Optimal Transport (OT) methods on such spaces have not received much attention. The main OT tool on these spaces is the Wasserstein distance, which suffers from a heavy computational burden. On Euclidean spaces, a popular alternative is the Sliced-Wasserstein distance, which leverages a closed-form solution of the Wasserstein distance in one dimension but is not readily available on manifolds. In this work, we derive general constructions of Sliced-Wasserstein distances on Cartan-Hadamard manifolds, Riemannian manifolds with non-positive curvature, which include, among others, hyperbolic spaces and the space of Symmetric Positive Definite matrices. We then propose different applications. Additionally, we derive non-parametric schemes to minimize these new distances by approximating their Wasserstein gradient flows.
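Schematically, such constructions replace the Euclidean projections onto lines by projections onto geodesics through a base point; the generic shape is as follows, with the precise definition of the projection varying between the constructions in the paper.

```latex
% Generic sliced-Wasserstein distance on a Cartan-Hadamard manifold M:
% project measures onto one-dimensional geodesics through a base point o,
% then average closed-form 1D Wasserstein distances over directions v.
\mathrm{SW}_p^p(\mu, \nu)
  = \int_{S} W_p^p\big( P^v_{\#}\mu,\; P^v_{\#}\nu \big)\, \mathrm{d}\lambda(v),
\qquad P^v : M \to \mathbb{R}
  \ \text{(coordinate along the geodesic } \gamma_v \text{ through } o\text{)}.
```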
On Transfer in Classification: How Well do Subsets of Classes Generalize?
Baena, Raphael, Drumetz, Lucas, Gripon, Vincent
In classification, it is usual to observe that models trained on a given set of classes can generalize to previously unseen ones, suggesting an ability to learn beyond the initial task. This ability is often leveraged in the context of transfer learning, where a pretrained model is used to process new classes, with or without fine-tuning. Surprisingly, few papers have looked at the theoretical roots behind this phenomenon. In this work, we are interested in laying the foundations of such a theoretical framework for transferability between sets of classes. Namely, we establish a partially ordered set over subsets of classes; this tool represents which subsets of classes can generalize to others. In a more practical setting, we explore the ability of our framework to predict which subset of classes can lead to the best performance when testing on all of them. We also explore few-shot learning, where transfer is the gold standard. Our work contributes to a better understanding of transfer mechanics and model generalization.
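As a purely illustrative companion (a standard empirical transfer probe, not the paper's poset framework), one can measure how features trained on one subset of classes generalize to another with a nearest-class-mean classifier on frozen features:

```python
import numpy as np

def ncm_accuracy(feats_train, y_train, feats_test, y_test):
    """Nearest-class-mean accuracy of frozen features on unseen classes:
    a common probe of how well features learned on one subset of classes
    transfer to another."""
    classes = np.unique(y_train)
    centroids = np.stack(
        [feats_train[y_train == c].mean(axis=0) for c in classes])
    # Squared distances of each test feature to each class centroid.
    d = ((feats_test[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
    pred = classes[d.argmin(axis=1)]
    return (pred == y_test).mean()
```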
Time-changed normalizing flows for accurate SDE modeling
El Bekri, Naoufal, Drumetz, Lucas, Vermet, Franck
The generative paradigm has become increasingly important in machine learning and deep learning models. Among popular generative models are normalizing flows, which enable exact likelihood estimation by transforming a base distribution through diffeomorphic transformations. Extending the normalizing flow framework to handle time-indexed flows gave rise to dynamic normalizing flows, a powerful tool for modeling time series, stochastic processes, and neural stochastic differential equations (SDEs). In this work, we propose a novel variant of dynamic normalizing flows, the Time-Changed Normalizing Flow (TCNF), based on time deformations of a Brownian motion, which constitute a versatile and extensive family of Gaussian processes. This approach enables us to effectively model some SDEs that cannot be modeled otherwise, including standard ones such as the well-known Ornstein-Uhlenbeck process, and generalizes prior methodologies, leading to improved results and better inference and prediction capabilities.
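The core object, sketched here with a classical example (the paper's exact parameterization may differ), is a Brownian motion under a learned increasing time change:

```latex
% A time-changed Brownian motion
Z_t = W_{\tau(t)}, \qquad \tau : \mathbb{R}_+ \to \mathbb{R}_+
  \ \text{increasing},\ \tau(0) = 0,
% is a Gaussian process with covariance
\operatorname{Cov}(Z_s, Z_t) = \tau\big(\min(s, t)\big).
% Example: the Ornstein-Uhlenbeck process dX_t = -\theta X_t dt + \sigma dW_t
% is exactly a rescaled time-changed Brownian motion:
X_t = e^{-\theta t}\left( X_0 + W_{\tau(t)} \right),
\qquad \tau(t) = \frac{\sigma^2}{2\theta}\left(e^{2\theta t} - 1\right).
```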
Neural Koopman prior for data assimilation
Frion, Anthony, Drumetz, Lucas, Dalla Mura, Mauro, Tochon, Guillaume, Aïssa-El-Bey, Abdeldjalil
With the increasing availability of large-scale datasets, computational power, and tools like automatic differentiation and expressive neural network architectures, sequential data are now often treated in a data-driven way, with a dynamical model trained from the observation data. While neural networks are often seen as uninterpretable black-box architectures, they can still benefit from physical priors on the data and from mathematical knowledge. In this paper, we use a neural network architecture which leverages the long-known Koopman operator theory to embed dynamical systems in latent spaces where their dynamics can be described linearly, enabling a number of appealing features. We introduce methods to train such a model for long-term continuous reconstruction, even in difficult contexts where the data come as irregularly sampled time series. We also demonstrate the potential for self-supervised learning, showing the promising use of trained dynamical models as priors for variational data assimilation techniques, with applications to, e.g., time series interpolation and forecasting.
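Schematically, the architecture encodes states into a latent space with linear dynamics, so multi-step prediction reduces to matrix powers (a matrix exponential in continuous time, which is what accommodates irregular sampling), and a trained model can then serve as the dynamical prior in a variational assimilation cost:

```latex
% Koopman autoencoder structure: encoder \varphi, decoder \psi, matrix K.
z_t = \varphi_\theta(x_t), \qquad
\hat{x}_{t+\tau} = \psi_\theta\!\big(K^\tau z_t\big)
  \quad \text{(or } e^{\tau A} z_t \text{ in continuous time).}
% Used as a prior for variational data assimilation, one minimizes,
% schematically, the misfit to the available observations over an
% initial latent state:
J(z_0) = \sum_{t \in \mathcal{T}_{\mathrm{obs}}}
  \big\| \psi_\theta\!\big(K^t z_0\big) - x_t \big\|^2 .
```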
Hyperbolic Sliced-Wasserstein via Geodesic and Horospherical Projections
Bonet, Clément, Chapel, Laetitia, Drumetz, Lucas, Courty, Nicolas
Embedding data that present an underlying hierarchical structure in hyperbolic spaces has been shown to be beneficial for many types of data. Consequently, many tools of machine learning have been extended to such spaces, but only a few discrepancies exist to compare probability distributions defined over those spaces. Among the possible candidates, optimal transport distances are well defined on such Riemannian manifolds and enjoy strong theoretical properties, but suffer from a high computational cost. On Euclidean spaces, sliced-Wasserstein distances, which leverage a closed form of the Wasserstein distance in one dimension, are more computationally efficient, but are not readily available on hyperbolic spaces. In this work, we propose novel hyperbolic sliced-Wasserstein discrepancies. These constructions rely on projecting measures onto underlying geodesics, using either geodesic or horospherical projections. We study and compare them on different tasks where hyperbolic representations are relevant, such as sampling and image classification.
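For reference, the two kinds of projections can be written as follows (a standard formulation; the notation may differ from the paper's):

```latex
% Given a unit-speed geodesic ray \gamma_v from the origin:
% Geodesic projection: the closest point on the geodesic,
P^v(x) = \operatorname*{arg\,min}_{y \in \gamma_v} d(x, y);
% Horospherical projection: via the Busemann function, whose level sets
% are horospheres,
B_v(x) = \lim_{t \to \infty} \big( d(x, \gamma_v(t)) - t \big).
% Either way, measures on the hyperbolic space are pushed to measures
% on \mathbb{R}, where W_p has a closed form via quantile functions.
```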
Sliced-Wasserstein on Symmetric Positive Definite Matrices for M/EEG Signals
Bonet, Clément, Malézieux, Benoît, Rakotomamonjy, Alain, Drumetz, Lucas, Moreau, Thomas, Kowalski, Matthieu, Courty, Nicolas
When dealing with electro- or magnetoencephalography (M/EEG) records, many supervised prediction tasks are solved by working with covariance matrices that summarize the signals. Learning with these matrices requires using Riemannian geometry to account for their structure. In this paper, we propose a new method to deal with distributions of covariance matrices and demonstrate its computational efficiency on M/EEG multivariate time series. More specifically, we define a Sliced-Wasserstein distance between measures of symmetric positive definite matrices that comes with strong theoretical guarantees. We then take advantage of its properties and of kernel methods to apply this distance to brain-age prediction from MEG data and compare it to state-of-the-art algorithms based on Riemannian geometry. Finally, we show that it is an efficient surrogate for the Wasserstein distance in domain adaptation for Brain-Computer Interface applications.
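One natural way to realize such a slicing, consistent with the abstract's description though not necessarily the paper's exact choice, uses the log-Euclidean framework, where the matrix logarithm maps SPD matrices to the vector space of symmetric matrices:

```latex
% Slicing on SPD matrices via the matrix logarithm: a "direction" is a
% unit-norm symmetric matrix A, and the projection is a trace inner product,
P^A(M) = \big\langle \log M,\; A \big\rangle_F
  = \operatorname{Tr}\!\big(\log(M)\, A\big),
\qquad A \in \mathcal{S}_d,\ \|A\|_F = 1,
% giving a sliced-Wasserstein discrepancy between measures on SPD matrices:
\mathrm{SPDSW}_p^p(\mu, \nu)
  = \int W_p^p\big( P^A_{\#}\mu,\; P^A_{\#}\nu \big)\, \mathrm{d}\lambda(A).
```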