Charpiat, Guillaume
Growth strategies for arbitrary DAG neural architectures
Douka, Stella, Verbockhaven, Manon, Rudkiewicz, Théo, Rivaud, Stéphane, Landes, François P., Chevallier, Sylvain, Charpiat, Guillaume
Deep learning has shown impressive results, obtained at the cost of training huge neural networks. However, the larger the architecture, the higher the computational, financial, and environmental costs during training and inference. We aim to reduce both training and inference durations. We focus on Neural Architecture Growth, which increases the size of a small model when needed, directly during training, using information from backpropagation. We extend existing work and freely grow neural networks in the form of arbitrary Directed Acyclic Graphs by reducing expressivity bottlenecks in the architecture. We explore strategies to reduce excessive computations and to steer network growth toward more parameter-efficient architectures.
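As a toy illustration of what growing a DAG architecture during training can mean mechanically (a generic sketch, not the paper's growth criterion or its expressivity-bottleneck machinery; the class name, node names and ReLU choice are assumptions), one can represent the architecture as an explicit DAG and add edges whose weights start at zero, so the computed function is unchanged at the moment of growth:

```python
import torch
import torch.nn as nn

class DAGNet(nn.Module):
    """Toy network whose architecture is an arbitrary DAG: nodes hold feature
    tensors, each edge (u, v) carries a Linear map, and every non-input node
    sums its incoming edge outputs (followed by ReLU, except at the output)."""

    def __init__(self, dims, edges):
        super().__init__()
        self.dims = dict(dims)                 # node name -> feature size
        self.order = list(self.dims)           # assumed topological order
        self.edges = nn.ModuleDict({
            f"{u}->{v}": nn.Linear(self.dims[u], self.dims[v]) for u, v in edges
        })

    def forward(self, x):
        values = {self.order[0]: x}
        for v in self.order[1:]:
            incoming = []
            for key, lin in self.edges.items():
                u, w = key.split("->")
                if w == v:
                    incoming.append(lin(values[u]))
            h = sum(incoming)
            values[v] = h if v == self.order[-1] else torch.relu(h)
        return values[self.order[-1]]

    def add_edge(self, u, v):
        """Grow the DAG with a new connection whose weights start at zero,
        so the computed function is unchanged at the moment of growth."""
        lin = nn.Linear(self.dims[u], self.dims[v])
        nn.init.zeros_(lin.weight)
        nn.init.zeros_(lin.bias)
        self.edges[f"{u}->{v}"] = lin

# illustrative usage: grow a skip connection during training
net = DAGNet(dims={"in": 8, "h": 16, "out": 2},
             edges=[("in", "h"), ("h", "out")])
net.add_edge("in", "out")
```

After a call to `add_edge`, the newly created parameters would still need to be registered with the optimizer before they can be trained.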
Neural DDEs with Learnable Delays for Partially Observed Dynamical Systems
Monsel, Thibault, Menier, Emmanuel, Semeraro, Onofrio, Mathelin, Lionel, Charpiat, Guillaume
Many successful methods to learn dynamical systems from data have recently been introduced. Such methods often rely on the availability of the system's full state, an assumption that is rarely satisfied in practice, leaving us with partially observed systems. Using the Mori-Zwanzig (MZ) formalism from statistical physics, we demonstrate that constant-lag Neural Delay Differential Equations (NDDEs) naturally serve as suitable models for partially observed states. In an empirical evaluation, we show that such models outperform existing methods on both synthetic and experimental data.
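Conceptually, a constant-lag NDDE replaces dx/dt = f_θ(x(t)) with dx/dt = f_θ(x(t), x(t − τ)). The sketch below is a minimal illustration of how such a lag can be made learnable, assuming a crude explicit-Euler scheme, linear interpolation of the stored trajectory, and a constant pre-history x(t) = x0 for t < τ; it is not the paper's MZ-derived formulation or solver, and the class and parameter names are invented for the example:

```python
import torch
import torch.nn as nn

class NeuralDDE(nn.Module):
    """Minimal constant-lag neural DDE, dx/dt = f_theta(x(t), x(t - tau)),
    integrated with explicit Euler. The lag tau is learnable: gradients flow
    through the linear interpolation of the stored trajectory."""

    def __init__(self, dim, hidden=64, tau_init=1.0):
        super().__init__()
        self.f = nn.Sequential(nn.Linear(2 * dim, hidden), nn.Tanh(),
                               nn.Linear(hidden, dim))
        self.log_tau = nn.Parameter(torch.log(torch.tensor(tau_init)))

    def forward(self, x0, t_max, dt=0.01):
        tau = self.log_tau.exp()
        ts, xs = [torch.tensor(0.0)], [x0]         # stored trajectory (history)
        t = 0.0
        while t < t_max:
            t_delayed = torch.clamp(torch.tensor(t) - tau, min=0.0)
            x_delayed = self._interp(ts, xs, t_delayed)
            dx = self.f(torch.cat([xs[-1], x_delayed], dim=-1))
            xs.append(xs[-1] + dt * dx)
            t += dt
            ts.append(torch.tensor(t))
        return torch.stack(xs)

    @staticmethod
    def _interp(ts, xs, t_query):
        # linear interpolation of the stored states at time t_query
        if len(ts) == 1:
            return xs[0]
        t_grid = torch.stack(ts)
        i = int(torch.searchsorted(t_grid, t_query).clamp(1, len(ts) - 1))
        t0, t1 = t_grid[i - 1], t_grid[i]
        w = (t_query - t0) / (t1 - t0 + 1e-12)
        return (1 - w) * xs[i - 1] + w * xs[i]
```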
Growing Tiny Networks: Spotting Expressivity Bottlenecks and Fixing Them Optimally
Verbockhaven, Manon, Chevallier, Sylvain, Charpiat, Guillaume
Machine learning tasks are generally formulated as optimization problems, where one searches for an optimal function within a certain functional space. In practice, parameterized functional spaces are considered, in order to be able to perform gradient descent. Typically, a neural network architecture is chosen and fixed, and its parameters (connection weights) are optimized, yielding an architecture-dependent result. This way of proceeding, however, forces the evolution of the function during training to lie within the realm of what is expressible with the chosen architecture, and prevents any optimization across architectures. Costly architectural hyper-parameter optimization is often performed to compensate for this. Instead, we propose to adapt the architecture on the fly during training. We show that the information about desirable architectural changes, due to expressivity bottlenecks encountered when attempting to follow the functional gradient, can be extracted from backpropagation. To do this, we propose a mathematical definition of expressivity bottlenecks, which enables us to detect, quantify and solve them while training, by adding suitable neurons when and where needed. Thus, while the standard approach requires large networks (in terms of number of neurons per layer) for expressivity and optimization reasons, we are able to start with very small neural networks and let them grow appropriately. As a proof of concept, we show results on the CIFAR dataset, matching large neural network accuracy with competitive training time, while removing the need for standard architectural hyper-parameter search.
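The mechanics of adding neurons without perturbing the current function can be sketched as follows (a minimal sketch only: the paper's contribution is the detection and optimal resolution of expressivity bottlenecks, which decides when, where and with which weights to grow; here the new fan-in is random and the new fan-out is zero, both illustrative choices):

```python
import torch
import torch.nn as nn

def grow_hidden(fc1: nn.Linear, fc2: nn.Linear, n_new: int):
    """Add n_new hidden units between fc1 and fc2 while preserving the
    current function: the outgoing weights of the new units start at zero,
    so they only start contributing once training has updated them."""
    new_fc1 = nn.Linear(fc1.in_features, fc1.out_features + n_new)
    new_fc2 = nn.Linear(fc2.in_features + n_new, fc2.out_features)
    with torch.no_grad():
        new_fc1.weight[: fc1.out_features] = fc1.weight
        new_fc1.bias[: fc1.out_features] = fc1.bias
        new_fc1.weight[fc1.out_features:].normal_(std=1e-2)   # new fan-in
        new_fc1.bias[fc1.out_features:].zero_()
        new_fc2.weight[:, : fc2.in_features] = fc2.weight
        new_fc2.bias.copy_(fc2.bias)
        new_fc2.weight[:, fc2.in_features:].zero_()           # new fan-out = 0
    return new_fc1, new_fc2

# illustrative usage on a deliberately tiny hidden layer
fc1, fc2 = nn.Linear(784, 4), nn.Linear(4, 10)
fc1, fc2 = grow_hidden(fc1, fc2, n_new=8)   # now 12 hidden units
```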
Multi-Level GNN Preconditioner for Solving Large Scale Problems
Nastorg, Matthieu, Gratien, Jean-Marc, Faney, Thibault, Bucci, Michele Alessandro, Charpiat, Guillaume, Schoenauer, Marc
Large-scale numerical simulations often come at the cost of daunting computations. High-Performance Computing has enhanced the process, but adapting legacy codes to leverage parallel GPU computations remains challenging. Meanwhile, Machine Learning models can harness GPU computations effectively but often struggle with generalization and accuracy. Graph Neural Networks (GNNs), in particular, are well suited to learning from unstructured data such as meshes, but are often limited to small-scale problems. Moreover, the accuracy of the data-driven solution is usually bounded by the capabilities of the trained model. To benefit from both worlds, this paper introduces a novel preconditioner integrating a GNN model within a multi-level Domain Decomposition framework. The proposed GNN-based preconditioner is used to enhance the efficiency of a Krylov method, resulting in a hybrid solver that can converge to any desired level of accuracy. The efficiency of the Krylov method greatly benefits from the GNN preconditioner, which is adaptable to meshes of any size and shape, is executed on GPUs, and features a multi-level approach that enforces the scalability of the entire process. Several experiments are conducted to validate the numerical behavior of the hybrid solver, and an in-depth analysis of its performance is proposed to assess its competitiveness against a C++ legacy solver.
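The role of such a preconditioner inside a Krylov method can be illustrated with plain preconditioned conjugate gradient, where the application of M⁻¹ is just a callable and could be a trained GNN inference. This is a sketch under simplifying assumptions (no multi-level Domain Decomposition, no GNN, and a symmetric positive-definite system); `apply_prec` and the Jacobi placeholder are illustrative:

```python
import torch

def pcg(A, b, apply_prec, tol=1e-8, max_iter=500):
    """Preconditioned conjugate gradient. apply_prec(r) returns an
    approximation of A^{-1} r; in a hybrid solver this role would be
    played by a trained GNN (here it is just a callable)."""
    x = torch.zeros_like(b)
    r = b - A @ x
    z = apply_prec(r)
    p = z.clone()
    rz = r @ z
    for _ in range(max_iter):
        Ap = A @ p
        alpha = rz / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        if r.norm() < tol * b.norm():
            break
        z = apply_prec(r)
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x

# placeholder preconditioner: Jacobi (diagonal) scaling instead of a GNN
A = torch.diag(torch.arange(1.0, 101.0))   # toy SPD system
b = torch.randn(100)
x = pcg(A, b, apply_prec=lambda r: r / A.diagonal())
```

With a learned (generally non-symmetric, non-linear) preconditioner, flexible variants such as FGMRES are usually preferred in practice; the plain PCG above is only meant to show where the learned solve would sit.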
Rotation-equivariant Graph Neural Networks for Learning Glassy Liquids Representations
Pezzicoli, Francesco Saverio, Charpiat, Guillaume, Landes, François P.
Within the glassy-liquids community, the use of Machine Learning (ML) to model particles' static structure is currently a hot topic. The state of the art consists of Graph Neural Networks (GNNs), which have great expressive power but are heavy models with numerous parameters and lack interpretability. Inspired by recent advances in group-equivariant representations in Machine Learning, we build a GNN that learns a robust representation of the glass's static structure by constraining it to preserve roto-translation (SE(3)) equivariance. We show that this constraint not only significantly improves the predictive power but also improves the ability to generalize to unseen temperatures, while reducing the number of parameters. Furthermore, interpretability is improved, as we can relate the action of our basic convolution layer to well-known rotation-invariant expert features. Through transfer-learning experiments we demonstrate that our network learns a robust representation, which allows us to push forward the idea of a learned glass structural order parameter.
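To give a flavour of what rotation-aware convolutions on particle graphs look like, here is a simplified sketch. The paper builds fully SE(3)-equivariant layers with richer tensorial features; this example only combines rotation-invariant scalars (built from pairwise distances) with rotation-equivariant vector outputs (built from relative positions), and the class name and layer sizes are assumptions:

```python
import torch
import torch.nn as nn

class EquivariantConv(nn.Module):
    """One rotation-equivariant message-passing step on a particle graph.

    Scalar node features are updated from pairwise distances only (invariant),
    while per-node vector features are weighted sums of relative position
    vectors (equivariant): rotating all positions rotates the vector outputs
    identically and leaves the scalar outputs unchanged."""

    def __init__(self, dim):
        super().__init__()
        self.msg = nn.Sequential(nn.Linear(2 * dim + 1, dim), nn.SiLU(),
                                 nn.Linear(dim, dim))
        self.vec_weight = nn.Linear(dim, 1)
        self.update = nn.Sequential(nn.Linear(2 * dim, dim), nn.SiLU(),
                                    nn.Linear(dim, dim))

    def forward(self, h, pos, edge_index):
        src, dst = edge_index                       # edges j -> i
        rel = pos[src] - pos[dst]                   # relative positions r_ij
        dist = rel.norm(dim=-1, keepdim=True)
        m = self.msg(torch.cat([h[dst], h[src], dist], dim=-1))
        agg = torch.zeros_like(h).index_add_(0, dst, m)
        w = self.vec_weight(m)                      # scalar weight per edge
        vec = torch.zeros_like(pos).index_add_(0, dst, w * rel / (dist + 1e-9))
        h = h + self.update(torch.cat([h, agg], dim=-1))
        return h, vec
```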
Neural State-Dependent Delay Differential Equations
Monsel, Thibault, Semeraro, Onofrio, Mathelin, Lionel, Charpiat, Guillaume
Discontinuities and delayed terms are encountered in the governing equations of a large class of problems ranging from physics, engineering and medicine to economics. These systems cannot be properly modelled and simulated with standard Ordinary Differential Equations (ODEs), or with any data-driven approximation thereof, including Neural Ordinary Differential Equations (NODEs). To circumvent this issue, latent variables are typically introduced to solve the dynamics of the system in a higher-dimensional space and obtain the solution as a projection onto the original space. However, this solution lacks physical interpretability. In contrast, Delay Differential Equations (DDEs) and their data-driven, approximated counterparts naturally appear as good candidates to characterize such complicated systems. In this work we revisit the recently proposed Neural DDE by introducing the Neural State-Dependent DDE (SDDDE), a general and flexible framework featuring multiple and state-dependent delays. The developed framework is auto-differentiable and runs efficiently on multiple backends. We show that our method is competitive and outperforms other continuous-class models on a wide variety of delayed dynamical systems.
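Relative to the constant-lag NDDE sketch shown earlier, the state-dependent case only changes where the delay comes from: τ becomes a positive function of the current state, evaluated at every integration step before the history lookup. A hypothetical delay module could look as follows (the bound τ_max, the network shape and the sigmoid parameterization are illustrative; the actual SDDDE framework also supports multiple delays and several backends, which is not shown):

```python
import torch
import torch.nn as nn

class StateDependentDelay(nn.Module):
    """Illustrative positive delay tau(x) in (0, tau_max); it would replace a
    fixed learnable lag inside the integration loop."""

    def __init__(self, dim, hidden=32, tau_max=2.0):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, hidden), nn.Tanh(),
                                 nn.Linear(hidden, 1), nn.Sigmoid())
        self.tau_max = tau_max

    def forward(self, x):
        return self.tau_max * self.net(x)   # state-dependent delay
```

In the toy Euler loop, the constant-lag lookup `t - tau` would simply be replaced by `t - tau_net(x_t)` at each step.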
An Implicit GNN Solver for Poisson-like problems
Nastorg, Matthieu, Bucci, Michele-Alessandro, Faney, Thibault, Gratien, Jean-Marc, Charpiat, Guillaume, Schoenauer, Marc
This paper presents $\Psi$-GNN, a novel Graph Neural Network (GNN) approach for solving the ubiquitous Poisson PDE problems with mixed boundary conditions. By leveraging the Implicit Layer Theory, $\Psi$-GNN models an "infinitely" deep network, thus avoiding the empirical tuning of the number of required Message Passing layers to attain the solution. Its original architecture explicitly takes into account the boundary conditions, a critical prerequisite for physical applications, and is able to adapt to any initially provided solution. $\Psi$-GNN is trained using a "physics-informed" loss, and the training process is stable by design and insensitive to its initialization. Furthermore, the consistency of the approach is theoretically proven, and its flexibility and generalization efficiency are experimentally demonstrated: the same learned model can accurately handle unstructured meshes of various sizes, as well as different boundary conditions. To the best of our knowledge, $\Psi$-GNN is the first physics-informed GNN-based method that can handle various unstructured domains, boundary conditions and initial solutions while also providing convergence guarantees.
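The "infinitely deep" idea can be conveyed with a fixed-point view of message passing: the same layer is iterated until the node states stop changing, starting from any provided initial solution. The sketch below shows only that picture, with a crude one-step backward pass instead of the actual implicit differentiation and physics-informed training used by $\Psi$-GNN; names, sizes and the convergence test are assumptions:

```python
import torch
import torch.nn as nn

class FixedPointGNN(nn.Module):
    """Sketch of an 'infinitely deep' GNN: one message-passing layer is
    iterated to an approximate fixed point h* = step(h*, x, edges).
    Backpropagation here uses a simple one-step approximation instead of
    full implicit differentiation."""

    def __init__(self, dim):
        super().__init__()
        self.layer = nn.Sequential(nn.Linear(2 * dim, dim), nn.Tanh())

    def step(self, h, x, edge_index):
        src, dst = edge_index
        agg = torch.zeros_like(h).index_add_(0, dst, h[src])  # sum of neighbors
        return self.layer(torch.cat([agg, x], dim=-1))

    def forward(self, x, edge_index, h0=None, tol=1e-5, max_iter=100):
        h = torch.zeros_like(x) if h0 is None else h0    # any initial solution
        with torch.no_grad():                            # find the fixed point
            for _ in range(max_iter):
                h_new = self.step(h, x, edge_index)
                if (h_new - h).norm() < tol:
                    h = h_new
                    break
                h = h_new
        return self.step(h, x, edge_index)               # one differentiable step
```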
Designing losses for data-free training of normalizing flows on Boltzmann distributions
Felardos, Loris, Hénin, Jérôme, Charpiat, Guillaume
Generating a Boltzmann distribution in high dimension has recently been achieved with Normalizing Flows, which enable fast and exact computation of the generated density, and thus unbiased estimation of expectations. However, current implementations rely on accurate training data, which typically comes from computationally expensive simulations. There is therefore a clear incentive to train models with incomplete or no data by relying solely on the target density, which can be obtained from a physical energy model (up to a constant factor). For that purpose, we analyze the properties of standard losses based on Kullback-Leibler divergences. We showcase their limitations, in particular a strong propensity for mode collapse during optimization on high-dimensional distributions. We then propose strategies to alleviate these issues, most importantly a new loss function well-grounded in theory and with suitable optimization properties. Using as a benchmark the generation of 3D molecular configurations, we show on several tasks that, for the first time, imperfect pre-trained models can be further optimized in the absence of training data.
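The data-free setting referred to here typically starts from the "reverse" Kullback-Leibler objective, which only needs the energy function and samples drawn from the flow itself. A minimal sketch of that baseline loss is given below (the interfaces, `flow(z)` returning the sample and its log-Jacobian-determinant, and `energy(x)` returning the potential energy, are assumptions; the improved loss proposed in the paper is not reproduced):

```python
import torch

def reverse_kl_loss(flow, energy, base_dist, n_samples=1024, beta=1.0):
    """Data-free training loss: KL(q_theta || p_Boltzmann) up to the unknown
    log partition function, estimated from the model's own samples."""
    z = base_dist.sample((n_samples,))
    x, log_det = flow(z)                         # assumed interface
    log_q = base_dist.log_prob(z) - log_det      # log q_theta(x) by change of variables
    return (log_q + beta * energy(x)).mean()
```

Minimizing this objective is precisely the setting in which mode collapse tends to occur, since the expectation is taken under the model's own samples rather than under the target distribution.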
DS-GPS: A Deep Statistical Graph Poisson Solver (for faster CFD simulations)
Nastorg, Matthieu, Schoenauer, Marc, Charpiat, Guillaume, Faney, Thibault, Gratien, Jean-Marc, Bucci, Michele-Alessandro
This paper proposes a novel Machine Learning-based approach to solve a Poisson problem with mixed boundary conditions. Leveraging Graph Neural Networks, we develop a model able to process unstructured grids, with the advantage of enforcing boundary conditions by design. By directly minimizing the residual of the Poisson equation, the model attempts to learn the physics of the problem without the need for exact solutions, in contrast to most previous data-driven approaches, where the distance to the available solutions is minimized.
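A residual-based training loss of this kind can be sketched on a discretized problem as follows (illustrative only: `L` stands for some discrete Laplacian assembled from the mesh, and boundary conditions are handled here with a penalty term, whereas the paper enforces them by design in the architecture):

```python
import torch

def poisson_residual_loss(u_pred, L, f, boundary_mask, u_bc):
    """Physics-informed loss for a graph Poisson problem: the model output
    u_pred is penalized through the discrete residual L u - f on interior
    nodes and through the Dirichlet values on boundary nodes, so no exact
    solution is needed for training. L is a sparse discrete Laplacian."""
    residual = torch.sparse.mm(L, u_pred.unsqueeze(-1)).squeeze(-1) - f
    interior = ~boundary_mask
    loss_pde = (residual[interior] ** 2).mean()
    loss_bc = ((u_pred[boundary_mask] - u_bc) ** 2).mean()
    return loss_pde + loss_bc
```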
DISCO Verification: Division of Input Space into COnvex polytopes for neural network verification
Girard-Satabin, Julien, Varasse, Aymeric, Schoenauer, Marc, Charpiat, Guillaume, Chihani, Zakaria
The impressive results of modern neural networks partly come from their nonlinear behaviour. Unfortunately, this property makes it very difficult to apply formal verification tools, even if we restrict ourselves to networks with a piecewise-linear structure. However, such networks yield subregions that are linear and thus simpler to analyse independently. In this paper, we propose a method to simplify the verification problem by partitioning it into multiple linear subproblems. To evaluate the feasibility of such an approach, we perform an empirical analysis of neural networks to estimate the number of linear regions, and compare it to the currently known bounds. We also present the impact of a technique aimed at reducing the number of linear regions during training.
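The empirical estimation of the number of linear regions mentioned above can be done by sampling inputs and counting distinct ReLU activation patterns, since two inputs with the same pattern lie in the same linear region. Below is a minimal sketch (assuming an `nn.Sequential` ReLU network; sampling gives a lower bound, not an exact enumeration, and the verification-oriented partition into convex polytopes is not shown):

```python
import torch
import torch.nn as nn

def activation_pattern(model, x):
    """Binary pattern of which ReLUs are active for input x."""
    pattern = []
    h = x
    for layer in model:
        h = layer(h)
        if isinstance(layer, nn.ReLU):
            pattern.append((h > 0).flatten())
    return torch.cat(pattern)

def count_linear_regions(model, n_samples=10000, dim=2, scale=1.0):
    """Monte-Carlo lower bound on the number of linear regions hit by
    n_samples inputs drawn uniformly in [-scale, scale]^dim."""
    xs = scale * (torch.rand(n_samples, dim) * 2 - 1)
    patterns = {tuple(activation_pattern(model, x).tolist()) for x in xs}
    return len(patterns)

# illustrative usage on a small piecewise-linear network
net = nn.Sequential(nn.Linear(2, 16), nn.ReLU(),
                    nn.Linear(16, 16), nn.ReLU(),
                    nn.Linear(16, 1))
print(count_linear_regions(net))
```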