Azizpour, Hossein
Hessian-Informed Flow Matching
Sprague, Christopher Iliffe, Elofsson, Arne, Azizpour, Hossein
Modeling complex systems that evolve toward equilibrium distributions is important in various physical applications, including molecular dynamics and robotic control. These systems often follow stochastic gradient descent on an underlying energy function, converging to stationary distributions around energy minima. The local covariance of these distributions is shaped by the energy landscape's curvature, often resulting in anisotropic characteristics. While flow-based generative models have gained traction in generating samples from equilibrium distributions in such applications, they predominantly employ isotropic conditional probability paths, limiting their ability to capture such covariance structures. In this paper, we introduce Hessian-Informed Flow Matching (HI-FM), a novel approach that integrates the Hessian of an energy function into conditional flows within the flow matching framework. This integration allows HI-FM to account for local curvature and anisotropic covariance structures. Our approach leverages the linearization theorem from dynamical systems and incorporates additional considerations such as time transformations and equivariance. Empirical evaluations on the MNIST and Lennard-Jones particles datasets demonstrate that HI-FM improves the likelihood of test samples.
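To make the idea concrete, here is a minimal, hypothetical sketch (not the authors' code) of how an energy Hessian can shape a conditional Gaussian path: by the linearization argument, the stationary covariance of gradient-descent dynamics near a minimum scales with the inverse Hessian, so the path's noise can be made anisotropic rather than isotropic. The toy energy, the schedule sigma_t, and the sampler below are all illustrative assumptions.

    # Hypothetical sketch of the core idea (not the authors' code): near an
    # energy minimum x*, the stationary covariance of a gradient-descent
    # diffusion is shaped by the inverse Hessian, Sigma ~ H(x*)^{-1}, so the
    # conditional probability path can be made anisotropic.
    import numpy as np

    def energy(x):                  # toy anisotropic quadratic energy landscape
        return 0.5 * x[0]**2 + 5.0 * x[1]**2

    def hessian(f, x, eps=1e-4):    # finite-difference Hessian (illustrative)
        d = len(x)
        H = np.zeros((d, d))
        for i in range(d):
            for j in range(d):
                e_i, e_j = np.eye(d)[i], np.eye(d)[j]
                H[i, j] = (f(x + eps*e_i + eps*e_j) - f(x + eps*e_i - eps*e_j)
                           - f(x - eps*e_i + eps*e_j) + f(x - eps*e_i - eps*e_j)) / (4 * eps**2)
        return H

    x_star = np.zeros(2)                        # a data point at an energy minimum
    H = hessian(energy, x_star)
    L = np.linalg.cholesky(np.linalg.inv(H))    # anisotropic scale, Sigma = H^{-1}

    def conditional_sample(t, x1, rng):
        # Gaussian path toward x1 whose noise is H^{-1}-shaped, in contrast to
        # the isotropic sigma_t * I noise of standard flow matching.
        sigma_t = 1.0 - 0.99 * t                # hypothetical schedule
        return t * x1 + sigma_t * (L @ rng.standard_normal(2))

    rng = np.random.default_rng(0)
    print(conditional_sample(0.5, x_star, rng))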
Opportunities for machine learning in scientific discovery
Vinuesa, Ricardo, Rabault, Jean, Azizpour, Hossein, Bauer, Stefan, Brunton, Bingni W., Elofsson, Arne, Jarlebring, Elias, Kjellstrom, Hedvig, Markidis, Stefano, Marlevi, David, Cinnella, Paola, Brunton, Steven L.
Technological advancements have substantially increased computational power and data availability, enabling the application of powerful machine-learning (ML) techniques across various fields. However, our ability to leverage ML methods for scientific discovery, i.e., to obtain fundamental and formalized knowledge about natural processes, is still in its infancy. In this review, we explore how the scientific community can increasingly leverage ML techniques to achieve scientific discoveries. We observe that the applicability and opportunity of ML depend strongly on the nature of the problem domain, and on whether we have full (e.g., turbulence), partial (e.g., computational biochemistry), or no (e.g., neuroscience) a priori knowledge about the governing equations and physical properties of the system. Although challenges remain, principled use of ML is opening up new avenues for fundamental scientific discoveries. Throughout these diverse fields, there is a common theme: ML is enabling researchers to embrace complexity in observational data that was previously intractable to classical analysis and numerical investigation.
Indirectly Parameterized Concrete Autoencoders
Nilsson, Alfred, Wijk, Klas, Gutha, Sai Bharath Chandra, Englesson, Erik, Hotti, Alexandra, Saccardi, Carlo, Kviman, Oskar, Lagergren, Jens, Vinuesa, Ricardo, Azizpour, Hossein
Feature selection is a crucial task in settings where data is high-dimensional or acquiring the full set of features is costly. Recent developments in neural network-based embedded feature selection show promising results across a wide range of applications. Concrete Autoencoders (CAEs), considered state-of-the-art in embedded feature selection, may struggle to achieve stable joint optimization, hurting their training time and generalization. In this work, we identify that this instability is correlated with the CAE learning duplicate selections. To remedy this, we propose a simple and effective improvement: Indirectly Parameterized CAEs (IP-CAEs). IP-CAEs learn an embedding and a mapping from it to the Gumbel-Softmax distributions' parameters. Despite being simple to implement, IP-CAE exhibits significant and consistent improvements over CAE in both generalization and training time across several datasets for reconstruction and classification. Unlike CAE, IP-CAE effectively leverages non-linear relationships and does not require retraining the jointly optimized decoder. Furthermore, our approach is, in principle, generalizable to Gumbel-Softmax distributions beyond feature selection.
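As an illustration of the indirect parameterization, here is a minimal sketch based on the abstract (not the released implementation; the embedding size and the linear mapping are assumptions): the selection logits are produced from a learned embedding rather than learned directly.

    # Minimal sketch (my reading of the abstract, not the authors' code): a
    # Concrete selection layer parameterized *indirectly* -- logits come from
    # a learned embedding through a mapping, not from free parameters.
    import numpy as np

    rng = np.random.default_rng(0)
    d, k, m = 20, 5, 8                 # input features, selected features, embedding dim

    E = rng.standard_normal((k, m))    # learned embedding (trainable in practice)
    W = rng.standard_normal((m, d))    # learned mapping to logits (hypothetical linear case)

    def gumbel_softmax(logits, tau=0.5):
        g = -np.log(-np.log(rng.uniform(size=logits.shape)))   # Gumbel(0,1) noise
        y = (logits + g) / tau
        y = np.exp(y - y.max(axis=-1, keepdims=True))
        return y / y.sum(axis=-1, keepdims=True)

    logits = E @ W                     # indirect parameterization: (k, d) logits
    S = gumbel_softmax(logits)         # relaxed one-hot selection matrix
    x = rng.standard_normal(d)
    selected = S @ x                   # k soft-selected features, fed to the decoder
    print(selected.shape)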
Stable Autonomous Flow Matching
Sprague, Christopher Iliffe, Elofsson, Arne, Azizpour, Hossein
In contexts where data samples represent a physically stable state, it is often assumed that the data points represent the local minima of an energy landscape. In control theory, it is well-known that energy can serve as an effective Lyapunov function. Despite this, connections between control theory and generative models in the literature are sparse, even though there are several machine learning applications with physically stable data points. In this paper, we focus on such data and a recent class of deep generative models called flow matching. We apply tools of stochastic stability for time-independent systems to flow matching models. In doing so, we characterize the space of flow matching models that are amenable to this treatment, as well as draw connections to other control theory principles. We demonstrate our theoretical results on two examples.
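For intuition, the following toy check (illustrative only, not the paper's construction) shows the textbook Lyapunov condition for a time-independent flow that the abstract alludes to: an energy V certifies stability when its derivative along trajectories, grad V . f(x), is negative away from equilibrium.

    # Illustrative only: numerically checking the Lyapunov condition for an
    # autonomous (time-independent) vector field f, with energy V as the
    # candidate Lyapunov function.
    import numpy as np

    def V(x):                     # candidate energy / Lyapunov function
        return 0.5 * np.dot(x, x)

    def f(x):                     # gradient flow of V, so V should decrease
        return -x

    def lyapunov_derivative(x, eps=1e-6):
        grad = np.array([(V(x + eps*np.eye(len(x))[i]) - V(x - eps*np.eye(len(x))[i])) / (2*eps)
                         for i in range(len(x))])
        return grad @ f(x)        # dV/dt along trajectories

    for x in np.random.default_rng(0).standard_normal((5, 2)):
        assert lyapunov_derivative(x) < 0   # energy decreases, equilibrium is stable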
On the Lipschitz Constant of Deep Networks and Double Descent
Gamba, Matteo, Azizpour, Hossein, Björkman, Mårten
A longstanding question towards understanding the remarkable generalization ability of deep networks is characterizing the hypothesis class of models trained in practice, thus isolating properties of the networks' model function that capture generalization (Hanin & Rolnick, 2019; Neyshabur et al., 2015). A central problem is understanding the role played by overparameterization (Arora et al., 2018; Neyshabur et al., 2018; Zhang et al., 2018) - a key design choice of state-of-the-art models - in promoting regularization of the model function. Modern overparameterized networks can achieve good generalization while perfectly interpolating the training set (Nakkiran et al., 2019). This phenomenon is described by the double descent curve of the test error (Belkin et al., 2019; Geiger et al., 2019): as model size increases, the error follows the classical bias-variance trade-off curve (Geman et al., 1992), peaks when a model is just large enough to interpolate the training data, and then decreases again as model size grows further.
Logistic-Normal Likelihoods for Heteroscedastic Label Noise
Englesson, Erik, Mehrpanah, Amir, Azizpour, Hossein
A natural way of estimating heteroscedastic label noise in regression is to model the observed (potentially noisy) target as a sample from a normal distribution, whose parameters can be learned by minimizing the negative log-likelihood. This formulation has desirable loss attenuation properties, as it reduces the contribution of high-error examples. Intuitively, this behavior can improve robustness against label noise by reducing overfitting. We propose an extension of this simple and probabilistic approach to classification that has the same desirable loss attenuation properties. Furthermore, we discuss and address some practical challenges of this extension. We evaluate the effectiveness of the method by measuring its robustness against label noise in classification. We perform enlightening experiments exploring the inner workings of the method, including sensitivity to hyperparameters, ablation studies, and other insightful analyses.
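For reference, the regression formulation the abstract starts from can be sketched as follows (the classification extension itself is the paper's contribution and is not reproduced here); the numbers are purely illustrative.

    # Sketch of the heteroscedastic Gaussian NLL described above: a learned
    # variance attenuates the loss contribution of high-error examples.
    import numpy as np

    def gaussian_nll(y, mu, log_var):
        # 0.5 * [ (y - mu)^2 / sigma^2 + log sigma^2 ]  (additive constant dropped)
        return 0.5 * (np.exp(-log_var) * (y - mu) ** 2 + log_var)

    # A large residual can be "explained away" by predicting a large variance,
    # which is the loss-attenuation behavior that helps against label noise.
    print(gaussian_nll(y=5.0, mu=0.0, log_var=0.0))   # high penalty
    print(gaussian_nll(y=5.0, mu=0.0, log_var=3.0))   # attenuated penalty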
Hyperplane Arrangements of Trained ConvNets Are Biased
Gamba, Matteo, Carlsson, Stefan, Azizpour, Hossein, Björkman, Mårten
In recent years, understanding and interpreting the inner workings of deep networks has drawn considerable attention from the community [7, 15, 16, 13]. One long-standing question is the problem of identifying the inductive bias of state-of-the-art networks and the form of implicit regularization that is performed by the optimizer [22, 31, 2] and possibly by natural data itself [3]. While earlier studies focused on the theoretical expressivity of deep networks and the advantage of deeper representations [20, 25, 26], a recent trend in the literature is the study of the effective capacity of trained networks [31, 32, 9, 10]. In fact, while state-of-the-art deep networks are largely overparametrized, it is hypothesized that the full theoretical capacity of a model might not be realized in practice, due to some form of self-regulation at play during learning. Some recent works have, thus, tried to find statistical bias consistently present in trained state-of-the-art models that is interpretable and correlates well with generalization [14, 24]. In this work, we take a geometrical perspective and look for statistical bias in the weights of trained convolutional networks, in terms of hyperplane arrangements induced by convolutional layers with ReLU activations.
Deep Double Descent via Smooth Interpolation
Gamba, Matteo, Englesson, Erik, Björkman, Mårten, Azizpour, Hossein
The ability of overparameterized deep networks to interpolate noisy data, while at the same time showing good generalization performance, has been recently characterized in terms of the double descent curve for the test error. Common intuition from polynomial regression suggests that overparameterized networks are able to sharply interpolate noisy data, without considerably deviating from the ground-truth signal, thus preserving generalization ability. At present, a precise characterization of the relationship between interpolation and generalization for deep networks is missing. In this work, we quantify sharpness of fit of the training data interpolated by neural network functions, by studying the loss landscape w.r.t. the input variable locally around each training point, over volumes around cleanly- and noisily-labelled training samples, as we systematically increase the number of model parameters and training epochs. Our findings show that loss sharpness in the input space follows both model- and epoch-wise double descent, with worse peaks observed around noisy labels. While small interpolating models sharply fit both clean and noisy data, large interpolating models express a smooth loss landscape, where noisy targets are predicted over large volumes around training data points, in contrast to existing intuition.
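A hedged sketch of such an input-space sharpness probe (my illustration with a toy model; the paper's exact estimator may differ): sample perturbations in a ball around a training point and average the resulting loss deviation.

    # Illustrative input-space sharpness probe around a single training point.
    import numpy as np

    def loss(model, x, y):
        p = model(x)
        return -np.log(p[y] + 1e-12)               # cross-entropy of a toy model

    def sharpness(model, x, y, radius=0.1, n=100, rng=None):
        rng = rng or np.random.default_rng(0)
        base = loss(model, x, y)
        deltas = []
        for _ in range(n):
            u = rng.standard_normal(x.shape)
            u *= radius / np.linalg.norm(u)        # perturbation on a ball of given radius
            deltas.append(abs(loss(model, x + u, y) - base))
        return np.mean(deltas)                     # average loss deviation in the volume

    def toy_model(x):                              # softmax over a fixed nonlinearity
        z = np.tanh(x[:3])
        e = np.exp(z - z.max())
        return e / e.sum()

    x, y = np.random.default_rng(1).standard_normal(8), 0
    print(sharpness(toy_model, x, y))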
Predicting the wall-shear stress and wall pressure through convolutional neural networks
Balasubramanian, Arivazhagan G., Guastoni, Luca, Schlatter, Philipp, Azizpour, Hossein, Vinuesa, Ricardo
The objective of this study is to assess the capability of convolution-based neural networks to predict wall quantities in a turbulent open channel flow. The first tests are performed by training a fully-convolutional network (FCN) to predict the 2D velocity-fluctuation fields at the inner-scaled wall-normal location $y^{+}_{\rm target}$, using the sampled velocity fluctuations in wall-parallel planes located farther from the wall, at $y^{+}_{\rm input}$. The predictions from the FCN are compared against the predictions from a proposed R-Net architecture. Since the R-Net model is found to perform better than the FCN model, the former architecture is optimized to predict the 2D streamwise and spanwise wall-shear-stress components and the wall pressure from the sampled velocity-fluctuation fields farther from the wall. The dataset is obtained from DNS of open channel flow at $Re_{\tau} = 180$ and $550$. The turbulent velocity-fluctuation fields are sampled at various inner-scaled wall-normal locations, along with the wall-shear stress and the wall pressure. At $Re_{\tau}=550$, both FCN and R-Net can take advantage of the self-similarity in the logarithmic region of the flow and predict the velocity-fluctuation fields at $y^{+} = 50$ using the velocity-fluctuation fields at $y^{+} = 100$ as input, with about 10% error in the prediction of the streamwise-fluctuation intensity. Furthermore, the R-Net is able to predict the wall-shear-stress and wall-pressure fields using the velocity-fluctuation fields at $y^+ = 50$, with around 10% error in the intensity of the corresponding fluctuations at both $Re_{\tau} = 180$ and $550$. These results are an encouraging starting point to develop neural-network-based approaches for modelling turbulence near the wall in large-eddy simulations.
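For concreteness, a minimal fully-convolutional mapping of the kind described (an illustrative stand-in, not the paper's FCN or R-Net; the channel counts, depth, and width are assumptions) takes wall-parallel fluctuation planes as input and returns the two wall-shear-stress components plus the wall pressure on the same grid.

    # Illustrative fully-convolutional network: 3 velocity-fluctuation
    # channels in, 3 wall fields out, resolution preserved.
    import torch
    import torch.nn as nn

    class TinyFCN(nn.Module):
        def __init__(self, in_ch=3, out_ch=3, width=32):
            super().__init__()
            self.net = nn.Sequential(
                nn.Conv2d(in_ch, width, 3, padding=1), nn.ReLU(),
                nn.Conv2d(width, width, 3, padding=1), nn.ReLU(),
                nn.Conv2d(width, out_ch, 3, padding=1),   # tau_x, tau_z, wall pressure
            )

        def forward(self, u):          # u: (batch, 3, Nx, Nz) fluctuations at y+_input
            return self.net(u)         # (batch, 3, Nx, Nz) wall fields

    x = torch.randn(2, 3, 64, 64)
    print(TinyFCN()(x).shape)          # torch.Size([2, 3, 64, 64])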
Consistency Regularization Can Improve Robustness to Label Noise
Englesson, Erik, Azizpour, Hossein
Consistency regularization is a commonly-used technique for semi-supervised and self-supervised learning. It is an auxiliary objective function that encourages the prediction of the network to be similar in the vicinity of the observed training samples. Hendrycks et al. (2020) have recently shown that such regularization naturally brings test-time robustness to corrupted data and helps with calibration. This paper empirically studies the relevance of consistency regularization for training-time robustness to noisy labels. First, we make two interesting and useful observations regarding the consistency of networks trained with the standard cross-entropy loss on noisy datasets: (i) networks trained on noisy data have lower consistency than those trained on clean data, and (ii) consistency reduces more significantly around noisy-labelled training data points than around correctly-labelled ones. Then, we show that a simple loss function that encourages consistency improves the robustness of the models to label noise on both synthetic (CIFAR-10, CIFAR-100) and real-world (WebVision) noise, across different noise rates and types, and achieves state-of-the-art results.
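A minimal sketch in the spirit of such a consistency objective (illustrative; the Gaussian perturbation and KL divergence below are assumptions, not necessarily the paper's exact loss): penalize divergence between the model's predictive distributions at a sample and at a perturbed copy, added to the usual cross-entropy term.

    # Illustrative consistency objective alongside cross-entropy.
    import torch
    import torch.nn.functional as F

    def consistency_loss(model, x, noise_std=0.1):
        p = F.log_softmax(model(x), dim=-1)                 # prediction at x
        x_aug = x + noise_std * torch.randn_like(x)         # a nearby "view" of x
        q = F.log_softmax(model(x_aug), dim=-1)
        return F.kl_div(q, p, log_target=True, reduction="batchmean")

    model = torch.nn.Linear(10, 3)
    x = torch.randn(4, 10)
    total = F.cross_entropy(model(x), torch.tensor([0, 1, 2, 0])) + consistency_loss(model, x)
    print(total.item())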