
Collaborating Authors: Draxler, Felix


Variational Control for Guidance in Diffusion Models

arXiv.org Machine Learning

Diffusion models exhibit excellent sample quality, but existing guidance methods often require additional model training or are limited to specific tasks. We revisit guidance in diffusion models from the perspective of variational inference and control, introducing Diffusion Trajectory Matching (DTM), which guides pretrained diffusion trajectories to satisfy a terminal cost. DTM unifies a broad class of guidance methods and enables novel instantiations. We introduce a new method within this framework that achieves state-of-the-art results on several linear and (blind) non-linear inverse problems without requiring additional model training or modifications. For instance, in ImageNet non-linear deblurring, our model achieves an FID score of 34.31, significantly improving over the best pretrained-method baseline (FID 78.07). We will make the code available in a future update.
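The abstract does not spell out the DTM update itself, but the family of methods it unifies includes gradient-based terminal-cost guidance. As a loose illustration only, here is a minimal sketch of steering a pretrained DDPM-style sampler with the gradient of a terminal cost taken through the Tweedie estimate of the clean sample; `eps_model` and `cost_fn` are hypothetical placeholders, and this is not the paper's method:

```python
import torch

def guided_sample(eps_model, cost_fn, x_T, betas, scale=1.0):
    """DDPM ancestral sampling nudged by the gradient of a terminal cost.
    Generic gradient guidance for illustration, not the DTM update itself."""
    alphas = 1.0 - betas
    alpha_bars = torch.cumprod(alphas, dim=0)
    x = x_T
    for t in reversed(range(len(betas))):
        x = x.detach().requires_grad_(True)
        eps = eps_model(x, t)                                  # predicted noise
        ab = alpha_bars[t]
        # Tweedie estimate of the clean sample from the current noisy state.
        x0_hat = (x - (1 - ab).sqrt() * eps) / ab.sqrt()
        grad, = torch.autograd.grad(cost_fn(x0_hat).sum(), x)  # d cost / d x_t
        # Standard DDPM posterior mean, shifted against the cost gradient.
        mean = (x - betas[t] / (1 - ab).sqrt() * eps) / alphas[t].sqrt()
        noise = torch.randn_like(x) if t > 0 else torch.zeros_like(x)
        x = (mean - scale * grad + betas[t].sqrt() * noise).detach()
    return x
```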


TRADE: Transfer of Distributions between External Conditions with Normalizing Flows

arXiv.org Artificial Intelligence

Modeling distributions that depend on external control parameters is a common scenario in diverse applications like molecular simulations, where system properties such as temperature affect molecular configurations. Despite the relevance of these applications, existing solutions are unsatisfactory: they either require severely restricted model architectures or rely on backward training, which is prone to instability. We introduce TRADE, which overcomes these limitations by formulating the learning process as a boundary value problem. By initially training the model for a specific condition, using either i.i.d. samples or backward KL training, we establish a boundary distribution. We then propagate this information across other conditions using the gradient of the unnormalized density with respect to the external parameter. This formulation, akin to the principles of physics-informed neural networks, allows us to efficiently learn parameter-dependent distributions without restrictive assumptions. Experimentally, we demonstrate that TRADE achieves excellent results across a wide range of applications, from Bayesian inference and molecular simulations to physical lattice models.
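To make the boundary-value idea concrete: for a Boltzmann-type target $p(x|T) \propto \exp(-E(x)/T)$, the derivative of the log-density with respect to temperature is known up to an expectation, $\partial_T \log p(x|T) = (E(x) - \mathbb{E}_p[E])/T^2$, so a physics-informed residual can match the model's derivative to it. The sketch below assumes a hypothetical model `log_q(x, T)` returning per-sample log-densities and approximates the expectation with a batch mean; it illustrates the principle, not the paper's exact loss:

```python
import torch

def trade_loss(log_q, energy, x, T):
    """PINN-style residual for Boltzmann targets p(x|T) ~ exp(-E(x)/T), where
    d/dT log p(x|T) = (E(x) - E_p[E]) / T^2.  `log_q(x, T)` is a hypothetical
    model returning per-sample log-densities; `T` holds one temperature per
    sample so the autograd derivative below is also per-sample."""
    T = T.detach().requires_grad_(True)
    lq = log_q(x, T)                                     # shape (batch,)
    dlq_dT, = torch.autograd.grad(lq.sum(), T, create_graph=True)
    e = energy(x)
    target = (e - e.mean()) / T.detach() ** 2            # batch mean stands in for E_p[E]
    return ((dlq_dT - target) ** 2).mean()
```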


On the Universality of Coupling-based Normalizing Flows

arXiv.org Artificial Intelligence

We present a novel theoretical framework for understanding the expressive power of coupling-based normalizing flows such as RealNVP. Despite their prevalence in scientific applications, a comprehensive understanding of coupling flows remains elusive due to their restricted architectures. Existing theorems fall short because they require arbitrarily ill-conditioned neural networks, limiting practical applicability. We demonstrate that these constructions inherently lead to volume-preserving flows, a property we show to be a fundamental constraint on expressivity. We propose a new distributional universality theorem for coupling-based normalizing flows that overcomes several limitations of prior work. Our results support the common wisdom that the coupling architecture is expressive, and they provide a nuanced view for choosing the expressivity of coupling functions, bridging a gap between empirical results and theoretical understanding.
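For readers unfamiliar with the architecture under discussion, a minimal affine coupling block (RealNVP-style) is sketched below. Its Jacobian is triangular, so the log-determinant is just the sum of the predicted log-scales, and setting the scales to zero recovers an additive, exactly volume-preserving coupling — the constrained regime the abstract refers to. Layer sizes are arbitrary choices for illustration:

```python
import torch
import torch.nn as nn

class AffineCoupling(nn.Module):
    """Minimal RealNVP-style affine coupling block. The Jacobian is triangular,
    so log|det J| = sum(s); with s = 0 it degenerates to an additive coupling,
    which is exactly volume-preserving (log|det J| = 0)."""

    def __init__(self, dim, hidden=64):
        super().__init__()
        self.half = dim // 2
        self.net = nn.Sequential(
            nn.Linear(self.half, hidden), nn.ReLU(),
            nn.Linear(hidden, 2 * (dim - self.half)),
        )

    def forward(self, x):
        x1, x2 = x[:, :self.half], x[:, self.half:]
        s, t = self.net(x1).chunk(2, dim=-1)   # scale and shift from x1 only
        y2 = x2 * torch.exp(s) + t             # invertible given x1
        log_det = s.sum(dim=-1)
        return torch.cat([x1, y2], dim=-1), log_det
```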


Lifting Architectural Constraints of Injective Flows

arXiv.org Artificial Intelligence

Generative modeling is one of the most important tasks in machine learning, having numerous applications across vision (Rombach et al., 2022), language modeling (Brown et al., 2020), science (Ardizzone et al., 2018; Radev et al., 2020) and beyond. One of the best-motivated approaches to generative modeling is maximum likelihood training, due to its favorable statistical properties (Hastie et al., 2009). In the continuous setting, exact maximum likelihood training is most commonly achieved by normalizing flows (Rezende & Mohamed, 2015; Dinh et al., 2014; Kobyzev et al., 2020) which parameterize an exactly invertible function with a tractable change of variables (log-determinant term). This generally introduces a trade-off between model expressivity and computational cost, where the cheapest networks to train and sample from, such as coupling block architectures, require very specifically constructed functions which may limit expressivity (Draxler et al., 2022). In addition, normalizing flows preserve the dimensionality of the inputs, requiring a latent space of the same dimension as the data space.
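For reference, the change-of-variables objective mentioned above: an invertible $f_\theta$ mapping data $x$ to latents $z$ with a simple prior $p_Z$ yields the exact log-likelihood

$$\log p_X(x) = \log p_Z(f_\theta(x)) + \log\left|\det \frac{\partial f_\theta(x)}{\partial x}\right|,$$

and it is the log-determinant term that forces the specially constructed, dimension-preserving architectures discussed here.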


Learning Distributions on Manifolds with Free-form Flows

arXiv.org Machine Learning

Much real-world data, particularly in the natural sciences and computer vision, lies on known Riemannian manifolds such as spheres, tori, or the group of rotation matrices. The predominant approaches to learning a distribution on such a manifold require solving a differential equation in order to sample from the model and evaluate densities, so sampling is slowed down by a high number of function evaluations. In this work, we propose an alternative approach that only requires a single function evaluation followed by a projection to the manifold. Training is achieved by adapting the recently proposed free-form flow framework to Riemannian manifolds. The central idea is to estimate the gradient of the negative log-likelihood via a trace evaluated in the tangent space. We evaluate our method on various manifolds and find significantly faster inference at competitive performance compared to previous work. We make our code public at https://github.com/vislearn/FFF.
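As an illustration of the two ingredients named above — single-step sampling via projection, and a trace taken in the tangent space — the following sketch works on the unit sphere. The function `f` stands in for an arbitrary network; this is a schematic of the idea, not the code at the linked repository:

```python
import torch

def project_to_sphere(x):
    """Closest-point projection onto the unit sphere: a sample is one network
    evaluation followed by this projection."""
    return x / x.norm(dim=-1, keepdim=True)

def tangent_trace(f, x):
    """One-probe Hutchinson estimate of the Jacobian trace of `f` restricted to
    the tangent space at x (x on the unit sphere, `f` a placeholder network).
    A Gaussian probe projected onto the tangent space has E[v v^T] = P, so
    E[v^T J v] equals the tangent-space trace tr(P J)."""
    v = torch.randn_like(x)
    v = v - (v * x).sum(-1, keepdim=True) * x            # tangent projection
    x = x.detach().requires_grad_(True)
    vjp, = torch.autograd.grad(f(x), x, grad_outputs=v, create_graph=True)
    return (vjp * v).sum(-1)                             # v^T J v per sample
```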


Free-form Flows: Make Any Architecture a Normalizing Flow

arXiv.org Machine Learning

Normalizing flows are generative models that directly maximize the likelihood. Previously, their design was largely constrained by the need for analytical invertibility. We overcome this constraint by a training procedure that uses an efficient estimator for the gradient of the change of variables formula. This enables any dimension-preserving neural network to serve as a generative model through maximum likelihood training. Our approach allows us to place the emphasis on tailoring inductive biases precisely to the task at hand. Specifically, we achieve excellent results in molecule generation benchmarks utilizing $E(n)$-equivariant networks. Moreover, our method is competitive in an inverse problem benchmark, while employing off-the-shelf ResNet architectures.
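Schematically, the training objective combines a maximum-likelihood term, a single-probe Hutchinson estimate standing in for the gradient of the log-determinant, and a reconstruction term that drives the decoder toward the encoder's inverse. The sketch below assumes hypothetical encoder/decoder callables `f` and `g` and a recent PyTorch with `torch.func`; treat it as a reading of the abstract, not the reference implementation:

```python
import torch
from torch.func import jvp

def fff_loss(f, g, x, beta=10.0):
    """Free-form-flow-style surrogate loss (sketch): maximum likelihood under a
    standard normal latent prior, with the log-det gradient replaced by a
    single-probe estimate of tr(J_g J_f), plus a reconstruction term pushing
    the decoder `g` toward the inverse of the encoder `f`. Both are arbitrary
    dimension-preserving networks; names and beta are placeholders."""
    v = torch.randn_like(x)
    z, jf_v = jvp(f, (x,), (v,))                  # z = f(x) and J_f v
    x_rec = g(z)
    # v^T J_g via a vector-Jacobian product, detached so that only the
    # encoder's Jacobian receives the log-det gradient signal.
    z_req = z.detach().requires_grad_(True)
    vT_jg, = torch.autograd.grad(g(z_req), z_req, grad_outputs=v)
    log_det_est = (vT_jg * jf_v).sum(-1)          # ~ tr(J_g J_f)
    nll = 0.5 * (z ** 2).sum(-1) - log_det_est    # standard normal prior
    return (nll + beta * ((x - x_rec) ** 2).sum(-1)).mean()
```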


On the Convergence Rate of Gaussianization with Random Rotations

arXiv.org Artificial Intelligence

Gaussianization is a simple generative model that can be trained without backpropagation. It has shown compelling performance on low-dimensional data. As the dimension increases, however, its convergence has been observed to slow down. We show analytically that the number of required layers scales linearly with the dimension for Gaussian input. We argue that this is because the model is unable to capture dependencies between dimensions. Empirically, we find the same linear increase in cost for arbitrary input distributions $p(x)$, but observe favorable scaling for some distributions. We explore potential speed-ups and formulate challenges for further research.
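A single Gaussianization layer is simple enough to sketch: rotate by a random orthogonal matrix, then Gaussianize each marginal through its empirical CDF. Stacking such layers drives samples toward $\mathcal{N}(0, I)$, and the abstract's result says the number of layers needed grows linearly with the dimension. A minimal NumPy/SciPy sketch:

```python
import numpy as np
from scipy.stats import norm

def gaussianization_layer(x, rng):
    """One Gaussianization step: multiply by a random orthogonal matrix, then
    Gaussianize each marginal through its empirical CDF (no backprop needed)."""
    d = x.shape[1]
    q, _ = np.linalg.qr(rng.standard_normal((d, d)))    # random orthogonal matrix
    x = x @ q
    n = x.shape[0]
    out = np.empty_like(x)
    for j in range(d):
        ranks = np.argsort(np.argsort(x[:, j]))         # rank -> uniform -> normal
        out[:, j] = norm.ppf((ranks + 0.5) / n)
    return out

# Toy run: a correlated 2-D Gaussian is driven toward N(0, I) layer by layer.
rng = np.random.default_rng(0)
x = rng.standard_normal((1000, 2)) @ np.array([[2.0, 1.0], [0.0, 0.5]])
for _ in range(10):
    x = gaussianization_layer(x, rng)
print(np.round(np.cov(x.T), 2))   # close to the identity after a few layers
```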


Finding Competence Regions in Domain Generalization

arXiv.org Artificial Intelligence

We investigate a "learning to reject" framework to address the problem of silent failures in Domain Generalization (DG), where the test distribution differs from the training distribution. Assuming a mild distribution shift, we wish to accept out-of-distribution (OOD) data from a new domain whenever a model's estimated competence predicts trustworthy responses, instead of rejecting OOD data outright. Trustworthiness is predicted via a proxy incompetence score that is tightly linked to the classifier's performance. We present a comprehensive experimental evaluation of existing proxy scores as incompetence scores for classification and highlight the resulting trade-offs between rejection rate and accuracy gain. For comparability with prior work, we focus on standard DG benchmarks and consider the effect of measuring incompetence via different learned representations in a closed versus an open world setting. Our results suggest that increasing incompetence scores are indeed predictive of reduced accuracy, leading to significant improvements in average accuracy below a suitable incompetence threshold. However, the scores are not yet good enough to allow for a favorable accuracy/rejection trade-off in all tested domains. Surprisingly, our results also indicate that classifiers optimized for DG robustness do not outperform a naive Empirical Risk Minimization (ERM) baseline in the competence region, that is, where test samples elicit low incompetence scores.
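The evaluation protocol is easy to reproduce in outline: pick an incompetence score (e.g., one minus the maximum softmax probability), reject samples above a threshold, and trace accuracy on the accepted set against the rejection rate. A small self-contained sketch with synthetic scores (not the paper's data):

```python
import numpy as np

def accuracy_rejection_curve(scores, correct, thresholds):
    """Accuracy on accepted samples vs. rejection rate when all samples whose
    incompetence score exceeds the threshold are rejected."""
    curve = []
    for tau in thresholds:
        accept = scores <= tau
        if accept.any():
            curve.append((1.0 - accept.mean(), correct[accept].mean()))
    return curve

# Synthetic sanity check: errors become more likely as the score grows.
rng = np.random.default_rng(0)
scores = rng.uniform(0.0, 1.0, 5000)
correct = rng.uniform(0.0, 1.0, 5000) > 0.6 * scores
for rej, acc in accuracy_rejection_curve(scores, correct, [0.25, 0.5, 0.75, 1.0]):
    print(f"rejection rate {rej:.2f} -> accepted accuracy {acc:.3f}")
```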


On the Spectral Bias of Deep Neural Networks

arXiv.org Machine Learning

It is well known that over-parametrized deep neural networks (DNNs) are an overly expressive class of functions that can memorize even random data with $100\%$ training accuracy. This raises the question of why they do not easily overfit real data. To answer this question, we study deep networks using Fourier analysis. We show that deep networks with finite weights (or trained for a finite number of steps) are inherently biased towards representing smooth functions over the input space. Specifically, the magnitude of a particular frequency component ($k$) of a deep ReLU network's function decays at least as fast as $\mathcal{O}(k^{-2})$, with width and depth helping polynomially and exponentially (respectively) in modeling higher frequencies. This shows, for instance, why DNNs cannot perfectly \textit{memorize} peaky delta-like functions. We also show that DNNs can exploit the geometry of low-dimensional data manifolds to approximate complex functions that exist along the manifold with simple functions when seen with respect to the input space. As a consequence, we find that all samples (including adversarial samples) classified by a network to belong to a certain class are connected by a path such that the prediction of the network along that path does not change. Finally, we find that DNN parameters corresponding to functions with higher frequency components occupy a smaller volume in the parameter space.
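The low-frequencies-first behavior is easy to observe in a toy experiment: fit a small ReLU MLP to a two-frequency signal and watch the spectrum of its output during training; the $k=2$ component is captured long before the $k=20$ one. A sketch (hyperparameters are arbitrary, not from the paper):

```python
import numpy as np
import torch
import torch.nn as nn

torch.manual_seed(0)
x = (torch.arange(256) / 256).unsqueeze(1)               # exactly one period
y = torch.sin(2 * np.pi * 2 * x) + 0.5 * torch.sin(2 * np.pi * 20 * x)

net = nn.Sequential(nn.Linear(1, 128), nn.ReLU(),
                    nn.Linear(128, 128), nn.ReLU(),
                    nn.Linear(128, 1))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
for step in range(2001):
    loss = ((net(x) - y) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()
    if step % 500 == 0:                                  # spectrum of the fit
        spec = np.abs(np.fft.rfft(net(x).detach().numpy().ravel()))
        print(f"step {step:4d}  |k=2|: {spec[2]:6.1f}  |k=20|: {spec[20]:6.1f}")
```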


Essentially No Barriers in Neural Network Energy Landscape

arXiv.org Machine Learning

Training neural networks involves finding minima of a high-dimensional non-convex loss function. Knowledge of the structure of this energy landscape is sparse. Relaxing from linear interpolations, we construct continuous paths between minima of recent neural network architectures on CIFAR10 and CIFAR100. Surprisingly, the paths are essentially flat in both the training and test landscapes. This implies that neural networks have enough capacity for structural changes, or that these changes are small between minima. Also, each minimum has at least one vanishing Hessian eigenvalue in addition to those resulting from trivial invariance.
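The contrast the paper draws is with straight-line interpolation: evaluating the loss along the segment between two independently trained minima typically reveals a barrier, which the paper's band-style path optimization then relaxes into an essentially flat path. A sketch of the linear-interpolation baseline, with `model_a`, `model_b`, `loss_fn`, and `data` as placeholders:

```python
import copy
import torch

def loss_on_path(model_a, model_b, loss_fn, data, n_points=25):
    """Training loss along the straight line between two minima in parameter
    space. Linear paths typically show a barrier; band-style path optimization
    bends the path until it is essentially flat."""
    model = copy.deepcopy(model_a)
    pa = [p.detach().clone() for p in model_a.parameters()]
    pb = [p.detach().clone() for p in model_b.parameters()]
    x, y = data
    losses = []
    with torch.no_grad():
        for alpha in torch.linspace(0.0, 1.0, n_points):
            for p, a, b in zip(model.parameters(), pa, pb):
                p.copy_((1 - alpha) * a + alpha * b)
            losses.append(loss_fn(model(x), y).item())
    return losses
```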