Ewald-based Long-Range Message Passing for Molecular Graphs
Kosmala, Arthur, Gasteiger, Johannes, Gao, Nicholas, Günnemann, Stephan
Neural architectures that learn potential energy surfaces from molecular data have undergone rapid improvement in recent years. A key driver of this success is the Message Passing Neural Network (MPNN) paradigm. Its favorable scaling with system size relies in part on a spatial distance cutoff for messages. While this focus on locality is a useful inductive bias, it also impedes the learning of long-range interactions such as electrostatics and van der Waals forces. To address this drawback, we propose Ewald message passing: a nonlocal Fourier-space scheme which limits interactions via a cutoff on frequency instead of distance, and is theoretically well founded in the Ewald summation method. It can serve as an augmentation on top of existing MPNN architectures, as it is computationally inexpensive and agnostic to architectural details. We test the approach with four baseline models and two datasets containing diverse periodic (OC20) and aperiodic (OE62) structures. We observe robust improvements in energy mean absolute errors across all models and datasets, averaging 10% on OC20 and 16% on OE62. Our analysis shows an outsized impact of these improvements on structures with high long-range contributions to the ground-truth energy.
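To make the frequency-cutoff idea concrete, here is a minimal NumPy sketch of a Fourier-space aggregation step in the spirit of Ewald message passing: every atom contributes to a set of low-frequency structure factors, which are then projected back onto the atoms, so the update is nonlocal at O(N·K) cost. The function names, the cubic cell, and the absence of learned frequency filters are simplifying assumptions, not the paper's implementation.

```python
# Illustrative sketch: frequency-cutoff message passing for a cubic cell.
import numpy as np

def ewald_style_messages(pos, feats, cell_len=10.0, k_cutoff=2.0):
    """pos: (N, 3) atom positions, feats: (N, F) atom features."""
    # Enumerate reciprocal-lattice frequencies below the cutoff.
    n_max = int(np.ceil(k_cutoff * cell_len / (2 * np.pi)))
    grid = np.arange(-n_max, n_max + 1)
    kvecs = np.stack(np.meshgrid(grid, grid, grid), -1).reshape(-1, 3)
    kvecs = 2 * np.pi * kvecs / cell_len
    norms = np.linalg.norm(kvecs, axis=1)
    kvecs = kvecs[(norms > 0) & (norms <= k_cutoff)]      # drop k = 0

    phase = np.exp(1j * pos @ kvecs.T)                    # (N, K)
    # Structure-factor-like sum: every atom contributes to every frequency,
    # so the update is nonlocal, yet costs only O(N * K).
    s_k = phase.T @ feats.astype(complex)                 # (K, F)
    # Project back onto atoms; a learned frequency filter would go here.
    messages = (phase.conj() @ s_k).real / len(pos)       # (N, F)
    return messages

pos = np.random.rand(8, 3) * 10.0
feats = np.random.rand(8, 4)
print(ewald_style_messages(pos, feats).shape)             # (8, 4)
```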
Generalizing Neural Wave Functions
Gao, Nicholas, Günnemann, Stephan
Recent neural network-based wave functions have achieved state-of-the-art accuracies in modeling ab-initio ground-state potential energy surfaces. However, these networks can only solve different spatial arrangements of the same set of atoms. To overcome this limitation, we present Graph-learned orbital embeddings (Globe), a neural network-based reparametrization method that can adapt neural wave functions to different molecules. Globe learns representations of local electronic structures that generalize across molecules via spatial message passing by connecting molecular orbitals to covalent bonds. Further, we propose a size-consistent wave function Ansatz, the Molecular orbital network (Moon), tailored to jointly solve Schrödinger equations of different molecules. In our experiments, we find that Moon converges in 4.5 times fewer steps to similar accuracy as previous methods, or to lower energies given the same time. Further, our analysis shows that Moon's energy estimate scales additively with increasing system size, unlike previous work where we observe divergence. In both computational chemistry and machine learning, we are the first to demonstrate that a single wave function can jointly solve the Schrödinger equation of molecules with different atoms.
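The size-consistency property mentioned above has a simple operational reading: for two fragments placed far apart, the predicted energy of the combined system should equal the sum of the fragment energies. The toy check below illustrates this with a hypothetical stand-in for a trained model; `model_energy` and its functional form are invented purely for illustration.

```python
# Toy size-consistency check: additive energy for far-separated fragments.
import numpy as np

def model_energy(atoms):
    # Hypothetical surrogate: per-atom terms plus a pairwise interaction
    # that decays with distance, hence additive for distant fragments.
    atoms = np.asarray(atoms)
    e = -1.0 * len(atoms)
    for i in range(len(atoms)):
        for j in range(i + 1, len(atoms)):
            e += np.exp(-np.linalg.norm(atoms[i] - atoms[j]))
    return e

frag_a = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]]
frag_b = [[0.0, 1.1, 0.0], [1.0, 1.1, 0.0]]
far_b = [[x + 100.0, y, z] for x, y, z in frag_b]   # shift fragment B away

combined = model_energy(frag_a + far_b)
separate = model_energy(frag_a) + model_energy(frag_b)
print(abs(combined - separate) < 1e-6)   # True: energies add up
```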
Modeling Temporal Data as Continuous Functions with Stochastic Process Diffusion
Biloš, Marin, Rasul, Kashif, Schneider, Anderson, Nevmyvaka, Yuriy, Günnemann, Stephan
Temporal data such as time series can be viewed as discretized measurements of an underlying continuous function. To build a generative model for such data, we have to model the stochastic process that governs it. We propose a solution by defining the denoising diffusion model in function space, which also allows us to naturally handle irregularly sampled observations. The forward process gradually adds noise to functions, preserving their continuity, while the learned reverse process removes the noise and returns functions as new samples. To this end, we define suitable noise sources and introduce novel denoising and score-matching models. We show how our method can be used for multivariate probabilistic forecasting and imputation, and how our model can be interpreted as a neural process.
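As a rough intuition for the forward process, the sketch below replaces the i.i.d. per-observation noise of a standard diffusion model with noise drawn from a Gaussian process, so the noised object stays a continuous function and irregular sampling grids pose no problem. The RBF kernel, the jitter term, and the single noise level are illustrative choices, not the paper's exact construction.

```python
# Forward diffusion with function-valued (Gaussian-process) noise.
import numpy as np

def rbf_kernel(t, lengthscale=0.1):
    d = t[:, None] - t[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)

def forward_diffuse(x, t, alpha_bar):
    """x: function values at (possibly irregular) times t in [0, 1]."""
    cov = rbf_kernel(t) + 1e-6 * np.eye(len(t))    # jitter for stability
    eps = np.random.multivariate_normal(np.zeros(len(t)), cov)
    # DDPM-style mixing, but with correlated (continuous) noise.
    return np.sqrt(alpha_bar) * x + np.sqrt(1 - alpha_bar) * eps

t = np.sort(np.random.rand(50))           # irregular observation times
x = np.sin(2 * np.pi * t)                 # a discretized function
x_noisy = forward_diffuse(x, t, alpha_bar=0.5)
print(x_noisy.shape)                      # (50,)
```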
Revisiting Robustness in Graph Machine Learning
Gosch, Lukas, Sturm, Daniel, Geisler, Simon, Günnemann, Stephan
Many works show that node-level predictions of Graph Neural Networks (GNNs) are not robust to small, often termed adversarial, changes to the graph structure. However, because manual inspection of a graph is difficult, it is unclear if the studied perturbations always preserve a core assumption of adversarial examples: that of unchanged semantic content. To address this problem, we introduce a more principled notion of an adversarial graph, which is aware of semantic content change. Using Contextual Stochastic Block Models (CSBMs) and real-world graphs, our results uncover: (i) for a majority of nodes, the prevalent perturbation models include a large fraction of perturbed graphs violating the unchanged-semantics assumption; (ii) surprisingly, all assessed GNNs show over-robustness, that is, robustness beyond the point of semantic change. We find this to be a phenomenon complementary to adversarial examples, and show that including the label structure of the training graph in the inference process of GNNs significantly reduces over-robustness, while having a positive effect on test accuracy and adversarial robustness. Theoretically, leveraging our new semantics-aware notion of robustness, we prove that there is no robustness-accuracy tradeoff for inductively classifying a newly added node.
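For readers unfamiliar with the CSBM setup used here: class labels jointly determine edge probabilities and Gaussian node features, which is what makes it possible to reason about when a structure perturbation actually flips a node's semantic (Bayes-optimal) class. The sampler below is a minimal two-class version with illustrative parameter values.

```python
# Minimal two-class Contextual Stochastic Block Model sampler.
import numpy as np

def sample_csbm(n=100, p_in=0.2, p_out=0.02, mu=1.0, sigma=1.0):
    y = np.random.randint(0, 2, size=n)                  # binary labels
    same = y[:, None] == y[None, :]
    probs = np.where(same, p_in, p_out)                  # block structure
    adj = (np.random.rand(n, n) < probs).astype(int)
    adj = np.triu(adj, 1)
    adj = adj + adj.T                                    # undirected, no loops
    feats = np.random.normal(mu * (2 * y - 1), sigma)    # class-shifted features
    return adj, feats, y

adj, feats, y = sample_csbm()
print(adj.shape, feats.shape, y.mean())
```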
Towards Efficient MCMC Sampling in Bayesian Neural Networks by Exploiting Symmetry
Wiese, Jonas Gregor, Wimmer, Lisa, Papamarkou, Theodore, Bischl, Bernd, Günnemann, Stephan, Rügamer, David
Bayesian inference in deep neural networks is challenging due to the high-dimensional, strongly multi-modal parameter posterior density landscape. Markov chain Monte Carlo approaches asymptotically recover the true posterior but are considered prohibitively expensive for large modern architectures. Local methods, which have emerged as a popular alternative, focus on specific parameter regions that can be approximated by functions with tractable integrals. While these often yield satisfactory empirical results, they fail, by definition, to account for the multi-modality of the parameter posterior. In this work, we argue that the dilemma between exact-but-unaffordable and cheap-but-inexact approaches can be mitigated by exploiting symmetries in the posterior landscape. Such symmetries, induced by neuron interchangeability and certain activation functions, manifest in different parameter values leading to the same functional output value. We show theoretically that the posterior predictive density in Bayesian neural networks can be restricted to a symmetry-free parameter reference set. By further deriving an upper bound on the number of Monte Carlo chains required to capture the functional diversity, we propose a straightforward approach for feasible Bayesian inference. Our experiments suggest that efficient sampling is indeed possible, opening up a promising path to accurate uncertainty quantification in deep learning.
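The permutation symmetry the paper exploits is easy to verify directly: hidden units of an MLP layer can be reordered, together with their outgoing weights, without changing the network function, and mapping every parameter vector to a canonical ordering collapses these equivalent posterior modes onto one representative. The sort key below is one arbitrary canonicalization, chosen only for illustration.

```python
# Neuron-permutation symmetry and a canonical representative.
import numpy as np

def canonicalize(w_in, b, w_out):
    """w_in: (h, d), b: (h,), w_out: (o, h) for one hidden layer."""
    order = np.argsort(w_in[:, 0])   # sort units by first incoming weight
    return w_in[order], b[order], w_out[:, order]

def mlp(x, w_in, b, w_out):
    return w_out @ np.tanh(w_in @ x + b)

rng = np.random.default_rng(0)
w_in, b, w_out = rng.normal(size=(4, 3)), rng.normal(size=4), rng.normal(size=(2, 4))
perm = rng.permutation(4)            # a symmetric copy of the same function
w_in2, b2, w_out2 = w_in[perm], b[perm], w_out[:, perm]

x = rng.normal(size=3)
print(np.allclose(mlp(x, w_in, b, w_out), mlp(x, w_in2, b2, w_out2)))  # True
print(np.allclose(canonicalize(w_in, b, w_out)[0],
                  canonicalize(w_in2, b2, w_out2)[0]))                 # True
```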
The power of motifs as inductive bias for learning molecular distributions
Sommer, Johanna, Hetzel, Leon, Lüdke, David, Theis, Fabian, Günnemann, Stephan
Machine learning for molecules holds great potential for efficiently exploring the vast chemical space and thus streamlining the drug discovery process by facilitating the design of new therapeutic molecules. Deep generative models have shown promising results for molecule generation, but the benefits of specific inductive biases for learning distributions over small graphs are unclear. Our study investigates the impact of subgraph structures and vocabulary design on distribution learning, using small drug molecules as a case study. To this end, we introduce Subcover, a new subgraph-based fragmentation scheme, and evaluate it through a two-step variational auto-encoder. Our results show that Subcover's improved identification of chemically meaningful subgraphs leads to a relative improvement of the FCD score by 30%, outperforming previous methods. Our findings highlight the potential of Subcover to enhance the performance and scalability of existing methods, contributing to the advancement of drug discovery.
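Subcover itself is not sketched here; instead, the snippet below illustrates the general fragmentation idea it refines: decomposing a molecule into chemically meaningful subgraphs that then serve as the generative model's vocabulary. RDKit's BRICS decomposition is used purely as a stand-in scheme.

```python
# Motif-based fragmentation with BRICS as a stand-in for Subcover.
from rdkit import Chem
from rdkit.Chem import BRICS

mol = Chem.MolFromSmiles("CC(=O)Oc1ccccc1C(=O)O")    # aspirin
fragments = sorted(BRICS.BRICSDecompose(mol))        # vocabulary entries
for frag in fragments:
    print(frag)
```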
Accuracy is not the only Metric that matters: Estimating the Energy Consumption of Deep Learning Models
Getzner, Johannes, Charpentier, Bertrand, Günnemann, Stephan
Modern machine learning models have started to consume incredible amounts of energy, thus incurring large carbon footprints (Strubell et al., 2019). In this work, we estimate the energy consumption of deep learning (DL) models by collecting high-quality energy data and building a first baseline model, capable of predicting a model's energy consumption by accumulating its estimated layer-wise energies. Deep CNNs, such as VGG16 or ResNet50, already deliver great performance (Simonyan & Zisserman, 2014; He et al., 2015), yet the increasing number of layers in such models comes at the cost of severely increased computational complexity, resulting in the need for power-hungry hardware (Thompson et al., 2020; Jin et al., 2016). An example of a model that behaves extremely poorly in this regard is a big transformer with neural architecture search (Strubell et al., 2019). Clearly, training and running these models is not just a matter of financial cost, but also of environmental impact.
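The accumulation scheme described above can be pictured as follows: predict each layer's energy from simple layer features and sum the per-layer estimates. In this sketch, the per-layer model is a hypothetical linear function of multiply-accumulate counts with made-up coefficients; the actual baseline model and its features may differ.

```python
# Layer-wise energy accumulation with hypothetical per-layer coefficients.

LAYER_COEFS = {"conv": 2.5e-9, "linear": 1.0e-9}   # hypothetical J per MAC

def layer_energy(kind, macs):
    return LAYER_COEFS[kind] * macs                # estimated joules

def model_energy(layers):
    """layers: list of (kind, multiply-accumulate count) tuples."""
    return sum(layer_energy(kind, macs) for kind, macs in layers)

# A rough VGG-style stack described by per-layer MAC counts.
layers = [("conv", 1.8e9), ("conv", 0.9e9), ("linear", 4.1e6)]
print(f"estimated energy: {model_energy(layers):.3e} J")
```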
Training, Architecture, and Prior for Deterministic Uncertainty Methods
Charpentier, Bertrand, Zhang, Chenxiang, Günnemann, Stephan
Accurate and efficient uncertainty estimation is crucial for building reliable Machine Learning (ML) models that provide calibrated uncertainty estimates, generalize well, and detect Out-Of-Distribution (OOD) datasets. To this end, Deterministic Uncertainty Methods (DUMs) are a promising model family capable of performing uncertainty estimation in a single forward pass. This work investigates important design choices in DUMs: (1) we show that training schemes decoupling the core architecture from the uncertainty head can significantly improve uncertainty performance. Safety is critical to the adoption of deep learning in domains such as autonomous driving, medical diagnosis, or financial trading systems. One solution to this problem is to build reliable models capable of estimating the uncertainty of their own predictions. Uncertainty is commonly divided into aleatoric uncertainty, quantifying the inherent noise in the data and thus irreducible; epistemic uncertainty, quantifying modeling choices or lack of data and thus reducible; and predictive uncertainty, a combination of the two (Gal, 2016). In practice, high-quality uncertainty estimates must be calibrated and able to detect OOD data such as anomalies, while preserving good generalization performance under dataset shifts. Recently, a family of methods for uncertainty estimation named Deterministic Uncertainty Methods (DUMs) has emerged (Postels et al., 2022). Contrary to uncertainty methods such as Ensembles (Lakshminarayanan et al., 2017), MC Dropout (Gal & Ghahramani, 2016), or other Bayesian neural networks on weights (Blundell et al., 2015), which require multiple forward passes to make predictions, DUMs only require a single forward pass, making them significantly more computationally efficient.
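One common DUM pattern, sketched minimally below: a deterministic feature extractor paired with a distance-based uncertainty head, so a single forward pass yields both a class prediction and an OOD score. The centroid-distance head is an illustrative choice, not a specific method evaluated in the paper.

```python
# A minimal distance-based Deterministic Uncertainty Method sketch.
import numpy as np

class DistanceDUM:
    def fit(self, feats, labels):
        self.centroids = np.stack(
            [feats[labels == c].mean(0) for c in np.unique(labels)])
        return self

    def forward(self, feats):
        # Distance of each sample to every class centroid.
        d = np.linalg.norm(feats[:, None] - self.centroids[None], axis=-1)
        pred = d.argmin(1)            # prediction: nearest centroid
        uncertainty = d.min(1)        # OOD score: distance to that centroid
        return pred, uncertainty

rng = np.random.default_rng(0)
train = rng.normal(size=(100, 8)) + np.repeat([[0.0], [4.0]], 50, 0)
labels = np.repeat([0, 1], 50)
dum = DistanceDUM().fit(train, labels)
pred, unc = dum.forward(rng.normal(size=(5, 8)) + 20.0)  # far-OOD inputs
print(unc > 5.0)                      # high uncertainty on OOD samples
```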
Sampling-free Inference for Ab-Initio Potential Energy Surface Networks
Gao, Nicholas, Günnemann, Stephan
Recently, it has been shown that neural networks not only approximate the ground-state wave functions of a single molecular system well, but can also generalize to multiple geometries. While such generalization significantly speeds up training, each energy evaluation still requires Monte Carlo integration, which limits the evaluation to a few geometries. In this work, we address these inference shortcomings by proposing the Potential learning from ab-initio Networks (PlaNet) framework, in which we simultaneously train a surrogate model in addition to the neural wave function. At inference time, the surrogate avoids expensive Monte Carlo integration by directly estimating the energy, accelerating the process from hours to milliseconds. In this way, we can accurately model high-resolution multi-dimensional energy surfaces for larger systems that were previously unobtainable via neural wave functions. Finally, we explore an additional inductive bias by introducing physically motivated restricted neural wave function models. We implement such a function, with several additional improvements, in the new PESNet++ model. In our experimental evaluation, PlaNet accelerates inference by 7 orders of magnitude for larger molecules like ethanol while preserving accuracy. Compared to previous energy surface networks, PESNet++ reduces energy errors by up to 74%.
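Schematically, the surrogate idea works as follows: alongside an expensive Monte Carlo energy estimator, fit a cheap regression model on (geometry, energy) pairs gathered during training, then query only the surrogate at inference time. Both "models" below are toy placeholders; a polynomial fit stands in for the learned surrogate.

```python
# Toy surrogate replacing Monte Carlo integration at inference time.
import numpy as np

def expensive_mc_energy(geometry, n_samples=20000):
    # Stand-in for Monte Carlo integration over a wave function.
    samples = np.random.normal(geometry, 0.1, size=(n_samples,))
    return np.mean(samples ** 2)      # toy "energy" expectation

# Collect training pairs during (simulated) wave-function optimization.
geoms = np.linspace(0.5, 3.0, 30)
energies = np.array([expensive_mc_energy(g) for g in geoms])

# Cheap surrogate: a low-degree polynomial fit over the geometry axis.
surrogate = np.polynomial.Polynomial.fit(geoms, energies, deg=4)

query = 1.7
print(surrogate(query))               # cheap direct estimate
print(expensive_mc_energy(query))     # reference value, much slower
```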
Localized Randomized Smoothing for Collective Robustness Certification
Schuchardt, Jan, Wollschläger, Tom, Bojchevski, Aleksandar, Günnemann, Stephan
Models for image segmentation, node classification, and many other tasks map a single input to multiple labels. By perturbing this single shared input (e.g. the image), an adversary can manipulate several predictions (e.g. misclassify several pixels). Collective robustness certification is the task of provably bounding the number of robust predictions under this threat model. The only dedicated method that goes beyond certifying each output independently is limited to strictly local models, where each prediction is associated with a small receptive field. We propose a more general collective robustness certificate for all types of models. We further show that this approach is beneficial for the larger class of softly local models, where each output depends on the entire input but assigns different levels of importance to different input regions (e.g. based on their proximity in the image). The certificate is based on our novel localized randomized smoothing approach, where the random perturbation strength for different input regions is proportional to their importance for the outputs. Localized smoothing Pareto-dominates existing certificates on both image segmentation and node classification tasks, simultaneously offering higher accuracy and stronger certificates.
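To illustrate the core mechanism, the 1-D sketch below smooths a toy "segmenter" for a single output with location-dependent noise: weak noise near the output's region of interest, strong noise far away. The noise profile, the classifier, and the Monte Carlo vote are all simplifications; the actual certificate additionally bounds worst-case vote changes, which is omitted here.

```python
# Localized smoothing sketch: per-region noise strength for one output.
import numpy as np

def localized_noise_std(n_pixels, center, sigma_min=0.1, sigma_max=1.0):
    # Noise grows with distance from the output's region of interest.
    dist = np.abs(np.arange(n_pixels) - center) / n_pixels
    return sigma_min + (sigma_max - sigma_min) * dist

def smoothed_vote(classifier, x, center, n_mc=1000):
    sigma = localized_noise_std(len(x), center)
    votes = [classifier(x + sigma * np.random.normal(size=x.shape))
             for _ in range(n_mc)]
    counts = np.bincount(votes)
    return counts.argmax(), counts.max() / n_mc   # class and vote share

# Toy 1-D "segmenter" for one output: sign of a local average.
classifier = lambda x: int(x[40:60].mean() > 0)
x = np.ones(100)
cls, p = smoothed_vote(classifier, x, center=50)
print(cls, p)   # confident vote: noise near the output region is weak
```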