Ewald-based Long-Range Message Passing for Molecular Graphs
Kosmala, Arthur, Gasteiger, Johannes, Gao, Nicholas, Günnemann, Stephan
Neural architectures that learn potential energy surfaces from molecular data have undergone rapid improvement in recent years. A key driver of this success is the Message Passing Neural Network (MPNN) paradigm. Its favorable scaling with system size relies in part on a spatial distance cutoff for messages. While this focus on locality is a useful inductive bias, it also impedes the learning of long-range interactions such as electrostatics and van der Waals forces. To address this drawback, we propose Ewald message passing: a nonlocal Fourier-space scheme which limits interactions via a cutoff on frequency instead of distance, and is theoretically well founded in the Ewald summation method. It can serve as an augmentation on top of existing MPNN architectures, as it is computationally inexpensive and agnostic to architectural details. We test the approach with four baseline models and two datasets containing diverse periodic (OC20) and aperiodic (OE62) structures. We observe robust improvements in energy mean absolute errors across all models and datasets, averaging 10% on OC20 and 16% on OE62. Our analysis shows an outsized impact of these improvements on structures with high long-range contributions to the ground-truth energy.
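To make the frequency-cutoff idea concrete, here is a minimal NumPy sketch of a Fourier-space aggregation step in the spirit of Ewald message passing: every atom contributes to a set of low-frequency structure factors, which are then projected back onto the atoms, so the update is nonlocal at O(N·K) cost. The function names, the cubic cell, and the absence of learned frequency filters are simplifying assumptions, not the paper's implementation.

```python
# Illustrative sketch: frequency-cutoff message passing for a cubic cell.
import numpy as np

def ewald_style_messages(pos, feats, cell_len=10.0, k_cutoff=2.0):
    """pos: (N, 3) atom positions, feats: (N, F) atom features."""
    # Enumerate reciprocal-lattice frequencies below the cutoff.
    n_max = int(np.ceil(k_cutoff * cell_len / (2 * np.pi)))
    grid = np.arange(-n_max, n_max + 1)
    kvecs = np.stack(np.meshgrid(grid, grid, grid), -1).reshape(-1, 3)
    kvecs = 2 * np.pi * kvecs / cell_len
    norms = np.linalg.norm(kvecs, axis=1)
    kvecs = kvecs[(norms > 0) & (norms <= k_cutoff)]      # drop k = 0

    phase = np.exp(1j * pos @ kvecs.T)                    # (N, K)
    # Structure-factor-like sum: every atom contributes to every frequency,
    # so the update is nonlocal, yet costs only O(N * K).
    s_k = phase.T @ feats.astype(complex)                 # (K, F)
    # Project back onto atoms; a learned frequency filter would go here.
    messages = (phase.conj() @ s_k).real / len(pos)       # (N, F)
    return messages

pos = np.random.rand(8, 3) * 10.0
feats = np.random.rand(8, 4)
print(ewald_style_messages(pos, feats).shape)             # (8, 4)
```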
Generalizing Neural Wave Functions
Gao, Nicholas, Günnemann, Stephan
Recent neural network-based wave functions have achieved state-of-the-art accuracies in modeling ab-initio ground-state potential energy surfaces. However, these networks can only solve different spatial arrangements of the same set of atoms. To overcome this limitation, we present Graph-learned orbital embeddings (Globe), a neural network-based reparametrization method that can adapt neural wave functions to different molecules. Globe learns representations of local electronic structures that generalize across molecules via spatial message passing by connecting molecular orbitals to covalent bonds. Further, we propose a size-consistent wave function Ansatz, the Molecular orbital network (Moon), tailored to jointly solve Schrödinger equations of different molecules. In our experiments, we find that Moon converges in 4.5 times fewer steps to similar accuracy as previous methods, or to lower energies given the same time. Further, our analysis shows that Moon's energy estimate scales additively with increasing system size, unlike previous work where we observe divergence. In both computational chemistry and machine learning, we are the first to demonstrate that a single wave function can jointly solve the Schrödinger equation of molecules with different atoms.
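The size-consistency property mentioned above has a simple operational reading: for two fragments placed far apart, the predicted energy of the combined system should equal the sum of the fragment energies. The toy check below illustrates this with a hypothetical stand-in for a trained model; `model_energy` and its functional form are invented purely for illustration.

```python
# Toy size-consistency check: additive energy for far-separated fragments.
import numpy as np

def model_energy(atoms):
    # Hypothetical surrogate: per-atom terms plus a pairwise interaction
    # that decays with distance, hence additive for distant fragments.
    atoms = np.asarray(atoms)
    e = -1.0 * len(atoms)
    for i in range(len(atoms)):
        for j in range(i + 1, len(atoms)):
            e += np.exp(-np.linalg.norm(atoms[i] - atoms[j]))
    return e

frag_a = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]]
frag_b = [[0.0, 1.1, 0.0], [1.0, 1.1, 0.0]]
far_b = [[x + 100.0, y, z] for x, y, z in frag_b]   # shift fragment B away

combined = model_energy(frag_a + far_b)
separate = model_energy(frag_a) + model_energy(frag_b)
print(abs(combined - separate) < 1e-6)   # True: energies add up
```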
Modeling Temporal Data as Continuous Functions with Stochastic Process Diffusion
Biloš, Marin, Rasul, Kashif, Schneider, Anderson, Nevmyvaka, Yuriy, Günnemann, Stephan
Temporal data such as time series can be viewed as discretized measurements of an underlying continuous function. To build a generative model for such data, we have to model the stochastic process that governs it. We propose a solution by defining the denoising diffusion model in function space, which also allows us to naturally handle irregularly sampled observations. The forward process gradually adds noise to functions, preserving their continuity, while the learned reverse process removes the noise and returns functions as new samples. To this end, we define suitable noise sources and introduce novel denoising and score-matching models. We show how our method can be used for multivariate probabilistic forecasting and imputation, and how our model can be interpreted as a neural process.
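As a rough intuition for the forward process, the sketch below replaces the i.i.d. per-observation noise of a standard diffusion model with noise drawn from a Gaussian process, so the noised object stays a continuous function and irregular sampling grids pose no problem. The RBF kernel, the jitter term, and the single noise level are illustrative choices, not the paper's exact construction.

```python
# Forward diffusion with function-valued (Gaussian-process) noise.
import numpy as np

def rbf_kernel(t, lengthscale=0.1):
    d = t[:, None] - t[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)

def forward_diffuse(x, t, alpha_bar):
    """x: function values at (possibly irregular) times t in [0, 1]."""
    cov = rbf_kernel(t) + 1e-6 * np.eye(len(t))    # jitter for stability
    eps = np.random.multivariate_normal(np.zeros(len(t)), cov)
    # DDPM-style mixing, but with correlated (continuous) noise.
    return np.sqrt(alpha_bar) * x + np.sqrt(1 - alpha_bar) * eps

t = np.sort(np.random.rand(50))           # irregular observation times
x = np.sin(2 * np.pi * t)                 # a discretized function
x_noisy = forward_diffuse(x, t, alpha_bar=0.5)
print(x_noisy.shape)                      # (50,)
```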
Revisiting Robustness in Graph Machine Learning
Gosch, Lukas, Sturm, Daniel, Geisler, Simon, Günnemann, Stephan
Many works show that node-level predictions of Graph Neural Networks (GNNs) are not robust to small, often termed adversarial, changes to the graph structure. However, because manual inspection of a graph is difficult, it is unclear if the studied perturbations always preserve a core assumption of adversarial examples: that of unchanged semantic content. To address this problem, we introduce a more principled notion of an adversarial graph, which is aware of semantic content change. Using Contextual Stochastic Block Models (CSBMs) and real-world graphs, our results uncover: (i) for a majority of nodes, the prevalent perturbation models include a large fraction of perturbed graphs violating the unchanged-semantics assumption; (ii) surprisingly, all assessed GNNs show over-robustness, that is, robustness beyond the point of semantic change. We find this to be a phenomenon complementary to adversarial examples, and show that including the label structure of the training graph in the inference process of GNNs significantly reduces over-robustness, while having a positive effect on test accuracy and adversarial robustness. Theoretically, leveraging our new semantics-aware notion of robustness, we prove that there is no robustness-accuracy tradeoff for inductively classifying a newly added node.
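For readers unfamiliar with the CSBM setup used here: class labels jointly determine edge probabilities and Gaussian node features, which is what makes it possible to reason about when a structure perturbation actually flips a node's semantic (Bayes-optimal) class. The sampler below is a minimal two-class version with illustrative parameter values.

```python
# Minimal two-class Contextual Stochastic Block Model sampler.
import numpy as np

def sample_csbm(n=100, p_in=0.2, p_out=0.02, mu=1.0, sigma=1.0):
    y = np.random.randint(0, 2, size=n)                  # binary labels
    same = y[:, None] == y[None, :]
    probs = np.where(same, p_in, p_out)                  # block structure
    adj = (np.random.rand(n, n) < probs).astype(int)
    adj = np.triu(adj, 1)
    adj = adj + adj.T                                    # undirected, no loops
    feats = np.random.normal(mu * (2 * y - 1), sigma)    # class-shifted features
    return adj, feats, y

adj, feats, y = sample_csbm()
print(adj.shape, feats.shape, y.mean())
```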
Towards Efficient MCMC Sampling in Bayesian Neural Networks by Exploiting Symmetry
Wiese, Jonas Gregor, Wimmer, Lisa, Papamarkou, Theodore, Bischl, Bernd, Günnemann, Stephan, Rügamer, David
Bayesian inference in deep neural networks is challenging due to the high-dimensional, strongly multi-modal parameter posterior density landscape. Markov chain Monte Carlo approaches asymptotically recover the true posterior but are considered prohibitively expensive for large modern architectures. Local methods, which have emerged as a popular alternative, focus on specific parameter regions that can be approximated by functions with tractable integrals. While these often yield satisfactory empirical results, they fail, by definition, to account for the multi-modality of the parameter posterior. In this work, we argue that the dilemma between exact-but-unaffordable and cheap-but-inexact approaches can be mitigated by exploiting symmetries in the posterior landscape. Such symmetries, induced by neuron interchangeability and certain activation functions, manifest in different parameter values leading to the same functional output value. We show theoretically that the posterior predictive density in Bayesian neural networks can be restricted to a symmetry-free parameter reference set. By further deriving an upper bound on the number of Monte Carlo chains required to capture the functional diversity, we propose a straightforward approach for feasible Bayesian inference. Our experiments suggest that efficient sampling is indeed possible, opening up a promising path to accurate uncertainty quantification in deep learning.
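The permutation symmetry the paper exploits is easy to verify directly: hidden units of an MLP layer can be reordered, together with their outgoing weights, without changing the network function, and mapping every parameter vector to a canonical ordering collapses these equivalent posterior modes onto one representative. The sort key below is one arbitrary canonicalization, chosen only for illustration.

```python
# Neuron-permutation symmetry and a canonical representative.
import numpy as np

def canonicalize(w_in, b, w_out):
    """w_in: (h, d), b: (h,), w_out: (o, h) for one hidden layer."""
    order = np.argsort(w_in[:, 0])   # sort units by first incoming weight
    return w_in[order], b[order], w_out[:, order]

def mlp(x, w_in, b, w_out):
    return w_out @ np.tanh(w_in @ x + b)

rng = np.random.default_rng(0)
w_in, b, w_out = rng.normal(size=(4, 3)), rng.normal(size=4), rng.normal(size=(2, 4))
perm = rng.permutation(4)            # a symmetric copy of the same function
w_in2, b2, w_out2 = w_in[perm], b[perm], w_out[:, perm]

x = rng.normal(size=3)
print(np.allclose(mlp(x, w_in, b, w_out), mlp(x, w_in2, b2, w_out2)))  # True
print(np.allclose(canonicalize(w_in, b, w_out)[0],
                  canonicalize(w_in2, b2, w_out2)[0]))                 # True
```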
The power of motifs as inductive bias for learning molecular distributions
Sommer, Johanna, Hetzel, Leon, Lüdke, David, Theis, Fabian, Günnemann, Stephan
Machine learning for molecules holds great potential for efficiently exploring the vast chemical space and thus streamlining the drug discovery process by facilitating the design of new therapeutic molecules. Deep generative models have shown promising results for molecule generation, but the benefits of specific inductive biases for learning distributions over small graphs are unclear. Our study investigates the impact of subgraph structures and vocabulary design on distribution learning, using small drug molecules as a case study. To this end, we introduce Subcover, a new subgraph-based fragmentation scheme, and evaluate it through a two-step variational auto-encoder. Our results show that Subcover's improved identification of chemically meaningful subgraphs leads to a relative improvement of the FCD score by 30%, outperforming previous methods. Our findings highlight the potential of Subcover to enhance the performance and scalability of existing methods, contributing to the advancement of drug discovery.
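Subcover itself is not sketched here; instead, the snippet below illustrates the general fragmentation idea it refines: decomposing a molecule into chemically meaningful subgraphs that then serve as the generative model's vocabulary. RDKit's BRICS decomposition is used purely as a stand-in scheme.

```python
# Motif-based fragmentation with BRICS as a stand-in for Subcover.
from rdkit import Chem
from rdkit.Chem import BRICS

mol = Chem.MolFromSmiles("CC(=O)Oc1ccccc1C(=O)O")    # aspirin
fragments = sorted(BRICS.BRICSDecompose(mol))        # vocabulary entries
for frag in fragments:
    print(frag)
```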
Accuracy is not the only Metric that matters: Estimating the Energy Consumption of Deep Learning Models
Getzner, Johannes, Charpentier, Bertrand, Günnemann, Stephan
Modern machine learning models have started to consume incredible amounts of energy, thus incurring large carbon footprints (Strubell et al., 2019). In this work, we estimate the energy consumption of deep learning (DL) models by collecting high-quality energy data and building a first baseline model, capable of predicting a model's energy consumption by accumulating its estimated layer-wise energies. Deep CNNs, such as VGG16 or ResNet50, already deliver great performance (Simonyan & Zisserman, 2014; He et al., 2015), yet the increasing number of layers in such models comes at the cost of severely increased computational complexity, resulting in the need for power-hungry hardware (Thompson et al., 2020; Jin et al., 2016). An example of a model that behaves extremely poorly in this regard is a big transformer with neural architecture search (Strubell et al., 2019). Clearly, training and running these models is not just a matter of financial cost, but also of environmental impact.
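The accumulation scheme described above can be pictured as follows: predict each layer's energy from simple layer features and sum the per-layer estimates. In this sketch, the per-layer model is a hypothetical linear function of multiply-accumulate counts with made-up coefficients; the actual baseline model and its features may differ.

```python
# Layer-wise energy accumulation with hypothetical per-layer coefficients.

LAYER_COEFS = {"conv": 2.5e-9, "linear": 1.0e-9}   # hypothetical J per MAC

def layer_energy(kind, macs):
    return LAYER_COEFS[kind] * macs                # estimated joules

def model_energy(layers):
    """layers: list of (kind, multiply-accumulate count) tuples."""
    return sum(layer_energy(kind, macs) for kind, macs in layers)

# A rough VGG-style stack described by per-layer MAC counts.
layers = [("conv", 1.8e9), ("conv", 0.9e9), ("linear", 4.1e6)]
print(f"estimated energy: {model_energy(layers):.3e} J")
```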
Training, Architecture, and Prior for Deterministic Uncertainty Methods
Charpentier, Bertrand, Zhang, Chenxiang, Günnemann, Stephan
Accurate and efficient uncertainty estimation is crucial for building reliable Machine Learning (ML) models that provide calibrated uncertainty estimates, generalize well, and detect Out-Of-Distribution (OOD) datasets. To this end, Deterministic Uncertainty Methods (DUMs) are a promising model family capable of performing uncertainty estimation in a single forward pass. This work investigates important design choices in DUMs: (1) we show that training schemes decoupling the core architecture from the uncertainty head can significantly improve uncertainty performance. Safety is critical to the adoption of deep learning in domains such as autonomous driving, medical diagnosis, or financial trading systems. One solution to this problem is to build reliable models capable of estimating the uncertainty of their own predictions. Uncertainty is commonly divided into aleatoric uncertainty, quantifying the inherent noise in the data and thus irreducible; epistemic uncertainty, quantifying modeling choices or lack of data and thus reducible; and predictive uncertainty, a combination of the two (Gal, 2016). In practice, high-quality uncertainty estimates must be calibrated and able to detect OOD data such as anomalies, while preserving good generalization performance under dataset shifts. Recently, a family of methods for uncertainty estimation named Deterministic Uncertainty Methods (DUMs) has emerged (Postels et al., 2022). Contrary to uncertainty methods such as Ensembles (Lakshminarayanan et al., 2017), MC Dropout (Gal & Ghahramani, 2016), or other Bayesian neural networks on weights (Blundell et al., 2015), which require multiple forward passes to make predictions, DUMs only require a single forward pass, making them significantly more computationally efficient.
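One common DUM pattern, sketched minimally below: a deterministic feature extractor paired with a distance-based uncertainty head, so a single forward pass yields both a class prediction and an OOD score. The centroid-distance head is an illustrative choice, not a specific method evaluated in the paper.

```python
# A minimal distance-based Deterministic Uncertainty Method sketch.
import numpy as np

class DistanceDUM:
    def fit(self, feats, labels):
        self.centroids = np.stack(
            [feats[labels == c].mean(0) for c in np.unique(labels)])
        return self

    def forward(self, feats):
        # Distance of each sample to every class centroid.
        d = np.linalg.norm(feats[:, None] - self.centroids[None], axis=-1)
        pred = d.argmin(1)            # prediction: nearest centroid
        uncertainty = d.min(1)        # OOD score: distance to that centroid
        return pred, uncertainty

rng = np.random.default_rng(0)
train = rng.normal(size=(100, 8)) + np.repeat([[0.0], [4.0]], 50, 0)
labels = np.repeat([0, 1], 50)
dum = DistanceDUM().fit(train, labels)
pred, unc = dum.forward(rng.normal(size=(5, 8)) + 20.0)  # far-OOD inputs
print(unc > 5.0)                      # high uncertainty on OOD samples
```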
Sampling-free Inference for Ab-Initio Potential Energy Surface Networks
Gao, Nicholas, Günnemann, Stephan
Recently, it has been shown that neural networks not only approximate the ground-state wave functions of a single molecular system well, but can also generalize to multiple geometries. While such generalization significantly speeds up training, each energy evaluation still requires Monte Carlo integration, which limits the evaluation to a few geometries. In this work, we address these inference shortcomings by proposing the Potential learning from ab-initio Networks (PlaNet) framework, in which we simultaneously train a surrogate model in addition to the neural wave function. At inference time, the surrogate avoids expensive Monte Carlo integration by directly estimating the energy, accelerating the process from hours to milliseconds. In this way, we can accurately model high-resolution multi-dimensional energy surfaces for larger systems that were previously unobtainable via neural wave functions. Finally, we explore an additional inductive bias by introducing physically motivated restricted neural wave function models. We implement such a function, with several additional improvements, in the new PESNet++ model. In our experimental evaluation, PlaNet accelerates inference by 7 orders of magnitude for larger molecules like ethanol while preserving accuracy. Compared to previous energy surface networks, PESNet++ reduces energy errors by up to 74%.
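Schematically, the surrogate idea works as follows: alongside an expensive Monte Carlo energy estimator, fit a cheap regression model on (geometry, energy) pairs gathered during training, then query only the surrogate at inference time. Both "models" below are toy placeholders; a polynomial fit stands in for the learned surrogate.

```python
# Toy surrogate replacing Monte Carlo integration at inference time.
import numpy as np

def expensive_mc_energy(geometry, n_samples=20000):
    # Stand-in for Monte Carlo integration over a wave function.
    samples = np.random.normal(geometry, 0.1, size=(n_samples,))
    return np.mean(samples ** 2)      # toy "energy" expectation

# Collect training pairs during (simulated) wave-function optimization.
geoms = np.linspace(0.5, 3.0, 30)
energies = np.array([expensive_mc_energy(g) for g in geoms])

# Cheap surrogate: a low-degree polynomial fit over the geometry axis.
surrogate = np.polynomial.Polynomial.fit(geoms, energies, deg=4)

query = 1.7
print(surrogate(query))               # cheap direct estimate
print(expensive_mc_energy(query))     # reference value, much slower
```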
Localized Randomized Smoothing for Collective Robustness Certification
Schuchardt, Jan, Wollschläger, Tom, Bojchevski, Aleksandar, Günnemann, Stephan
Models for image segmentation, node classification, and many other tasks map a single input to multiple labels. By perturbing this single shared input (e.g. the image), an adversary can manipulate several predictions (e.g. misclassify several pixels). Collective robustness certification is the task of provably bounding the number of robust predictions under this threat model. The only dedicated method that goes beyond certifying each output independently is limited to strictly local models, where each prediction is associated with a small receptive field. We propose a more general collective robustness certificate for all types of models. We further show that this approach is beneficial for the larger class of softly local models, where each output depends on the entire input but assigns different levels of importance to different input regions (e.g. based on their proximity in the image). The certificate is based on our novel localized randomized smoothing approach, where the random perturbation strength for different input regions is proportional to their importance for the outputs. Localized smoothing Pareto-dominates existing certificates on both image segmentation and node classification tasks, simultaneously offering higher accuracy and stronger certificates.
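To illustrate the core mechanism, the 1-D sketch below smooths a toy "segmenter" for a single output with location-dependent noise: weak noise near the output's region of interest, strong noise far away. The noise profile, the classifier, and the Monte Carlo vote are all simplifications; the actual certificate additionally bounds worst-case vote changes, which is omitted here.

```python
# Localized smoothing sketch: per-region noise strength for one output.
import numpy as np

def localized_noise_std(n_pixels, center, sigma_min=0.1, sigma_max=1.0):
    # Noise grows with distance from the output's region of interest.
    dist = np.abs(np.arange(n_pixels) - center) / n_pixels
    return sigma_min + (sigma_max - sigma_min) * dist

def smoothed_vote(classifier, x, center, n_mc=1000):
    sigma = localized_noise_std(len(x), center)
    votes = [classifier(x + sigma * np.random.normal(size=x.shape))
             for _ in range(n_mc)]
    counts = np.bincount(votes)
    return counts.argmax(), counts.max() / n_mc   # class and vote share

# Toy 1-D "segmenter" for one output: sign of a local average.
classifier = lambda x: int(x[40:60].mean() > 0)
x = np.ones(100)
cls, p = smoothed_vote(classifier, x, center=50)
print(cls, p)   # confident vote: noise near the output region is weak
```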