
Collaborating Authors

 Trabs, Mathias


A Wasserstein perspective of Vanilla GANs

arXiv.org Machine Learning

The empirical success of Generative Adversarial Networks (GANs) has spurred increasing interest in theoretical research. The statistical literature is mainly focused on Wasserstein GANs and generalizations thereof, which especially allow for good dimension reduction properties. Statistical results for Vanilla GANs, the original optimization problem, are still rather limited and require assumptions such as smooth activation functions and equal dimensions of the latent space and the ambient space. To bridge this gap, we draw a connection from Vanilla GANs to the Wasserstein distance. By doing so, existing results for Wasserstein GANs can be extended to Vanilla GANs. In particular, we obtain an oracle inequality for Vanilla GANs in Wasserstein distance. The assumptions of this oracle inequality are designed to be satisfied by network architectures commonly used in practice, such as feedforward ReLU networks. By providing a quantitative result for the approximation of a Lipschitz function by a feedforward ReLU network with bounded Hölder norm, we deduce a rate of convergence for Vanilla GANs as well as Wasserstein GANs as estimators of the unknown probability distribution.
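
For context, the two objects the abstract relates are the classical (vanilla) GAN objective and the Wasserstein-1 distance in its Kantorovich-Rubinstein dual form. The display below shows the standard textbook formulations with generic notation (data distribution P^*, latent distribution P_Z); it is not notation taken from the paper itself.

```latex
% Vanilla GAN objective (Goodfellow et al.) for generator G and discriminator D:
\[
  \min_{G}\max_{D}\;
  \mathbb{E}_{X\sim P^{*}}\bigl[\log D(X)\bigr]
  + \mathbb{E}_{Z\sim P_{Z}}\bigl[\log\bigl(1-D(G(Z))\bigr)\bigr].
\]
% Wasserstein-1 distance in its Kantorovich--Rubinstein dual form,
% the metric in which the oracle inequality is stated:
\[
  W_{1}(P,Q)
  = \sup_{\|f\|_{\mathrm{Lip}}\le 1}
    \Bigl(\mathbb{E}_{X\sim P}\bigl[f(X)\bigr]
          - \mathbb{E}_{Y\sim Q}\bigl[f(Y)\bigr]\Bigr).
\]
```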


AdamMCMC: Combining Metropolis Adjusted Langevin with Momentum-based Optimization

arXiv.org Machine Learning

Uncertainty estimation is a key issue when considering the application of deep neural network methods in science and engineering. In this work, we introduce a novel algorithm that quantifies epistemic uncertainty via Monte Carlo sampling from a tempered posterior distribution. It combines the well-established Metropolis-adjusted Langevin algorithm (MALA) with momentum-based optimization using Adam and leverages a prolate proposal distribution to draw efficiently from the posterior. We prove that the constructed chain admits the Gibbs posterior as an invariant distribution and converges to this Gibbs posterior in total variation distance. Numerical evaluations are postponed to a first revision.
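
The building block that AdamMCMC starts from can be sketched in a few lines. The code below is a minimal, generic MALA step with a Metropolis correction (hypothetical function names, a single scalar step size eps); it is only the classical ingredient the paper augments with Adam-style momentum and a prolate proposal, not the AdamMCMC update itself.

```python
import numpy as np

rng = np.random.default_rng(0)

def mala_step(theta, log_post, grad_log_post, eps=0.05):
    """One classical MALA step targeting exp(log_post); returns (state, accepted)."""
    drift = lambda x: x + 0.5 * eps**2 * grad_log_post(x)
    prop = drift(theta) + eps * rng.standard_normal(theta.shape)

    # Log-density of the Gaussian proposal q(y | x), up to a constant that cancels.
    log_q = lambda y, x: -np.sum((y - drift(x)) ** 2) / (2 * eps**2)

    log_alpha = (log_post(prop) - log_post(theta)
                 + log_q(theta, prop) - log_q(prop, theta))
    if np.log(rng.uniform()) < log_alpha:
        return prop, True
    return theta, False
```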


Dimensionality Reduction and Wasserstein Stability for Kernel Regression

arXiv.org Machine Learning

In a high-dimensional regression framework, we study the consequences of the naive two-step procedure in which the dimension of the input variables is first reduced and the reduced input variables are then used to predict the output variable with kernel regression. In order to analyze the resulting regression errors, a novel stability result for kernel regression with respect to the Wasserstein distance is derived. This allows us to bound errors that occur when perturbed input data are used to fit the regression function. We apply the general stability result to principal component analysis (PCA). Exploiting known estimates from the literature on both principal component analysis and kernel regression, we deduce convergence rates for the two-step procedure, which turns out to be particularly useful in a semi-supervised setting.
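
A minimal sketch of the naive two-step procedure, assuming PCA for the reduction step and kernel ridge regression as the kernel method. The toy data, kernel and tuning parameters below are arbitrary placeholders for illustration, not choices made in the paper.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.kernel_ridge import KernelRidge
from sklearn.pipeline import make_pipeline

# Toy high-dimensional regression data (purely illustrative).
rng = np.random.default_rng(0)
X = rng.standard_normal((500, 100))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(500)

# Step 1: reduce the input dimension with PCA.
# Step 2: run kernel (ridge) regression on the reduced inputs.
two_step = make_pipeline(PCA(n_components=5),
                         KernelRidge(kernel="rbf", alpha=1.0, gamma=0.1))
two_step.fit(X, y)
print(two_step.score(X, y))  # in-sample R^2 of the fitted two-step estimator
```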


Statistical guarantees for stochastic Metropolis-Hastings

arXiv.org Machine Learning

A Metropolis-Hastings step is widely used for gradient-based Markov chain Monte Carlo methods in uncertainty quantification. By calculating acceptance probabilities on batches, a stochastic Metropolis-Hastings step saves computational costs, but it reduces the effective sample size. We show that this obstacle can be avoided by a simple correction term. We study the statistical properties of the resulting stationary distribution of the chain when the corrected stochastic Metropolis-Hastings approach is applied to sample from a Gibbs posterior distribution in a nonparametric regression setting. Focusing on deep neural network regression, we prove a PAC-Bayes oracle inequality that yields optimal contraction rates, and we analyze the diameter of the resulting credible sets and show that they attain high coverage probability. With a numerical example in a high-dimensional parameter space, we illustrate that the credible sets and contraction rates of the stochastic Metropolis-Hastings algorithm indeed behave similarly to those obtained from the classical Metropolis-adjusted Langevin algorithm.
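
To make the setting concrete, the sketch below shows a generic mini-batch (stochastic) Metropolis-Hastings acceptance step for a Gibbs posterior, assuming a symmetric proposal and a flat prior so that those ratios cancel. The `correction` argument is only a placeholder where a correction term such as the paper's would enter; the actual formula is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)

def stochastic_mh_accept(theta, theta_prop, loss_fn, n, batch_size,
                         lam=1.0, correction=0.0):
    """Mini-batch MH acceptance for a Gibbs posterior proportional to
    exp(-lam * sum_i loss_fn(theta, i)), assuming a symmetric proposal and
    a flat prior. With correction=0.0 this is the naive, uncorrected step."""
    idx = rng.choice(n, size=batch_size, replace=False)
    batch_loss = lambda th: sum(loss_fn(th, i) for i in idx)
    # Rescale the batch loss difference to estimate the full-sample difference.
    log_ratio = -lam * (n / batch_size) * (batch_loss(theta_prop) - batch_loss(theta))
    return np.log(rng.uniform()) < log_ratio + correction
```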


A PAC-Bayes oracle inequality for sparse neural networks

arXiv.org Machine Learning

Driven by the enormous success of neural networks in a broad spectrum of machine learning applications, see Goodfellow et al. [16] and Schmidhuber [29] for an introduction, the theoretical understanding of network-based methods is a dynamic and flourishing research area at the intersection of mathematical statistics, optimization and approximation theory. In addition to theoretical guarantees, uncertainty quantification is an important and challenging problem for neural networks and has motivated the introduction of Bayesian neural networks, where a distribution is learned for each network weight, see Graves [17] and Blundell et al. [8] and numerous subsequent articles. In this work we study the Gibbs posterior distribution for a stochastic neural network. In a nonparametric regression problem, we show that the corresponding estimator achieves a minimax-optimal prediction risk bound up to a logarithmic factor. Moreover, the method is adaptive with respect to the unknown regularity and structure of the regression function. While early theoretical foundations for neural networks are summarized by Anthony & Bartlett [4], the excellent approximation properties of deep neural networks, especially with the ReLU activation function, have been discovered in recent years, see e.g.
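
For reference, the Gibbs posterior mentioned here is the standard tempered-posterior construction; the display below uses generic notation (empirical risk R_n, prior \Pi, inverse temperature \lambda) and may differ from the paper's exact conventions.

```latex
% Gibbs posterior for a regression sample D_n = {(X_1,Y_1),...,(X_n,Y_n)},
% network f_theta, empirical risk R_n, prior Pi and temperature lambda > 0:
\[
  R_n(\theta) = \frac{1}{n}\sum_{i=1}^{n}\bigl(Y_i - f_\theta(X_i)\bigr)^{2},
  \qquad
  \Pi_{\lambda}(\mathrm{d}\theta \mid \mathcal{D}_n)
  \propto \exp\bigl(-\lambda\, n\, R_n(\theta)\bigr)\,\Pi(\mathrm{d}\theta).
\]
```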