
Collaborating Authors

 Trabs, Mathias


A Wasserstein perspective of Vanilla GANs

arXiv.org Machine Learning

The empirical success of Generative Adversarial Networks (GANs) has spurred increasing interest in theoretical research. The statistical literature is mainly focused on Wasserstein GANs and generalizations thereof, which especially allow for good dimension reduction properties. Statistical results for Vanilla GANs, the original optimization problem, are still rather limited and require assumptions such as smooth activation functions and equal dimensions of the latent space and the ambient space. To bridge this gap, we draw a connection from Vanilla GANs to the Wasserstein distance. By doing so, existing results for Wasserstein GANs can be extended to Vanilla GANs. In particular, we obtain an oracle inequality for Vanilla GANs in Wasserstein distance. The assumptions of this oracle inequality are designed to be satisfied by network architectures commonly used in practice, such as feedforward ReLU networks. By providing a quantitative result for the approximation of a Lipschitz function by a feedforward ReLU network with bounded Hölder norm, we deduce a rate of convergence for Vanilla GANs as well as Wasserstein GANs as estimators of the unknown probability distribution.
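
For context, the two objects the abstract relates are the classical (vanilla) GAN objective and the Wasserstein-1 distance in its Kantorovich-Rubinstein dual form. The display below shows the standard textbook formulations with generic notation (data distribution P^*, latent distribution P_Z); it is not notation taken from the paper itself.

```latex
% Vanilla GAN objective (Goodfellow et al.) for generator G and discriminator D:
\[
  \min_{G}\max_{D}\;
  \mathbb{E}_{X\sim P^{*}}\bigl[\log D(X)\bigr]
  + \mathbb{E}_{Z\sim P_{Z}}\bigl[\log\bigl(1-D(G(Z))\bigr)\bigr].
\]
% Wasserstein-1 distance in its Kantorovich--Rubinstein dual form,
% the metric in which the oracle inequality is stated:
\[
  W_{1}(P,Q)
  = \sup_{\|f\|_{\mathrm{Lip}}\le 1}
    \Bigl(\mathbb{E}_{X\sim P}\bigl[f(X)\bigr]
          - \mathbb{E}_{Y\sim Q}\bigl[f(Y)\bigr]\Bigr).
\]
```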


AdamMCMC: Combining Metropolis Adjusted Langevin with Momentum-based Optimization

arXiv.org Machine Learning

Uncertainty estimation is a key issue when considering the application of deep neural network methods in science and engineering. In this work, we introduce a novel algorithm that quantifies epistemic uncertainty via Monte Carlo sampling from a tempered posterior distribution. It combines the well-established Metropolis-adjusted Langevin algorithm (MALA) with momentum-based optimization using Adam and leverages a prolate proposal distribution to draw efficiently from the posterior. We prove that the constructed chain admits the Gibbs posterior as an invariant distribution and converges to this Gibbs posterior in total variation distance. Numerical evaluations are postponed to a first revision.
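
The building block that AdamMCMC starts from can be sketched in a few lines. The code below is a minimal, generic MALA step with a Metropolis correction (hypothetical function names, a single scalar step size eps); it is only the classical ingredient the paper augments with Adam-style momentum and a prolate proposal, not the AdamMCMC update itself.

```python
import numpy as np

rng = np.random.default_rng(0)

def mala_step(theta, log_post, grad_log_post, eps=0.05):
    """One classical MALA step targeting exp(log_post); returns (state, accepted)."""
    drift = lambda x: x + 0.5 * eps**2 * grad_log_post(x)
    prop = drift(theta) + eps * rng.standard_normal(theta.shape)

    # Log-density of the Gaussian proposal q(y | x), up to a constant that cancels.
    log_q = lambda y, x: -np.sum((y - drift(x)) ** 2) / (2 * eps**2)

    log_alpha = (log_post(prop) - log_post(theta)
                 + log_q(theta, prop) - log_q(prop, theta))
    if np.log(rng.uniform()) < log_alpha:
        return prop, True
    return theta, False
```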


Dimensionality Reduction and Wasserstein Stability for Kernel Regression

arXiv.org Machine Learning

In a high-dimensional regression framework, we study the consequences of the naive two-step procedure in which the dimension of the input variables is first reduced and the reduced input variables are then used to predict the output variable with kernel regression. In order to analyze the resulting regression errors, a novel stability result for kernel regression with respect to the Wasserstein distance is derived. This allows us to bound errors that occur when perturbed input data are used to fit the regression function. We apply the general stability result to principal component analysis (PCA). Exploiting known estimates from the literature on both principal component analysis and kernel regression, we deduce convergence rates for the two-step procedure, which turns out to be particularly useful in a semi-supervised setting.
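
A minimal sketch of the naive two-step procedure, assuming PCA for the reduction step and kernel ridge regression as the kernel method. The toy data, kernel and tuning parameters below are arbitrary placeholders for illustration, not choices made in the paper.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.kernel_ridge import KernelRidge
from sklearn.pipeline import make_pipeline

# Toy high-dimensional regression data (purely illustrative).
rng = np.random.default_rng(0)
X = rng.standard_normal((500, 100))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(500)

# Step 1: reduce the input dimension with PCA.
# Step 2: run kernel (ridge) regression on the reduced inputs.
two_step = make_pipeline(PCA(n_components=5),
                         KernelRidge(kernel="rbf", alpha=1.0, gamma=0.1))
two_step.fit(X, y)
print(two_step.score(X, y))  # in-sample R^2 of the fitted two-step estimator
```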


Statistical guarantees for stochastic Metropolis-Hastings

arXiv.org Machine Learning

A Metropolis-Hastings step is widely used for gradient-based Markov chain Monte Carlo methods in uncertainty quantification. By calculating acceptance probabilities on batches, a stochastic Metropolis-Hastings step saves computational costs, but it reduces the effective sample size. We show that this obstacle can be avoided by a simple correction term. We study the statistical properties of the resulting stationary distribution of the chain when the corrected stochastic Metropolis-Hastings approach is applied to sample from a Gibbs posterior distribution in a nonparametric regression setting. Focusing on deep neural network regression, we prove a PAC-Bayes oracle inequality that yields optimal contraction rates, and we analyze the diameter of the resulting credible sets and show that they attain high coverage probability. With a numerical example in a high-dimensional parameter space, we illustrate that the credible sets and contraction rates of the stochastic Metropolis-Hastings algorithm indeed behave similarly to those obtained from the classical Metropolis-adjusted Langevin algorithm.
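
To make the setting concrete, the sketch below shows a generic mini-batch (stochastic) Metropolis-Hastings acceptance step for a Gibbs posterior, assuming a symmetric proposal and a flat prior so that those ratios cancel. The `correction` argument is only a placeholder where a correction term such as the paper's would enter; the actual formula is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)

def stochastic_mh_accept(theta, theta_prop, loss_fn, n, batch_size,
                         lam=1.0, correction=0.0):
    """Mini-batch MH acceptance for a Gibbs posterior proportional to
    exp(-lam * sum_i loss_fn(theta, i)), assuming a symmetric proposal and
    a flat prior. With correction=0.0 this is the naive, uncorrected step."""
    idx = rng.choice(n, size=batch_size, replace=False)
    batch_loss = lambda th: sum(loss_fn(th, i) for i in idx)
    # Rescale the batch loss difference to estimate the full-sample difference.
    log_ratio = -lam * (n / batch_size) * (batch_loss(theta_prop) - batch_loss(theta))
    return np.log(rng.uniform()) < log_ratio + correction
```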


A PAC-Bayes oracle inequality for sparse neural networks

arXiv.org Machine Learning

Driven by the enormous success of neural networks in a broad spectrum of machine learning applications, see Goodfellow et al. [16] and Schmidhuber [29] for an introduction, the theoretical understanding of network-based methods is a dynamic and flourishing research area at the intersection of mathematical statistics, optimization and approximation theory. In addition to theoretical guarantees, uncertainty quantification is an important and challenging problem for neural networks and has motivated the introduction of Bayesian neural networks, where a distribution is learned for each network weight, see Graves [17] and Blundell et al. [8] and numerous subsequent articles. In this work we study the Gibbs posterior distribution for a stochastic neural network. In a nonparametric regression problem, we show that the corresponding estimator achieves a minimax-optimal prediction risk bound up to a logarithmic factor. Moreover, the method is adaptive with respect to the unknown regularity and structure of the regression function. While early theoretical foundations for neural networks are summarized by Anthony & Bartlett [4], the excellent approximation properties of deep neural networks, especially with the ReLU activation function, have been discovered in recent years, see e.g.
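
For reference, the Gibbs posterior mentioned here is the standard tempered-posterior construction; the display below uses generic notation (empirical risk R_n, prior \Pi, inverse temperature \lambda) and may differ from the paper's exact conventions.

```latex
% Gibbs posterior for a regression sample D_n = {(X_1,Y_1),...,(X_n,Y_n)},
% network f_theta, empirical risk R_n, prior Pi and temperature lambda > 0:
\[
  R_n(\theta) = \frac{1}{n}\sum_{i=1}^{n}\bigl(Y_i - f_\theta(X_i)\bigr)^{2},
  \qquad
  \Pi_{\lambda}(\mathrm{d}\theta \mid \mathcal{D}_n)
  \propto \exp\bigl(-\lambda\, n\, R_n(\theta)\bigr)\,\Pi(\mathrm{d}\theta).
\]
```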