 Schuster, Ingmar


Autoregressive Denoising Diffusion Models for Multivariate Probabilistic Time Series Forecasting

arXiv.org Artificial Intelligence

In this work, we propose TimeGrad, an autoregressive model for multivariate probabilistic time series forecasting which samples from the data distribution at each time step by estimating its gradient. To this end, we use diffusion probabilistic models, a class of latent variable models closely connected to score matching and energy-based methods. Our model learns gradients by optimizing a variational bound on the data likelihood and at inference time converts white noise into a sample of the distribution of interest through a Markov chain using Langevin sampling. We demonstrate experimentally that the proposed autoregressive denoising diffusion model is the new state-of-the-art multivariate probabilistic forecasting method on real-world data sets with thousands of correlated dimensions. We hope that this method is a useful tool for practitioners and lays the foundation for future research in this area.
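The sampling loop described here can be made concrete with the standard denoising diffusion (DDPM) recursion. Below is a minimal sketch in that spirit: `eps_theta` stands in for a trained noise-prediction network conditioned on a state summarizing the past, and the beta schedule is an arbitrary placeholder, so this illustrates the generic reverse Markov chain rather than the exact TimeGrad model.

```python
import numpy as np

def reverse_diffusion_sample(eps_theta, cond, dim, betas, rng):
    """Turn white noise into one sample via the reverse Markov chain."""
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    x = rng.standard_normal(dim)                 # x_N: white noise
    for n in reversed(range(len(betas))):        # n = N-1, ..., 0
        z = rng.standard_normal(dim) if n > 0 else np.zeros(dim)
        eps = eps_theta(x, n, cond)              # predicted noise at step n
        # posterior mean of x_{n-1} given x_n (standard DDPM update)
        x = (x - betas[n] / np.sqrt(1 - alpha_bars[n]) * eps) / np.sqrt(alphas[n])
        x += np.sqrt(betas[n]) * z               # Langevin-style noise term
    return x

# toy usage: an untrained stand-in network and a linear beta schedule
rng = np.random.default_rng(0)
sample = reverse_diffusion_sample(lambda x, n, c: np.zeros_like(x),
                                  cond=None, dim=8,
                                  betas=np.linspace(1e-4, 0.1, 50), rng=rng)
```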


Feature space approximation for kernel-based supervised learning

arXiv.org Machine Learning

We propose a method for the approximation of high- or even infinite-dimensional feature vectors, which play an important role in supervised learning. The goal is to reduce the size of the training data, resulting in lower storage consumption and computational complexity. Furthermore, the method can be regarded as a regularization technique, which improves the generalizability of learned target functions. We demonstrate significant improvements in comparison to data-driven predictions computed on the full training data set. The method is applied to classification and regression problems from different application areas such as image recognition, system identification, and oceanographic time series analysis.
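The abstract does not spell out the construction, but the general idea of replacing high- or infinite-dimensional feature vectors by compact data-driven surrogates can be illustrated with a Nystroem-type approximation. The kernel choice, landmark selection, and sizes below are placeholder assumptions, not the paper's method.

```python
import numpy as np

def rbf_kernel(X, Y, sigma=1.0):
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma**2))

# Nystroem-style approximation: represent each point by an m-dimensional
# surrogate feature vector built from m landmark points (m << n), so that
# downstream supervised learning only touches the reduced representation.
def nystroem_features(X, landmarks, sigma=1.0, reg=1e-6):
    K_mm = rbf_kernel(landmarks, landmarks, sigma)
    K_nm = rbf_kernel(X, landmarks, sigma)
    # whitening via K_mm^{-1/2} so that inner products approximate the kernel
    evals, evecs = np.linalg.eigh(K_mm + reg * np.eye(len(landmarks)))
    K_mm_inv_sqrt = evecs @ np.diag(evals ** -0.5) @ evecs.T
    return K_nm @ K_mm_inv_sqrt

rng = np.random.default_rng(0)
X = rng.standard_normal((500, 3))
Z = nystroem_features(X, X[rng.choice(500, 50, replace=False)])
print(Z.shape)  # (500, 50): 50-dim surrogate for infinite-dim RBF features
```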


Set Flow: A Permutation Invariant Normalizing Flow

arXiv.org Machine Learning

We present a generative model that is defined on finite sets of exchangeable, potentially high-dimensional, data. As the architecture is an extension of RealNVP, it inherits all of its favorable properties, such as being invertible and allowing for exact log-likelihood evaluation. We show that this architecture is able to learn finite non-i.i.d. set data distributions, to capture statistical dependencies between entities of the set, and to train and sample with variable set sizes in a computationally efficient manner. Experiments on 3D point clouds show state-of-the-art likelihoods.
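As a sketch of how a coupling layer can respect exchangeability, the toy example below conditions a RealNVP-style affine coupling on a mean-pooled summary of the set, making the transform permutation-equivariant, invertible, and applicable to any set size. The conditioner weights `W` are a random stand-in for a trained network; this illustrates the idea, not the authors' architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4
W = rng.standard_normal((d, d))  # stand-in weights for the trained conditioner

def conditioner(x1):
    # Permutation-invariant context: mean-pool the untransformed half over the
    # set, then broadcast it back to every element (DeepSets-style pooling).
    ctx = np.broadcast_to(x1.mean(0), x1.shape)
    h = np.concatenate([x1, ctx], axis=-1) @ W     # (n, d)
    return np.tanh(h[:, : d // 2]), h[:, d // 2:]  # log-scale, shift

def coupling_forward(X):
    x1, x2 = X[:, : d // 2], X[:, d // 2:]
    log_s, t = conditioner(x1)
    y2 = x2 * np.exp(log_s) + t
    log_det = log_s.sum()          # exact log-likelihood contribution
    return np.concatenate([x1, y2], axis=-1), log_det

def coupling_inverse(Y):
    y1, y2 = Y[:, : d // 2], Y[:, d // 2:]
    log_s, t = conditioner(y1)     # same context, since y1 equals x1
    return np.concatenate([y1, (y2 - t) * np.exp(-log_s)], axis=-1)

X = rng.standard_normal((7, d))    # a set of 7 exchangeable elements
Y, _ = coupling_forward(X)
assert np.allclose(coupling_inverse(Y), X)  # invertible for any set size
```

Because the pooled context depends only on the untransformed half, invertibility is preserved exactly as in ordinary RealNVP, while the pooling makes the layer indifferent to the ordering of set elements.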


Kernel Conditional Density Operators

arXiv.org Machine Learning

We introduce a conditional density estimation model termed the conditional density operator. It naturally captures multivariate, multimodal output densities and is competitive with recent neural conditional density models and Gaussian processes. To derive the model, we propose a novel approach to the reconstruction of probability densities from their kernel mean embeddings by drawing connections to the estimation of Radon-Nikodym derivatives in the reproducing kernel Hilbert space (RKHS). We prove finite-sample error bounds which are independent of problem dimensionality. Furthermore, we apply the resulting conditional density model to real-world data and demonstrate its versatility and competitive performance.
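One simple way to see how densities can be recovered from conditional kernel mean embeddings: estimate the embedding coefficients of p(y | x*) by kernel ridge regression and smooth them with a normalized kernel. The sketch below is this naive CME-based estimator on toy data, with assumed bandwidths and regularization; it is not the Radon-Nikodym construction of the paper.

```python
import numpy as np

def gram(a, b, sigma):
    return np.exp(-(a[:, None] - b[None, :]) ** 2 / (2 * sigma**2))

rng = np.random.default_rng(1)
n = 300
x = rng.uniform(-2, 2, n)
y = np.sin(2 * x) + 0.2 * rng.standard_normal(n)   # p(y|x) is unimodal here

# Coefficients of the empirical conditional mean embedding of p(y | x = 0.5):
# beta = (G_x + n*lam*I)^{-1} k_x(x*), so mu_{Y|x*} = sum_i beta_i k(y_i, .)
lam, sx, sy = 1e-3, 0.3, 0.2
beta = np.linalg.solve(gram(x, x, sx) + n * lam * np.eye(n),
                       gram(x, np.array([0.5]), sx))[:, 0]

# Naive density readout: smooth the embedding with a normalized kernel.
ys = np.linspace(-2, 2, 200)
dens = beta @ (gram(y, ys, sy) / (sy * np.sqrt(2 * np.pi)))
print(ys[dens.argmax()])   # should sit near sin(1.0) ~ 0.84
```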


A kernel-based approach to molecular conformation analysis

arXiv.org Machine Learning

We present a novel machine learning approach to understanding the conformation dynamics of biomolecules. The approach combines kernel-based techniques that are popular in the machine learning community with transfer operator theory for analyzing dynamical systems in order to identify conformation dynamics based on molecular dynamics simulation data. We show that many of the prominent methods like Markov State Models, EDMD, and TICA can be regarded as special cases of this approach and that new efficient algorithms can be constructed based on this derivation. The results of these new powerful methods will be illustrated with several examples, in particular the alanine dipeptide and the protein NTL9.

I. INTRODUCTION. The spectral analysis of transfer operators such as the Perron-Frobenius or Koopman operator is by now a well-established technique in molecular conformation analysis; the dominant eigenfunctions of these operators capture the slow conformational transitions of the system. These slow transitions are critical for a better understanding of the functioning of peptides and proteins. Since these operators are infinite-dimensional, they are typically projected onto a space spanned by a set of predefined basis functions, either explicitly in terms of a feature map or implicitly through a kernel. The advantage of the former is that the size of the resulting eigenvalue problem depends only on the size of the feature space, but not on the size of the training data set (this corresponds to EDMD or VAC). However, this approach can in general not be applied to the typically high-dimensional systems prevalent in molecular dynamics due to the curse of dimensionality, and it furthermore requires an explicit feature space representation, i.e., an explicit basis of the approximation space. For the kernel-based variant, the size of the eigenvalue problem is independent of the number of basis functions, and thus allows for implicitly infinite-dimensional feature spaces, but depends on the size of the training data set (this corresponds to kernel EDMD or kernel TICA). Kernel-based methods thus promise increased performance and accuracy in transfer operator-based conformation analysis.
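To make the kernel-based variant concrete: with trajectory pairs (x_t, x_{t+1}), regularized kernel EDMD reduces the operator eigenproblem to Gram matrices built from the data. The sketch below runs it on a simulated double-well diffusion as a stand-in for molecular dynamics data; the kernel bandwidth, regularization, and toy system are assumptions of this sketch.

```python
import numpy as np

def gram(A, B, sigma):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma**2))

# Toy metastable system: overdamped diffusion in a double-well potential,
# used here as a stand-in for molecular dynamics simulation data.
rng = np.random.default_rng(2)
dt, x = 1e-2, np.zeros(1001)
for t in range(1000):
    x[t + 1] = (x[t] - 4 * x[t] * (x[t]**2 - 1) * dt
                + np.sqrt(2 * dt) * rng.standard_normal())
X, Y = x[:-1, None], x[1:, None]

# Regularized kernel EDMD: eigenpairs of (G + eps*I)^{-1} A, with
# G_ij = k(x_i, x_j) and A_ij = k(y_i, x_j), approximate transfer operator
# eigenvalues; eigenfunctions have the form phi(x) = sum_j v_j k(x_j, x).
G, A = gram(X, X, 0.5), gram(Y, X, 0.5)
evals, V = np.linalg.eig(np.linalg.solve(G + 1e-6 * np.eye(len(X)), A))
order = np.argsort(-evals.real)
phi = G @ V[:, order[1]].real        # second eigenfunction at the data points
# the sign of phi labels the two metastable wells (the slowest process)
```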


Markov Chain Importance Sampling - a highly efficient estimator for MCMC

arXiv.org Machine Learning

Markov chain algorithms are ubiquitous in machine learning, statistics, and many other disciplines. In this work we present a novel estimator applicable to several classes of Markov chains, dubbed Markov chain importance sampling (MCIS). For a broad class of Metropolis-Hastings algorithms, MCIS efficiently makes use of rejected proposals. For discretized Langevin diffusions, it provides a novel way of correcting the discretization error. Our estimator satisfies a central limit theorem and improves the error per CPU cycle, often substantially. As a by-product, it enables estimating the normalizing constant, an important quantity in Bayesian machine learning and statistics.
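The core reuse-of-rejections idea can be sketched in a few lines: run ordinary Metropolis-Hastings, but importance-weight every proposal, accepted or not, by target over proposal density. The schematic self-normalized version below uses a toy Gaussian target; the paper's estimator and its central limit theorem conditions are more refined than this sketch.

```python
import numpy as np

rng = np.random.default_rng(3)
log_p = lambda v: -0.5 * v**2                      # unnormalized log target
s, x = 1.5, 0.0                                    # proposal std, chain state
props, logw = [], []
for _ in range(20000):
    y = x + s * rng.standard_normal()              # proposal q(.|x) = N(x, s^2)
    log_q = -0.5 * ((y - x) / s) ** 2 - np.log(s * np.sqrt(2 * np.pi))
    props.append(y)
    logw.append(log_p(y) - log_q)                  # weight EVERY proposal
    if np.log(rng.uniform()) < log_p(y) - log_p(x):
        x = y                                      # ordinary MH acceptance
w = np.exp(np.array(logw) - np.max(logw))
print((w * np.square(props)).sum() / w.sum())      # E[x^2] = 1, rejects included
print(np.exp(np.array(logw)).mean())               # normalizing const ~ 2.5066
```

The last line shows the by-product mentioned in the abstract: since the proposal density is normalized and the target is not, the unnormalized weights average to the target's normalizing constant (here sqrt(2*pi)).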


Analyzing high-dimensional time-series data using kernel transfer operator eigenfunctions

arXiv.org Machine Learning

Kernel transfer operators, which can be regarded as approximations of transfer operators such as the Perron-Frobenius or Koopman operator in reproducing kernel Hilbert spaces, are defined in terms of covariance and cross-covariance operators and have been shown to be closely related to the conditional mean embedding framework developed by the machine learning community. The goal of this paper is to show how the dominant eigenfunctions of these operators in combination with gradient-based optimization techniques can be used to detect long-lived coherent patterns in high-dimensional time-series data. The results will be illustrated using video data and a fluid flow example.
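Once coefficients v of a dominant eigenfunction phi(x) = sum_i v_i k(x_i, x) are available, for instance from a Gram-matrix eigenproblem as in the kernel EDMD sketch earlier, the RBF expansion is differentiable in closed form, so patterns can be located by plain gradient ascent on phi. Data, coefficients, bandwidth, and step size below are placeholders for illustration.

```python
import numpy as np

# Given landmark data X (n, d) and coefficients v of a kernel eigenfunction
# phi(x) = sum_i v_i k(x_i, x), locate a maximizer of phi by gradient ascent;
# gradients of the RBF expansion are available in closed form.
def phi_and_grad(x, X, v, sigma=1.0):
    diff = X - x                                    # rows: x_i - x
    kvec = np.exp(-(diff ** 2).sum(-1) / (2 * sigma**2))
    phi = v @ kvec
    grad = (v * kvec) @ diff / sigma**2             # d phi / d x
    return phi, grad

rng = np.random.default_rng(4)
X = rng.standard_normal((200, 5))                   # placeholder data
v = rng.standard_normal(200)                        # placeholder coefficients
x = np.zeros(5)
for _ in range(100):                                # fixed small step size;
    _, g = phi_and_grad(x, X, v)                    # line search omitted
    x = x + 0.1 * g
print(phi_and_grad(x, X, v)[0])                     # local maximum of phi
```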


Eigendecompositions of Transfer Operators in Reproducing Kernel Hilbert Spaces

arXiv.org Machine Learning

Transfer operators such as the Perron-Frobenius or Koopman operator play an important role in the global analysis of complex dynamical systems. The eigenfunctions of these operators can be used to detect metastable sets, to project the dynamics onto the dominant slow processes, or to separate superimposed signals. We extend transfer operator theory to reproducing kernel Hilbert spaces and show that these operators are related to Hilbert space representations of conditional distributions, known as conditional mean embeddings in the machine learning community. Moreover, numerical methods to compute empirical estimates of these embeddings are akin to data-driven methods for the approximation of transfer operators such as extended dynamic mode decomposition and its variants. In fact, most of the existing methods can be derived from our framework, providing a unifying view on the approximation of transfer operators. One main benefit of the presented kernel-based approaches is that these methods can be applied to any domain where a similarity measure given by a kernel is available. We illustrate the results with the aid of guiding examples and highlight potential applications in molecular dynamics as well as video and text data analysis.
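The empirical side of this correspondence can be written down directly. As a sketch, using the standard empirical covariance and cross-covariance operator estimates, and a regularization parameter epsilon added for invertibility (an assumption of this sketch),

```latex
\hat{C}_{XX} = \frac{1}{n}\sum_{i=1}^{n}\phi(x_i)\otimes\phi(x_i),
\qquad
\hat{C}_{XY} = \frac{1}{n}\sum_{i=1}^{n}\phi(x_i)\otimes\phi(y_i),
```

restricting to eigenfunctions in the span of the data features turns the operator eigenproblem into a Gram-matrix eigenproblem:

```latex
\bigl(\hat{C}_{XX}+\varepsilon\,\mathcal{I}\bigr)^{-1}\hat{C}_{XY}\,\hat{\varphi}
  = \lambda\,\hat{\varphi}
\;\Longleftrightarrow\;
\bigl(G_{XX}+n\varepsilon\,I\bigr)^{-1}G_{YX}\,v = \lambda\,v,
\qquad
\hat{\varphi} = \sum_{i=1}^{n} v_i\, k(x_i,\cdot),
```

with [G_XX]_{ij} = k(x_i, x_j) and [G_YX]_{ij} = k(y_i, x_j). This is exactly the eigenproblem that extended dynamic mode decomposition and its kernelized variants solve in practice.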


Kernel Sequential Monte Carlo

arXiv.org Machine Learning

We propose kernel sequential Monte Carlo (KSMC), a framework for sampling from static target densities. KSMC is a family of sequential Monte Carlo algorithms that are based on building emulator models of the current particle system in a reproducing kernel Hilbert space. Here we focus on modelling nonlinear covariance structure and gradients of the target. The emulator's geometry is adaptively updated and subsequently used to inform local proposals. Unlike in adaptive Markov chain Monte Carlo, continuous adaptation does not compromise convergence of the sampler. KSMC combines the strengths of sequential Monte Carlo and kernel methods: superior performance for multimodal targets and the ability to estimate model evidence, as compared to Markov chain Monte Carlo, together with the emulator's ability to represent targets that exhibit high degrees of nonlinearity. As KSMC does not require access to target gradients, it is particularly applicable to targets whose gradients are unknown or prohibitively expensive. We describe the necessary tuning details and demonstrate the benefits of the proposed methodology on a series of challenging synthetic and real-world examples.
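A compressed sketch of the SMC skeleton: temper from a wide start density to a bimodal target, importance-weight and resample the particle system, and let each move step be informed by a model fit to the current particles. Here the RKHS emulator is replaced by its simplest special case, a Gaussian fit to the particles (a linear-kernel emulator); the target, schedule, and tuning constants are toy assumptions, so this is not the full KSMC.

```python
import numpy as np

rng = np.random.default_rng(5)
log_p  = lambda v: np.logaddexp(-0.5 * (v - 3)**2, -0.5 * (v + 3)**2)  # bimodal
log_p0 = lambda v: -0.5 * (v / 5)**2                                   # start

n, betas = 500, np.linspace(0.0, 1.0, 21)
x = 5 * rng.standard_normal(n)                    # particles from the start
for b0, b1 in zip(betas[:-1], betas[1:]):
    logw = (b1 - b0) * (log_p(x) - log_p0(x))     # incremental SMC weights
    w = np.exp(logw - logw.max()); w /= w.sum()
    x = x[rng.choice(n, n, p=w)]                  # multinomial resampling
    # particle-system-informed proposal scale: a Gaussian fit to the current
    # particles stands in for the RKHS emulator (its linear-kernel case)
    s = 2.38 * x.std() + 1e-6
    tempered = lambda v: b1 * log_p(v) + (1 - b1) * log_p0(v)
    for _ in range(3):                            # a few MH rejuvenation moves
        y = x + s * rng.standard_normal(n)
        acc = np.log(rng.uniform(size=n)) < tempered(y) - tempered(x)
        x = np.where(acc, y, x)
print(x.mean(), (x > 0).mean())   # mean near 0, both modes populated
```

Note that the proposal scale keeps adapting at every step; unlike in adaptive MCMC, this is unproblematic because each adaptation only uses the current weighted particle system.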


Gradient Importance Sampling

arXiv.org Machine Learning

Adaptive Monte Carlo schemes developed in recent years usually seek to ensure ergodicity of the sampling process in line with MCMC tradition. This poses constraints on what is possible in terms of adaptation. In the general case, ergodicity can only be guaranteed if adaptation is diminished at a certain rate. Importance sampling approaches offer a way to circumvent this limitation and to design sampling algorithms that keep adapting. Here I present a gradient-informed variant of SMC (and its special case, Population Monte Carlo) for static problems.
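A minimal sketch of the idea: a population Monte Carlo iteration whose proposals use a Langevin (gradient) drift, with importance weights restoring correctness so that adaptation never has to stop. The target, step size, and population size below are toy placeholders, not the paper's exact scheme.

```python
import numpy as np

rng = np.random.default_rng(6)
log_p  = lambda v: -0.5 * v**2            # unnormalized log target
dlog_p = lambda v: -v                     # its gradient (score)
n, eps = 1000, 0.5
x = rng.uniform(-4, 4, n)                 # initial population
for _ in range(50):
    mu = x + 0.5 * eps**2 * dlog_p(x)     # gradient (Langevin) drift
    y = mu + eps * rng.standard_normal(n) # proposal q(y|x) = N(mu, eps^2)
    log_q = -0.5 * ((y - mu) / eps) ** 2  # constant dropped; it cancels
    logw = log_p(y) - log_q               # importance weights keep it exact
    w = np.exp(logw - logw.max()); w /= w.sum()
    x = y[rng.choice(n, n, p=w)]          # resample: adaptation continues
print(x.mean(), x.var())                  # approx 0 and 1 for N(0,1) target
```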