Markov kernel



A Non-asymptotic Analysis for Learning and Applying a Preconditioner in MCMC

Hird, Max, Maire, Florian, Negrea, Jeffrey

arXiv.org Machine Learning

Preconditioning is a common method applied to modify Markov chain Monte Carlo algorithms with the goal of making them more efficient. In practice it is often extremely effective, even when the preconditioner is learned from the chain. We analyse and compare the finite-time computational costs of schemes which learn a preconditioner based on the target covariance or the expected Hessian of the target potential with that of a corresponding scheme that does not use preconditioning. We apply our results to the Unadjusted Langevin Algorithm (ULA) for an appropriately regular target, establishing non-asymptotic guarantees for preconditioned ULA which learns its preconditioner. Our results are also applied to the unadjusted underdamped Langevin algorithm in the supplementary material. To do so, we establish non-asymptotic guarantees on the time taken to collect $N$ approximately independent samples from the target for schemes that learn their preconditioners under the assumption that the underlying Markov chain satisfies a contraction condition in the Wasserstein-2 distance. This approximate independence condition, that we formalize, allows us to bridge the non-asymptotic bounds of modern MCMC theory and classical heuristics of effective sample size and mixing time, and is needed to amortise the costs of learning a preconditioner across the many samples it will be used to produce.
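The learned-preconditioner scheme the abstract describes can be illustrated with a toy sketch: run unpreconditioned ULA, estimate the target covariance from the chain, then use that estimate as the preconditioner. The target, step size, and two-phase schedule below are illustrative assumptions, not the paper's actual algorithm or constants.

```python
import numpy as np

# Toy sketch: preconditioned ULA on a Gaussian target with potential
# U(x) = 0.5 * x^T A x, so grad U(x) = A x. The preconditioner M is
# learned as the empirical covariance of a warm-up run, one of the
# schemes the paper analyses (all constants here are illustrative).
rng = np.random.default_rng(0)
A = np.array([[4.0, 0.0], [0.0, 0.25]])  # ill-conditioned precision matrix

def grad_U(x):
    return A @ x

def ula(x0, M, step, n):
    """Preconditioned ULA: x' = x - step*M*grad_U(x) + sqrt(2*step)*M^{1/2}*xi."""
    L = np.linalg.cholesky(M)
    x, xs = x0, [x0]
    for _ in range(n):
        xi = rng.standard_normal(x.shape)
        x = x - step * (M @ grad_U(x)) + np.sqrt(2 * step) * (L @ xi)
        xs.append(x)
    return np.array(xs)

# Phase 1: unpreconditioned warm-up (M = I) to learn a covariance estimate.
warm = ula(np.zeros(2), np.eye(2), 0.05, 2000)
M_hat = np.cov(warm[500:].T) + 1e-6 * np.eye(2)  # regularised estimate

# Phase 2: run with the learned preconditioner M_hat ≈ inv(A).
samples = ula(np.zeros(2), M_hat, 0.05, 2000)
print(np.cov(samples[500:].T))  # roughly inv(A) = diag(0.25, 4), up to discretisation bias
```

The point of the preconditioner is visible in phase 2: with M ≈ inv(A) the effective dynamics are isotropic, so both coordinates mix at the same rate instead of the slow coordinate dominating the mixing time.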





Invariant Representations via Wasserstein Correlation Maximization

Eikenberry, Keenan, Liu, Lizuo, Lee, Yoonsang

arXiv.org Machine Learning

This work investigates the use of Wasserstein correlation -- a normalized measure of statistical dependence based on the Wasserstein distance between a joint distribution and the product of its marginals -- for unsupervised representation learning. Unlike, for example, contrastive methods, which naturally cluster classes in the latent space, we find that an (auto)encoder trained to maximize Wasserstein correlation between the input and encoded distributions instead acts as a compressor, reducing dimensionality while approximately preserving the topological and geometric properties of the input distribution. More strikingly, we show that Wasserstein correlation maximization can be used to arrive at an (auto)encoder -- either trained from scratch, or else one that extends a frozen, pretrained model -- that is approximately invariant to a chosen augmentation, or collection of augmentations, and that still approximately preserves the structural properties of the non-augmented input distribution. To do this, we first define the notion of an augmented encoder using the machinery of Markov-Wasserstein kernels. When the maximization objective is then applied to the augmented encoder, as opposed to the underlying, deterministic encoder, the resulting model exhibits the desired invariance properties. Finally, besides our experimental results, which show that even simple feedforward networks can be imbued with invariants or can, alternatively, be used to impart invariants to pretrained models under this training process, we additionally establish various theoretical results for optimal transport-based dependence measures. Code is available at https://github.com/keenan-eikenberry/wasserstein_correlation_maximization .
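The core quantity, the Wasserstein distance between a joint distribution and the product of its marginals, can be estimated on samples: pair the joint sample against an independently shuffled copy (a draw from the product of marginals) and solve the exact optimal matching. This is a hedged toy estimate of the unnormalised dependence term only; the paper's actual normalisation and training objective are not reproduced here.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def w2(a, b):
    """Exact Wasserstein-2 between two equal-size empirical measures,
    via the optimal assignment on the pairwise squared-distance matrix."""
    cost = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    r, c = linear_sum_assignment(cost)
    return np.sqrt(cost[r, c].mean())

rng = np.random.default_rng(1)

def dependence(x, y):
    """Toy estimate of W2(joint, product of marginals): shuffling y
    breaks the coupling, giving a sample from the product measure."""
    joint = np.hstack([x, y])
    prod = np.hstack([x, y[rng.permutation(len(y))]])
    return w2(joint, prod)

n = 200
x = rng.standard_normal((n, 1))
y_dep = x + 0.1 * rng.standard_normal((n, 1))   # strongly dependent on x
y_ind = rng.standard_normal((n, 1))             # independent of x

d_dep = dependence(x, y_dep)
d_ind = dependence(x, y_ind)
print(d_dep, d_ind)  # the dependent pair scores markedly higher
```

For the independent pair the joint already equals the product of its marginals, so the estimate reflects only sampling noise; for the dependent pair the matching must move mass off the near-diagonal support, giving a substantially larger distance.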


Utilising Gradient-Based Proposals Within Sequential Monte Carlo Samplers for Training of Partial Bayesian Neural Networks

Millard, Andrew, Murphy, Joshua, Maskell, Simon, Zhao, Zheng

arXiv.org Machine Learning

Previous research has shown the benefit Bayesian methods can bring to certain problems within deep learning (Gal et al., 2017). However, computing the exact posterior distributions of Bayesian neural networks (BNNs) is a difficult task, as traditional methods such as Markov chain Monte Carlo (MCMC) (Hastings, 1970) are computationally poorly suited to exploring high-dimensional spaces and dealing with large amounts of data. Parametric methods such as variational inference are better suited to these difficulties, but only give an approximation to the posterior distribution. These spaces have been found to be highly complex (Izmailov et al., 2021a), and therefore variational methods often give a poor approximation of the posterior. Sequential Monte Carlo (SMC) samplers (Doucet et al., 2001) are an alternative to MCMC methods which also provide an empirical estimate of the posterior distribution. SMC samplers are instantly parallelisable (Varsi et al., 2021b) and can therefore take advantage of the GPU resources commonly used in machine learning to speed up the training process. MCMC methods often require a warm-up period to adapt the hyperparameters, after which the chains can be parallelised; however, the hyperparameters must remain fixed after this warm-up period to obey stationarity. This means that SMC samplers can be more flexible than MCMC methods.
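An SMC sampler with gradient-based moves can be sketched on a toy target: temper along a geometric path from a prior to a posterior, reweight, resample, and move particles with unadjusted Langevin steps. The target, schedule, and step size below are illustrative assumptions, not the paper's partial-BNN setup or proposal.

```python
import numpy as np

# Hedged sketch of an SMC sampler with gradient-based (unadjusted Langevin)
# move kernels, tempering along a geometric path from a N(0, I) "prior"
# p0 to a N(mu, I) "posterior" p1: pi_b ∝ p0^(1-b) * p1^b.
rng = np.random.default_rng(0)
mu = np.array([3.0, -2.0])

def log_p0(x): return -0.5 * (x ** 2).sum(-1)           # prior log-density (unnorm.)
def log_p1(x): return -0.5 * ((x - mu) ** 2).sum(-1)    # target log-density (unnorm.)

n, step = 500, 0.2
betas = np.linspace(0.0, 1.0, 11)                       # fixed temperature ladder
x = rng.standard_normal((n, 2))                         # particles from the prior

for b0, b1 in zip(betas[:-1], betas[1:]):
    logw = (b1 - b0) * (log_p1(x) - log_p0(x))          # incremental importance weights
    w = np.exp(logw - logw.max()); w /= w.sum()
    x = x[rng.choice(n, size=n, p=w)]                   # multinomial resampling
    # gradient-based move: a few ULA steps targeting pi_b1,
    # whose log-gradient here is -x + b1 * mu
    for _ in range(5):
        x = x + step * (-x + b1 * mu) + np.sqrt(2 * step) * rng.standard_normal((n, 2))

print(x.mean(0))  # particle mean close to mu = [3, -2]
```

Note that each tempering stage is fully vectorised over the particles, which is the parallelism the abstract highlights; a practical sampler would also adapt the temperature ladder and use MH-adjusted (e.g. MALA-style) moves rather than the unadjusted steps used here for brevity.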


Set and functional prediction: randomness, exchangeability, and conformal

Vovk, Vladimir

arXiv.org Artificial Intelligence

Conformal prediction is usually presented as a method of set prediction [10, Part I], i.e., as a way of producing prediction sets (rather than point predictions). Another way to look at a conformal predictor is as a way of producing a p-value function (discussed, in a slightly different context, in, e.g., [4]), which is a function mapping each possible label y of a test object to the corresponding conformal p-value. In analogy with "prediction sets", we will call such p-value functions "prediction functions".
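The two views are easy to exhibit together: a p-value function maps each candidate label to a conformal p-value, and thresholding it recovers the prediction set. The sketch below uses split conformal regression with absolute-residual nonconformity scores; the model and score are illustrative choices, not the paper's construction.

```python
import numpy as np

# Hedged sketch: a conformal p-value function for regression via split
# conformal with absolute-residual scores (toy model, illustrative only).
rng = np.random.default_rng(0)

# toy data: y = 2x + noise, split into training and calibration halves
x = rng.uniform(-1, 1, 200); y = 2 * x + 0.1 * rng.standard_normal(200)
x_tr, y_tr, x_cal, y_cal = x[:100], y[:100], x[100:], y[100:]

slope = (x_tr @ y_tr) / (x_tr @ x_tr)           # least-squares "model"
cal_scores = np.abs(y_cal - slope * x_cal)      # nonconformity scores on calibration set

def p_value_function(x_new, y_candidate):
    """Conformal p-value for the hypothesis that x_new has label y_candidate."""
    s = abs(y_candidate - slope * x_new)
    return (1 + np.sum(cal_scores >= s)) / (len(cal_scores) + 1)

# the prediction set at level alpha collects candidates with p-value > alpha
x_new, alpha = 0.5, 0.1
grid = np.linspace(-1, 3, 401)
pvals = np.array([p_value_function(x_new, yc) for yc in grid])
pred_set = grid[pvals > alpha]
print(pred_set.min(), pred_set.max())  # an interval around 2 * 0.5 = 1.0
```

The function `p_value_function` is the "prediction function" of the quoted passage; the set-prediction view is just its superlevel set at level alpha.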


Flow Matching: Markov Kernels, Stochastic Processes and Transport Plans

Wald, Christian, Steidl, Gabriele

arXiv.org Artificial Intelligence

Among generative neural models, flow matching techniques stand out for their simple applicability and good scaling properties. Here, velocity fields of curves connecting a simple latent and a target distribution are learned. Then the corresponding ordinary differential equation can be used to sample from a target distribution, starting from samples of the latent one. This paper reviews from a mathematical point of view different techniques to learn the velocity fields of absolutely continuous curves in the Wasserstein geometry. We show how the velocity fields can be characterized and learned via i) transport plans (couplings) between latent and target distributions, ii) Markov kernels and iii) stochastic processes, where the latter two include the coupling approach, but are in general broader. Besides this main goal, we show how flow matching can be used for solving Bayesian inverse problems, where the definition of conditional Wasserstein distances plays a central role. Finally, we briefly address continuous normalizing flows and score matching techniques, which approach the learning of velocity fields of curves from other directions.
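The coupling-based recipe the abstract mentions can be sketched in one dimension: draw a pair (x0, x1) from an independent coupling of latent and target, regress the conditional velocity x1 - x0 onto points on the linear path x_t = (1-t)x0 + t x1, then integrate the learned ODE. The "network" below is a per-time-step linear regression, an illustrative stand-in for a neural velocity field.

```python
import numpy as np

# Hedged sketch of flow matching with the linear path x_t = (1-t)x0 + t*x1
# and independent coupling; the velocity model v_t(x) = a(t)x + b(t) is fit
# by least squares per time step, purely for illustration.
rng = np.random.default_rng(0)
n, K = 20000, 50
ts = (np.arange(K) + 0.5) / K

x0 = rng.standard_normal(n)          # latent samples, N(0, 1)
x1 = 4.0 + rng.standard_normal(n)    # target samples, N(4, 1)
coef = []
for t in ts:
    xt = (1 - t) * x0 + t * x1       # point on the interpolating path
    v = x1 - x0                      # conditional velocity target
    X = np.stack([xt, np.ones(n)], 1)
    coef.append(np.linalg.lstsq(X, v, rcond=None)[0])  # fit v_t(x) = a*x + b

# sampling: Euler-integrate dx/dt = v_t(x) starting from latent samples
x = rng.standard_normal(5000)
for a, b in coef:
    x = x + (1.0 / K) * (a * x + b)
print(x.mean(), x.std())  # close to the target N(4, 1)
```

Because latent, target, and path are jointly Gaussian here, the conditional expectation E[x1 - x0 | x_t] is exactly linear in x_t, so the least-squares fit recovers the true marginal velocity field and Euler integration transports the latent samples onto the target up to discretisation error.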


Neural Network Symmetrisation in Concrete Settings

Cornish, Rob

arXiv.org Artificial Intelligence

Cornish (2024) recently gave a general theory of neural network symmetrisation in the abstract context of Markov categories. We give a high-level overview of these results, and their concrete implications for the symmetrisation of deterministic functions and of Markov kernels.