AITopics

2502.06685

Country:

North America > United States (0.28)
Europe > United Kingdom > England (0.14)

Genre: Research Report > New Finding (0.46)

Industry:

Government (0.46)
Food & Agriculture (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.46)

arXiv.org Artificial IntelligenceFeb-6-2025

Iterative Importance Fine-tuning of Diffusion Models

Denker, Alexander, Padhy, Shreyas, Vargas, Francisco, Hertrich, Johannes

Diffusion models are an important tool for generative modelling, serving as effective priors in applications such as imaging and protein design. A key challenge in applying diffusion models for downstream tasks is efficiently sampling from resulting posterior distributions, which can be addressed using the h-transform. This work introduces a self-supervised algorithm for fine-tuning diffusion models by estimating the h-transform, enabling amortised conditional sampling. We demonstrate the effectiveness of this framework on class-conditional sampling and reward fine-tuning for text-to-image diffusion models. Diffusion models have emerged as a powerful tool for generative modelling (Ho et al., 2020; Dhariwal & Nichol, 2021). As training these models is expensive and requires large amount of data, fine-tuning existing models for new tasks is of interest.

artificial intelligence, diffusion model, machine learning, (15 more...)

2502.04468

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

arXiv.org Artificial IntelligenceJun-20-2024

A Generative Model of Symmetry Transformations

Allingham, James Urquhart, Mlodozeniec, Bruno Kacper, Padhy, Shreyas, Antorán, Javier, Krueger, David, Turner, Richard E., Nalisnick, Eric, Hernández-Lobato, José Miguel

Correctly capturing the symmetry transformations of data can lead to efficient models with strong generalization capabilities, though methods incorporating symmetries often require prior knowledge. While recent advancements have been made in learning those symmetries directly from the dataset, most of this work has focused on the discriminative setting. In this paper, we take inspiration from group theoretic ideas to construct a generative model that explicitly aims to capture the data's approximate symmetries. This results in a model that, given a prespecified broad set of possible symmetries, learns to what extent, if at all, those symmetries are actually present. Our model can be seen as a generative process for data augmentation. We provide a simple algorithm for learning our generative model and empirically demonstrate its ability to capture symmetries under affine and color transformations, in an interpretable way. Combining our symmetry model with standard generative models results in higher marginal test-log-likelihoods and improved data efficiency.

artificial intelligence, machine learning, natural language, (18 more...)

2403.01946

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
North America > United States > New York (0.14)
North America > United States > Hawaii (0.14)
(3 more...)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Generation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

arXiv.org Machine LearningJun-6-2024

Improving Linear System Solvers for Hyperparameter Optimisation in Iterative Gaussian Processes

Lin, Jihao Andreas, Padhy, Shreyas, Mlodozeniec, Bruno, Antorán, Javier, Hernández-Lobato, José Miguel

Scaling hyperparameter optimisation to very large datasets remains an open problem in the Gaussian process community. This paper focuses on iterative methods, which use linear system solvers, like conjugate gradients, alternating projections or stochastic gradient descent, to construct an estimate of the marginal likelihood gradient. We discuss three key improvements which are applicable across solvers: (i) a pathwise gradient estimator, which reduces the required number of solver iterations and amortises the computational cost of making predictions, (ii) warm starting linear system solvers with the solution from the previous step, which leads to faster solver convergence at the cost of negligible bias, (iii) early stopping linear system solvers after a limited computational budget, which synergises with warm starting, allowing solver progress to accumulate over multiple marginal likelihood steps. These techniques provide speed-ups of up to $72\times$ when solving to tolerance, and decrease the average residual norm by up to $7\times$ when stopping early.

artificial intelligence, estimator, machine learning, (14 more...)

2405.18457

Genre: Research Report (0.81)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.55)

arXiv.org Artificial IntelligenceJun-3-2024

DEFT: Efficient Finetuning of Conditional Diffusion Models by Learning the Generalised $h$-transform

Denker, Alexander, Vargas, Francisco, Padhy, Shreyas, Didi, Kieran, Mathis, Simon, Dutordoir, Vincent, Barbano, Riccardo, Mathieu, Emile, Komorowska, Urszula Julia, Lio, Pietro

Generative modelling paradigms based on denoising diffusion processes have emerged as a leading candidate for conditional sampling in inverse problems. In many real-world applications, we often have access to large, expensively trained unconditional diffusion models, which we aim to exploit for improving conditional sampling. Most recent approaches are motivated heuristically and lack a unifying framework, obscuring connections between them. Further, they often suffer from issues such as being very sensitive to hyperparameters, being expensive to train or needing access to weights hidden behind a closed API. In this work, we unify conditional training and sampling using the mathematically well-understood Doob's h-transform. This new perspective allows us to unify many existing methods under a common umbrella. Under this framework, we propose DEFT (Doob's h-transform Efficient FineTuning), a new approach for conditional generation that simply fine-tunes a very small network to quickly learn the conditional $h$-transform, while keeping the larger unconditional network unchanged. DEFT is much faster than existing baselines while achieving state-of-the-art performance across a variety of linear and non-linear benchmarks. On image reconstruction tasks, we achieve speedups of up to 1.6$\times$, while having the best perceptual quality on natural images and reconstruction performance on medical images.

artificial intelligence, diffusion model, machine learning, (17 more...)

2406.01781

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
North America > United States > Maryland (0.14)

Genre: Research Report (0.64)

Industry: Health & Medicine > Diagnostic Medicine > Imaging (0.48)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
(2 more...)

arXiv.org Machine LearningMay-28-2024

Warm Start Marginal Likelihood Optimisation for Iterative Gaussian Processes

Lin, Jihao Andreas, Padhy, Shreyas, Mlodozeniec, Bruno, Hernández-Lobato, José Miguel

Gaussian processes are a versatile probabilistic machine learning model whose effectiveness often depends on good hyperparameters, which are typically learned by maximising the marginal likelihood. In this work, we consider iterative methods, which use iterative linear system solvers to approximate marginal likelihood gradients up to a specified numerical precision, allowing a trade-off between compute time and accuracy of a solution. We introduce a three-level hierarchy of marginal likelihood optimisation for iterative Gaussian processes, and identify that the computational costs are dominated by solving sequential batches of large positive-definite systems of linear equations. We then propose to amortise computations by reusing solutions of linear system solvers as initialisations in the next step, providing a $\textit{warm start}$. Finally, we discuss the necessary conditions and quantify the consequences of warm starts and demonstrate their effectiveness on regression tasks, where warm starts achieve the same results as the conventional procedure while providing up to a $16 \times$ average speed-up among datasets.

artificial intelligence, machine learning, modeling & simulation, (9 more...)

2405.18328

Genre: Research Report (0.64)

Technology:

Information Technology > Modeling & Simulation (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.70)

arXiv.org Machine LearningNov-7-2023

Transport meets Variational Inference: Controlled Monte Carlo Diffusions

Vargas, Francisco, Padhy, Shreyas, Blessing, Denis, Nüsken, Nikolas

Connecting optimal transport and variational inference, we present a principled and systematic framework for sampling and generative modelling centred around divergences on path space. Our work culminates in the development of the \emph{Controlled Monte Carlo Diffusion} sampler (CMCD) for Bayesian computation, a score-based annealing technique that crucially adapts both forward and backward dynamics in a diffusion model. On the way, we clarify the relationship between the EM-algorithm and iterative proportional fitting (IPF) for Schr{\"o}dinger bridges, deriving as well a regularised objective that bypasses the iterative bottleneck of standard IPF-updates. Finally, we show that CMCD has a strong foundation in the Jarzinsky and Crooks identities from statistical physics, and that it convincingly outperforms competing approaches across a wide array of experiments.

artificial intelligence, machine learning, proposition 2, (16 more...)

2307.0105

Country: Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.28)

Genre: Research Report (0.63)

arXiv.org Machine LearningOct-31-2023

Stochastic Gradient Descent for Gaussian Processes Done Right

Lin, Jihao Andreas, Padhy, Shreyas, Antorán, Javier, Tripp, Austin, Terenin, Alexander, Szepesvári, Csaba, Hernández-Lobato, José Miguel, Janz, David

We study the optimisation problem associated with Gaussian process regression using squared loss. The most common approach to this problem is to apply an exact solver, such as conjugate gradient descent, either directly, or to a reducedorder version of the problem. Recently, driven by successes in deep learning, stochastic gradient descent has gained traction as an alternative. In this paper, we show that when done right--by which we mean using specific insights from the optimisation and kernel communities--this approach is highly effective. We thus introduce a particular stochastic dual gradient descent algorithm, that may be implemented with a few lines of code using any deep learning framework. We explain our design decisions by illustrating their advantage against alternatives with ablation studies and show that the new method is highly competitive. Our evaluations on standard regression benchmarks and a Bayesian optimisation task set our approach apart from preconditioned conjugate gradients, variational Gaussian process approximations, and a previous version of stochastic gradient descent for Gaussian processes. Gaussian processes are a probabilistic framework for learning unknown functions. They are the de facto standard model of choice in areas like Bayesian optimisation, where uncertainty-aware decision making is required to gather data in an efficient manner.

artificial intelligence, descent, machine learning, (15 more...)

2310.20581

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
North America > Canada > Alberta (0.14)

Genre: Research Report (0.82)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)

arXiv.org Machine LearningOct-29-2023

Sampling from Gaussian Process Posteriors using Stochastic Gradient Descent

Lin, Jihao Andreas, Antorán, Javier, Padhy, Shreyas, Janz, David, Hernández-Lobato, José Miguel, Terenin, Alexander

Gaussian processes are a powerful framework for quantifying uncertainty and for sequential decision-making but are limited by the requirement of solving linear systems. In general, this has a cubic cost in dataset size and is sensitive to conditioning. We explore stochastic gradient algorithms as a computationally efficient method of approximately solving these linear systems: we develop low-variance optimization objectives for sampling from the posterior and extend these to inducing points. Counterintuitively, stochastic gradient descent often produces accurate predictions, even in cases where it does not converge quickly to the optimum. We explain this through a spectral characterization of the implicit bias from non-convergence. We show that stochastic gradient descent produces predictive distributions close to the true posterior both in regions with sufficient data coverage, and in regions sufficiently far away from the data. Experimentally, stochastic gradient descent achieves state-of-the-art performance on sufficiently large-scale or ill-conditioned regression tasks. Its uncertainty estimates match the performance of significantly more expensive baselines on a large-scale Bayesian optimization task.

artificial intelligence, gaussian process, machine learning, (15 more...)

2306.11589

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
North America > United States (0.14)
North America > Canada > Alberta (0.14)
Europe > Spain (0.14)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)

arXiv.org Artificial IntelligenceMar-16-2023

Sampling-based inference for large linear models, with application to linearised Laplace

Antorán, Javier, Padhy, Shreyas, Barbano, Riccardo, Nalisnick, Eric, Janz, David, Hernández-Lobato, José Miguel

Large-scale linear models are ubiquitous throughout machine learning, with contemporary application as surrogate models for neural network uncertainty quantification; that is, the linearised Laplace method. Alas, the computational cost associated with Bayesian linear models constrains this method's application to small networks, small output spaces and small datasets. We address this limitation by introducing a scalable sample-based Bayesian inference method for conjugate Gaussian multi-output linear models, together with a matching method for hyperparameter (regularisation) selection. Furthermore, we use a classic feature normalisation method (the g-prior) to resolve a previously highlighted pathology of the linearised Laplace method. Together, these contributions allow us to perform linearised neural network inference with ResNet-18 on CIFAR100 (11M parameters, 100 outputs x 50k datapoints), with ResNet-50 on Imagenet (50M parameters, 1000 outputs x 1.2M datapoints) and with a U-Net on a high-resolution tomographic reconstruction task (2M parameters, 251k output~dimensions).

artificial intelligence, machine learning, sampling-based inference, (3 more...)

2210.04994

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.44)