 guidance


SURGE: Approximation and Training Free Particle Filter for Diffusion Surrogate

arXiv.org Machine Learning

Data assimilation (DA) addresses the problem of sequentially estimating the state of a dynamical system from noisy and incomplete observations. In this work, we employ a diffusion model as a world model to simulate and predict the system's dynamics. Recently, score-based diffusion models have learned global diffusion priors that effectively model (stochastic) dynamics, revealing strong potential for data assimilation. In this paper, we investigate how information from noisy observations can be incorporated to enable continuous correction and refinement of the predicted system state when using a diffusion prior. Motivated by particle filtering methods, we represent the posterior distribution using a set of particles. After receiving noisy observations, the diffusion model is guided using the observation likelihood to steer the generation process toward observation-consistent states. Nevertheless, such guidance does not guarantee sampling from the true posterior. We therefore employ a Sequential Monte Carlo approach over the diffusion trajectory, viewed as a path measure, to reweight and resample particles, thereby correcting the generation process and ensuring convergence toward the desired posterior distribution. This leads to an unbiased particle filtering method that rigorously fuses observational data with diffusion model simulations.
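The reweight-and-resample idea described above can be sketched as a generic bootstrap-style particle filter step. This is a minimal illustration, not the paper's path-measure weighting. It assumes a Gaussian observation model with an identity observation operator, and it assumes the particles were drawn from the unguided prior, so the weight is just the observation likelihood. With a guided proposal, the weights would also need the proposal correction. The function name `reweight_and_resample` and the parameter `obs_std` are hypothetical.

```python
import numpy as np


def reweight_and_resample(particles, observation, obs_std, rng):
    """One reweight/resample step over a particle set.

    particles:   array of shape (N, d), predicted states.
    observation: array of shape (d,), assumed y = x + Gaussian noise.
    obs_std:     assumed standard deviation of the observation noise.
    rng:         a numpy.random.Generator.
    """
    # Gaussian log-likelihood of the observation under each particle
    # (additive constants cancel after normalization).
    sq_err = np.sum((particles - observation) ** 2, axis=1)
    log_w = -0.5 * sq_err / obs_std**2

    # Normalize in log space for numerical stability.
    log_w -= log_w.max()
    weights = np.exp(log_w)
    weights /= weights.sum()

    # Effective sample size: a common diagnostic for weight degeneracy.
    ess = 1.0 / np.sum(weights**2)

    # Multinomial resampling: draw N indices with probability equal to weight.
    idx = rng.choice(len(particles), size=len(particles), p=weights)
    return particles[idx], weights, ess
```

After resampling, all particles carry equal weight again, and the resampled set concentrates on states consistent with the observation.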


Simple Approximation and Derivative Free Inference-Time Scaling for Diffusion Models via Sequential Monte Carlo on Path Measures

arXiv.org Machine Learning

Diffusion-based generative models increasingly rely on inference-time guidance, adding a drift term or reweighting mixture of experts, to improve sample quality on task-specific objectives. However, most existing techniques require repeated score or gradient evaluations, introducing bias, high computational overhead, or both. We introduce URGE, approximation-free Resampling via Girsanov Estimation, a derivative-free inference-time scaling algorithm that performs pathwise importance reweighting via a Girsanov change of measure.
Modern generative models have emerged as a powerful paradigm for learning complex, high-dimensional data distributions. In particular, diffusion models (Ho et al., 2020; Sohl-Dickstein et al., 2015; Song and Ermon, 2019; Song et al., 2020) and flow-based methods (Zhang et al., 2018a; Lipman et al., 2022; Albergo and Vanden-Eijnden, 2022; Liu et al., 2022) provide a principled and scalable framework for generative modeling, achieving state-of-the-art performance across diverse applications, including video generation (Ho et al., 2022), protein design (Gruver et al., 2023), and large-scale text generation (Li et al., 2022; Nie et al., 2025). A unifying perspective underlying these approaches is their formulation in terms of stochastic differential equations
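For orientation, the textbook Girsanov formula behind a pathwise change of measure is written out below. It is the standard result, not the specific estimator of the paper. The symbols $b$, $\sigma$, $u_t$, $P$ and $Q$ are notation assumed here for illustration: $P$ is the law of the reference dynamics, and $Q$ is the law of the same dynamics with an extra drift $\sigma u_t$.

```latex
% Reference dynamics under P, with W a P-Brownian motion:
%   dX_t = b(X_t, t)\,dt + \sigma\,dW_t
% Guided dynamics under Q, with W^Q a Q-Brownian motion:
%   dX_t = \bigl(b(X_t, t) + \sigma u_t\bigr)\,dt + \sigma\,dW^Q_t
\frac{dQ}{dP}\bigg|_{[0,T]}
  = \exp\!\left( \int_0^T u_t \cdot dW_t
                 \;-\; \frac{1}{2} \int_0^T \lVert u_t \rVert^2 \, dt \right)
```

Trajectories simulated under one measure can be reweighted by this ratio (or its inverse) to target the other. The ratio depends only on the added drift and the driving noise along the path.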


Improved Baselines with Representation Autoencoders

arXiv.org Machine Learning

Representation Autoencoders (RAE) replace the traditional VAE with pretrained vision encoders. In this paper, we systematically investigate several design choices and find three insights which simplify and improve RAE. First, we study a generalized formulation where the representation is defined as the sum of the last k encoder layers rather than solely the final layer. This simple change greatly improves reconstruction without encoder finetuning or specialized data (e.g., text, faces). Second, we study the prevalent assumption that RAE (using pretrained representation as encoder) replaces representation alignment (REPA), which distills the same representation to intermediate layers instead. Through large-scale empirical analysis, we uncover a surprising finding: RAE and REPA exhibit complementary working mechanisms, allowing the same representation to be used as both encoder and target for intermediate diffusion layers. Finally, the original RAE struggles with classifier-free guidance (CFG) and requires training a second, weaker diffusion model for AutoGuidance (AG). We show that REPA itself can be viewed as x-prediction in RAE latent space. By simply re-parameterizing the output of the DiT model, it can provide guidance for "free". Overall, RAEv2 leads to more than 10x faster convergence over the original RAE, achieving a state-of-the-art gFID of 1.06 in just 80 epochs on ImageNet-256. On FDr^k, RAEv2 achieves a state-of-the-art 2.17 at just 80 epochs compared to the previous best 3.26 (800 epochs) without any post-training. This motivates EP_FID@k (epochs to reach unguided gFID <= k) as a measure of training efficiency. RAEv2 attains an EP_FID@2 of 35 epochs, versus 177 for the original RAE. We also validate our approach across diverse settings for text-to-image generation and navigation world models, showing consistent improvements. Code is available at https://raev2.github.io.
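Two of the quantities named above are simple enough to sketch. The snippet below is an illustration under stated assumptions, not the authors' code. `sum_last_k_layers` assumes the encoder exposes a list of per-layer hidden states of equal shape. `epochs_to_reach_fid` follows the EP_FID@k definition quoted above (epochs to reach unguided gFID <= k), applied to a dictionary of evaluated checkpoints. Both function names are hypothetical.

```python
import numpy as np


def sum_last_k_layers(hidden_states, k):
    """Sum the outputs of the last k encoder layers.

    hidden_states: list of arrays with identical shape, ordered from the
                   first to the last layer (an assumed interface).
    """
    if not 1 <= k <= len(hidden_states):
        raise ValueError("k must be between 1 and the number of layers")
    return np.sum(np.stack(hidden_states[-k:]), axis=0)


def epochs_to_reach_fid(fid_by_epoch, k):
    """Return the earliest evaluated epoch with unguided gFID <= k.

    fid_by_epoch: dict mapping epoch -> measured gFID at that epoch.
    Returns None if the threshold is never reached.
    """
    for epoch in sorted(fid_by_epoch):
        if fid_by_epoch[epoch] <= k:
            return epoch
    return None
```

The result of `epochs_to_reach_fid` can only be as fine-grained as the evaluation schedule: if checkpoints are scored every 10 epochs, the metric is resolved to 10 epochs.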


New rules confirm public has a right to see how UK government uses AI

New Scientist

Government departments and other public bodies in the UK must consider requests to release information about AI-produced content, regulators have confirmed. The move follows a successful request by New Scientist for the release of a minister's ChatGPT logs. The use of AI chatbots is subject to the UK's Freedom of Information laws. Text, images and other content produced by UK government departments and other public bodies using artificial intelligence are subject to freedom of information (FOI) laws, regulators have confirmed - potentially opening the door for the public to gain access to ministers' ChatGPT or other chatbot records. The Information Commissioner's Office (ICO), the UK's data-protection agency, has released new guidance confirming that "If staff at a public authority use AI for work purposes, the information generated will be subject to FOIA [the Freedom of Information Act] along with the prompts used". Last year, New Scientist successfully requested the then-UK tech secretary Peter Kyle's ChatGPT logs under FOI legislation, in what is believed to be a world first. That triggered subsequent requests from other news outlets to obtain other information, but many have either been rejected on cost grounds or labelled as "vexatious", an umbrella term that allows authorities to reject a request.


One-Step Generative Modeling via Wasserstein Gradient Flows

arXiv.org Machine Learning

Diffusion models and flow-based methods have shown impressive generative capability, especially for images, but their sampling is expensive because it requires many iterative updates. We introduce W-Flow, a framework for training a generator that transforms samples from a simple reference distribution into samples from a target data distribution in a single step. This is achieved in two steps: we first define an evolution from the reference distribution to the target distribution through a Wasserstein gradient flow that minimizes an energy functional; second, we train a static neural generator to compress this evolution into one-step generation. We instantiate the energy functional with the Sinkhorn divergence, which yields an efficient optimal-transport-based update rule that captures global distributional discrepancy and improves coverage of the target distribution. We further prove that the finite-sample training dynamics converge to the continuous-time distributional dynamics under suitable assumptions. Empirically, W-Flow sets a new state of the art for one-step ImageNet 256$\times$256 generation, achieving 1.29 FID, with improved mode coverage and domain transfer. Compared to multi-step diffusion models with similar FID scores, our method yields approximately 100$\times$ faster sampling. These results show that Wasserstein gradient flows provide a principled and effective foundation for fast and high-fidelity generative modeling.
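To make the Sinkhorn divergence mentioned above concrete, here is a minimal NumPy sketch of the debiased form S(x, y) = OT(x, y) - 0.5 OT(x, x) - 0.5 OT(y, y) between two point clouds. It is not the paper's training code. The choices here are assumptions for illustration: uniform sample weights, cost 0.5 * ||x - y||^2, a fixed regularization `eps`, and a fixed number of log-domain Sinkhorn iterations. The function names are hypothetical.

```python
import numpy as np


def _logsumexp(m, axis):
    # Numerically stable log-sum-exp along one axis.
    mx = m.max(axis=axis, keepdims=True)
    return (mx + np.log(np.exp(m - mx).sum(axis=axis, keepdims=True))).squeeze(axis)


def entropic_ot(x, y, eps=0.5, n_iter=200):
    """Dual estimate of entropic OT between uniform point clouds x and y."""
    n, m = len(x), len(y)
    log_a = np.full(n, -np.log(n))  # uniform weights on x
    log_b = np.full(m, -np.log(m))  # uniform weights on y
    cost = 0.5 * ((x[:, None, :] - y[None, :, :]) ** 2).sum(axis=-1)

    f = np.zeros(n)
    g = np.zeros(m)
    for _ in range(n_iter):
        # Alternating soft-min updates of the dual potentials.
        f = -eps * _logsumexp((g[None, :] - cost) / eps + log_b[None, :], axis=1)
        g = -eps * _logsumexp((f[:, None] - cost) / eps + log_a[:, None], axis=0)

    # Dual objective <a, f> + <b, g> with uniform weights.
    return f.mean() + g.mean()


def sinkhorn_divergence(x, y, eps=0.5, n_iter=200):
    """Debiased Sinkhorn divergence between two point clouds."""
    return (
        entropic_ot(x, y, eps, n_iter)
        - 0.5 * entropic_ot(x, x, eps, n_iter)
        - 0.5 * entropic_ot(y, y, eps, n_iter)
    )
```

The two self-terms remove the entropic bias, so the divergence vanishes when both clouds are identical. That property is what makes it usable as a discrepancy between generated and target samples.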


Flow Matching for Count Data

arXiv.org Machine Learning

High-dimensional count data arise in applications such as single-cell RNA sequencing and neural spike trains, where mappings between distributions across successive batches or time points form critical components of data analysis. The recent success of diffusion- and flow-based deep generative models for images, video, and text motivates extending these ideas to count-valued settings, but many existing methods either treat each count as a categorical state or transform counts into a continuous space, neither of which is natural or efficient when the count range is large. We propose count-FM, a flow-matching framework for count data based on a continuous-time birth-death process with local unit jumps. Count-FM learns marginal transitions efficiently in count space through simulation-free training of conditional transition rates, allowing transport between arbitrary count-distributed source and target populations. In simulation, count-FM achieves better sample quality than representative baselines while using substantially fewer parameters. We further apply count-FM to scRNA-seq and neural spike-train data for unconditional generation, transport, and conditional generation. Across these tasks, count-FM yields improved sample quality, greater modeling efficiency, and interpretable transport paths.
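A continuous-time birth-death process with unit jumps, the building block named above, can be simulated with the standard Gillespie algorithm. The sketch below is a generic simulator, not count-FM itself. It assumes state-dependent but time-homogeneous rates supplied by the caller, whereas learned, time-dependent rates would need a thinning or time-stepping scheme instead. The name `simulate_birth_death` and its parameters are hypothetical.

```python
import numpy as np


def simulate_birth_death(x0, birth_rate, death_rate, t_end, rng):
    """Simulate a birth-death chain on {0, 1, 2, ...} up to time t_end.

    birth_rate, death_rate: callables mapping the current count to a
                            non-negative rate (assumed time-homogeneous).
    Returns the count at time t_end.
    """
    t, x = 0.0, int(x0)
    while True:
        b = birth_rate(x)
        # No deaths are possible from zero, so counts stay non-negative.
        d = death_rate(x) if x > 0 else 0.0
        total = b + d
        if total <= 0.0:
            break  # absorbing state: no further jumps
        # Waiting time to the next jump is exponential with rate `total`.
        t += rng.exponential(1.0 / total)
        if t >= t_end:
            break
        # The jump is +1 with probability b / total, otherwise -1.
        x += 1 if rng.random() < b / total else -1
    return x
```

Because every jump changes the count by exactly one, the state never leaves the non-negative integers and no rounding or categorical encoding is needed.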


Mean-Field Path-Integral Diffusion: From Samples to Interacting Agents

arXiv.org Machine Learning

Independent sample generation is the prevailing paradigm in modern diffusion-based generative models of AI. We ask a different question: can samples coordinate through shared population statistics to transport probability mass more efficiently? We introduce Mean-Field Path-Integral Diffusion (MF-PID), a framework in which samples are promoted to interacting agents whose drift depends self-consistently on the evolving population density. We identify two analytically tractable regimes: a Linear-Quadratic-Gaussian (LQG) benchmark in which the infinite-dimensional mean-field system reduces to a finite set of Riccati and linear ODEs, and a Gaussian-mixture regime governed by a piecewise-constant protocol that preserves closed-form solvability. For a quadratic interaction potential with schedule β_t and zero base drift we prove that the self-consistent MF guidance is the exact linear interpolant between initial and target global means -- a result that holds for arbitrary initial and target densities and any β_t. Applied to demand-response control of energy systems, where agents aggregated into an ensemble are energy consumers (e.g. The energy saving is independent of the number of zones per building (d = 1-32 tested), confirming that the linear guidance formula broadcasts a single d-vector with O(d) communication and grows mildly in compute (sub-cubically for d ≤ 32, asymptotically O(d^3) for d ≫ 1). Introduction. Generative AI has been transformed by diffusion models, which frame sample generation as a stochastic process steered from noise to data [1-3]. A key structural feature of these models -- shared with other generative models, e.g. 
Similarly, stochastic optimal transport (SOT) and Schrödinger bridge formulations [6-8] cast distribution matching as an independent-particle path optimization, yielding tractable convolutions of Green functions but discarding inter-particle information; stochastic interpolants [9] construct flexible transport bridges between arbitrary densities via tunable continuous-time stochastic processes, recovering the Schrödinger bridge as a special limit -- again in an independent-particle framework.


NHS England rushes to hide software over AI hacking fears

New Scientist

NHS England is hurriedly withdrawing all the software it has written from public view because of the perceived risk of hacking from cutting-edge artificial intelligence. Security experts say the move is unnecessary and counterproductive. Software produced by the National Health Service has previously been made open-source and listed on GitHub because it is created with public money. This allows other organisations to build upon it and make better services more cheaply without duplicating effort. But NHS England has issued new guidance to staff, which has been shared with New Scientist, that demands existing and future software be pulled from public view and kept behind closed doors.


Robust Visual Reasoning via Language Guided Neural Module Networks

Neural Information Processing Systems

Neural module networks (NMN) are a popular approach for solving multi-modal tasks such as visual question answering (VQA) and visual referring expression recognition (REF). A key limitation in prior implementations of NMN is that the neural modules do not effectively capture the association between the visual input and the relevant neighbourhood context of the textual input.