

Sinkhorn distance


Rethinking Losses for Diffusion Bridge Samplers

Neural Information Processing Systems

Diffusion bridges are a promising class of deep-learning methods for sampling from unnormalized distributions. Recent works show that the Log Variance (LV) loss consistently outperforms the reverse Kullback-Leibler (rKL) loss when the reparametrization trick is used to compute rKL gradients. While the on-policy LV loss yields the same gradients as the rKL loss combined with the log-derivative trick for diffusion samplers with non-learnable forward processes, this equivalence does not hold for diffusion bridges or when diffusion coefficients are learned. Based on this insight, we argue that for diffusion bridges the LV loss does not represent an optimization objective that can be motivated, like the rKL loss, via the data processing inequality. Our analysis shows that employing the rKL loss with the log-derivative trick (rKL-LD) not only avoids these conceptual problems but also consistently outperforms the LV loss. Experimental results with different types of diffusion bridges on challenging benchmarks show that samplers trained with the rKL-LD loss achieve better performance. From a practical perspective, we find that rKL-LD requires significantly less hyperparameter optimization and yields more stable training behavior.
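
The equivalence between the on-policy LV gradient and the rKL-LD gradient can be checked in a toy setting. The sketch below is an illustration only, not the paper's diffusion-bridge setup: it assumes a one-dimensional Gaussian sampler q_theta = N(mu, sigma^2) standing in for the learned path measure and a fixed unnormalized Gaussian target standing in for the non-learnable side. All function names are hypothetical. With samples drawn on-policy and treated as constants, the exact gradient of the batch log-variance equals twice the rKL-LD estimator with a batch-mean baseline, and plain SGD on the rKL-LD estimator recovers the target.

```python
import numpy as np


def log_q(x, mu, log_sigma):
    """Log-density of the toy sampler q_theta = N(mu, exp(log_sigma)^2)."""
    sigma = np.exp(log_sigma)
    return -0.5 * ((x - mu) / sigma) ** 2 - log_sigma - 0.5 * np.log(2.0 * np.pi)


def score(x, mu, log_sigma):
    """d log q_theta(x) / d theta for theta = (mu, log_sigma), with x held fixed."""
    sigma = np.exp(log_sigma)
    z = (x - mu) / sigma
    return np.stack([z / sigma, z ** 2 - 1.0], axis=-1)


def log_p_unnorm(x):
    """Unnormalized target: N(3, 1) up to an unknown additive constant."""
    return -0.5 * (x - 3.0) ** 2 + 10.0


def rkl_ld_and_lv_grads(theta, rng, batch=256):
    """Return (rKL-LD gradient estimate, exact gradient of the on-policy LV loss)."""
    mu, log_sigma = theta
    # On-policy samples, treated as constants (no reparametrization).
    x = mu + np.exp(log_sigma) * rng.standard_normal(batch)
    w = log_q(x, mu, log_sigma) - log_p_unnorm(x)  # per-sample log ratio
    s = score(x, mu, log_sigma)  # equals dw/dtheta because log_p_unnorm ignores theta
    centered = w - w.mean()
    # Log-derivative (REINFORCE) estimator of grad KL(q || p) with a batch-mean baseline.
    g_rkl_ld = (centered[:, None] * s).mean(axis=0)
    # LV loss = batch variance of w; differentiate w.r.t. theta with x fixed.
    g_lv = 2.0 * (centered[:, None] * (s - s.mean(axis=0))).mean(axis=0)
    return g_rkl_ld, g_lv


def train_rkl_ld(steps=1000, lr=0.05, seed=0):
    """Plain SGD on the rKL-LD estimator, starting from N(0, 1)."""
    rng = np.random.default_rng(seed)
    theta = np.zeros(2)
    for _ in range(steps):
        g, _ = rkl_ld_and_lv_grads(theta, rng)
        theta = theta - lr * g
    return theta


if __name__ == "__main__":
    g_rkl, g_lv = rkl_ld_and_lv_grads(np.array([0.5, 0.3]), np.random.default_rng(1))
    print(np.allclose(g_lv, 2.0 * g_rkl))  # True
    print(train_rkl_ld())  # close to [3.0, 0.0], i.e. mu = 3, sigma = 1
```

The identity holds here only because log_p_unnorm does not depend on theta. A plausible reading of why it fails for diffusion bridges is that the term playing the role of log_p_unnorm then also contains learnable parameters, so dw/dtheta picks up an extra term that the LV gradient includes and the rKL-LD estimator does not.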


Non-equilibrium Annealed Adjoint Sampler

Neural Information Processing Systems

Recently, there has been significant progress in learning-based diffusion samplers, which aim to sample from a given unnormalized density. Many of these approaches formulate the sampling task as a stochastic optimal control (SOC) problem using a canonical uninformative reference process, which limits their ability to efficiently guide trajectories toward the target distribution. In this work, we propose the Non-Equilibrium Annealed Adjoint Sampler (NAAS), a novel SOC-based diffusion framework that employs annealed reference dynamics as a non-stationary base SDE. This annealing structure provides a natural progression toward the target distribution and generates informative reference trajectories, thereby enhancing the stability and efficiency of learning the control. Owing to our SOC formulation, our framework can incorporate a variety of SOC solvers, thereby offering high flexibility in algorithmic design. As one instantiation, we employ a lean adjoint system inspired by adjoint matching, enabling efficient and scalable training. We demonstrate the effectiveness of NAAS across a range of tasks, including sampling from classical energy landscapes and molecular Boltzmann distributions.
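
To make "annealed reference dynamics" concrete, the sketch below simulates one common choice of non-stationary base SDE: Langevin dynamics whose drift follows the geometric path pi_t proportional to p_0^(1-t) p_1^t from a simple prior p_0 to the target p_1. This is an assumption for illustration; the abstract does not specify NAAS's exact reference process, and neither the learned control nor the lean adjoint system is implemented here. The prior N(0, 1), the toy target N(4, 0.5^2), the step size and the step count are all hypothetical choices.

```python
import numpy as np


def grad_log_prior(x):
    """Score of the prior p_0 = N(0, 1)."""
    return -x


def grad_log_target(x, mean=4.0, std=0.5):
    """Score of a toy target p_1 = N(mean, std^2); normalization does not affect it."""
    return -(x - mean) / std ** 2


def simulate_annealed_reference(n_samples=5000, n_steps=500, dt=0.01, seed=0):
    """Euler-Maruyama for dX = grad log pi_t(X) ds + sqrt(2) dW,
    where log pi_t = (1 - t) log p_0 + t log p_1 and t goes linearly from 0 to 1."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n_samples)  # X_0 ~ p_0
    for k in range(n_steps):
        t = (k + 1) / n_steps  # annealing schedule: the drift changes with time
        drift = (1.0 - t) * grad_log_prior(x) + t * grad_log_target(x)
        x = x + drift * dt + np.sqrt(2.0 * dt) * rng.standard_normal(n_samples)
    return x


if __name__ == "__main__":
    x = simulate_annealed_reference()
    print(x.mean(), x.std())  # roughly 4 and 0.5 for this toy target
```

Because the path is traversed in finite time, these uncontrolled samples lag behind pi_t and are not exact target samples in general; in an SOC formulation, a learned control added to this drift is what corrects the remaining mismatch.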


Modality-Agnostic Topology Aware Localization: Supplemental Material

Farhad G. Zanjani, Ilia Karmanov, Hanno Ackermann, Daniel Dijkman, Simone Merlin, Max Welling, Fatih Porikli (Qualcomm AI Research)

Neural Information Processing Systems

Triplet sampling was implemented based on the temporal vicinity of samples. Since the input is sequential, for each sample in the sequence (called the anchor) we consider a small and a large temporal window with predefined, fixed widths, both centered at the anchor's timestamp. Any sample inside the small window can serve as a positive sample, and any sample outside the small window but inside the large window can serve as a negative sample. The window widths are chosen roughly according to the observer's speed in the environment.
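
The window-based triplet rule can be sketched as follows. The function name, the use of full window widths in the same units as the timestamps, excluding the anchor from its own positive pool, and the uniform choice within each pool are assumptions for illustration; the text above only fixes that both windows are centered on the anchor, that positives lie inside the small window, and that negatives lie between the two windows.

```python
import numpy as np


def sample_triplet(timestamps, anchor_idx, small_width, large_width, rng):
    """Pick (anchor, positive, negative) indices by temporal vicinity (hypothetical helper).

    Both windows are centered on the anchor's timestamp; small_width and
    large_width are full widths, so each extends width / 2 on either side.
    """
    t = np.asarray(timestamps, dtype=float)
    dt = np.abs(t - t[anchor_idx])
    not_anchor = np.arange(len(t)) != anchor_idx
    # Positives: inside the small window (anchor itself excluded).
    positive_pool = np.flatnonzero((dt <= small_width / 2) & not_anchor)
    # Negatives: outside the small window but inside the large one.
    negative_pool = np.flatnonzero((dt > small_width / 2) & (dt <= large_width / 2))
    if positive_pool.size == 0 or negative_pool.size == 0:
        return None  # windows too narrow, or anchor too close to a sequence boundary
    return anchor_idx, int(rng.choice(positive_pool)), int(rng.choice(negative_pool))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    ts = np.arange(100) * 0.1  # e.g. a 10 Hz sequence, timestamps in seconds
    print(sample_triplet(ts, 50, small_width=1.0, large_width=4.0, rng=rng))
```

Returning None when either pool is empty is one way to handle anchors near the start or end of a sequence; skipping or resampling such anchors would be equally reasonable.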


COT-GAN: Generating Sequential Data via Causal Optimal Transport

Neural Information Processing Systems

Remarkably, we find that this causality condition provides a natural framework to parameterize the cost function that is learned by the discriminator as a robust (worst-case) distance, and an ideal mechanism for learning time-dependent data distributions.


Massively scalable Sinkhorn distances via the Nyström method

Neural Information Processing Systems

The Sinkhorn distance, a variant of the Wasserstein distance with entropic regularization, is an increasingly popular tool in machine learning and statistical inference. However, the time and memory requirements of standard algorithms for computing this distance grow quadratically with the size of the data, rendering them prohibitively expensive on massive data sets. In this work, we show that this challenge is surprisingly easy to circumvent: combining two simple techniques, the Nyström method and Sinkhorn scaling, provably yields an accurate approximation of the Sinkhorn distance with significantly lower time and memory requirements than other approaches. We prove our results via new, explicit analyses of the Nyström method and of the stability properties of Sinkhorn scaling. We validate our claims experimentally by showing that our approach easily computes Sinkhorn distances on data sets hundreds of times larger than can be handled by other techniques.
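
The combination of the Nyström method with Sinkhorn scaling can be sketched in a few lines. The sketch assumes a squared Euclidean cost C_ij = ||x_i - y_j||^2, so the Gibbs kernel K = exp(-C / eta) is a Gaussian kernel, and it uses uniformly sampled landmarks with an eigenvalue cutoff for the pseudo-inverse; these are illustrative choices, not necessarily the paper's landmark-selection scheme or error-control parameters, and all function names are hypothetical. The key point is that K is only ever touched through factors F_X, F_Y with K approximately F_X F_Y^T, so each Sinkhorn matrix-vector product costs O(n r) time and memory instead of O(n^2). The returned value is the transport cost <P, C> of the scaled plan P = diag(u) K diag(v); some definitions of the Sinkhorn distance also add an entropy term.

```python
import numpy as np


def gaussian_kernel(A, B, eta):
    """Gibbs kernel exp(-||a_i - b_j||^2 / eta) for the squared Euclidean cost."""
    sq = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / eta)


def nystrom_factors(X, Y, Z, eta, rel_tol=1e-8):
    """Return F_X, F_Y with K(X, Y) ~= F_X @ F_Y.T, built from landmarks Z."""
    w, U = np.linalg.eigh(gaussian_kernel(Z, Z, eta))
    keep = w > rel_tol * w.max()  # drop numerically null directions of K_ZZ
    R = U[:, keep] / np.sqrt(w[keep])  # pseudo-inverse square root of K_ZZ on kept subspace
    return gaussian_kernel(X, Z, eta) @ R, gaussian_kernel(Y, Z, eta) @ R


def sinkhorn_lowrank(F_X, F_Y, a, b, n_iter=500):
    """Sinkhorn scaling where each product with K costs O(n r) instead of O(n^2)."""
    u, v = np.ones_like(a), np.ones_like(b)
    for _ in range(n_iter):
        u = a / (F_X @ (F_Y.T @ v))  # match row marginals:    u <- a / (K v)
        v = b / (F_Y @ (F_X.T @ u))  # match column marginals: v <- b / (K^T u)
    return u, v


def transport_cost(F_X, F_Y, u, v, X, Y):
    """<P, C> for P = diag(u) K diag(v), without forming the n x n matrices P or C."""
    Kv = F_X @ (F_Y.T @ v)
    Ktu = F_Y @ (F_X.T @ u)
    KvY = F_X @ (F_Y.T @ (v[:, None] * Y))  # K diag(v) Y, an n x d matrix
    # Expand ||x_i - y_j||^2 = ||x_i||^2 + ||y_j||^2 - 2 <x_i, y_j> term by term.
    return ((u * (X ** 2).sum(1)) @ Kv
            + (v * (Y ** 2).sum(1)) @ Ktu
            - 2.0 * np.sum(u[:, None] * X * KvY))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.random((2000, 2))
    Y = rng.random((1500, 2)) + 0.2
    a = np.full(len(X), 1.0 / len(X))
    b = np.full(len(Y), 1.0 / len(Y))
    pool = np.vstack([X, Y])
    Z = pool[rng.choice(len(pool), size=150, replace=False)]  # uniform landmarks
    F_X, F_Y = nystrom_factors(X, Y, Z, eta=1.0)
    u, v = sinkhorn_lowrank(F_X, F_Y, a, b)
    print(F_X.shape[1], transport_cost(F_X, F_Y, u, v, X, Y))
```

A smaller eta makes the kernel sharper and raises its effective rank, so more landmarks are needed for the same approximation accuracy; the O(n r) cost per iteration then grows with r.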