coupling
Schrödinger Bridge Matching for Tree-Structured Costs and Entropic Wasserstein Barycentres
Recent advances in flow-based generative modelling have provided scalable methods for computing the Schrödinger Bridge (SB) between distributions, a dynamic form of entropy-regularised Optimal Transport (OT) for the quadratic cost. The successful Iterative Markovian Fitting (IMF) procedure solves the SB problem via sequential bridge-matching steps, presenting an elegant and practical approach with many favourable properties over the more traditional Iterative Proportional Fitting (IPF) procedure. Beyond the standard setting, optimal transport can be generalised to the multi-marginal case in which the objective is to minimise a cost defined over several marginal distributions. Of particular importance are costs defined over a tree structure, from which Wasserstein barycentres can be recovered as a special case. In this work, we extend the IMF procedure to solve for the tree-structured SB problem. Our resulting algorithm inherits the many advantages of IMF over IPF approaches in the tree-based setting. In the case of Wasserstein barycentres, our approach can be viewed as extending the widely used fixed-point approach to use flow-based entropic OT solvers, while requiring only simple bridge-matching steps at each iteration.
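For context, the static and discrete analogue of the SB problem is entropic OT between two finite distributions with quadratic cost, and its classical solver is Sinkhorn's algorithm, which is IPF applied to the coupling. The sketch below is that textbook solver, not the IMF or bridge-matching procedure of this work; the function name `sinkhorn` and the values of `eps` and `n_iter` are illustrative assumptions.

```python
import math


def sinkhorn(a, b, xs, ys, eps=0.5, n_iter=500):
    """Entropic OT between discrete measures a (on points xs) and b (on points ys).

    Approximately minimises sum_ij pi_ij * (x_i - y_j)**2 / 2 - eps * H(pi)
    over couplings pi with row sums a and column sums b.
    """
    # Gibbs kernel for the quadratic cost c(x, y) = (x - y)**2 / 2
    K = [[math.exp(-0.5 * (x - y) ** 2 / eps) for y in ys] for x in xs]
    u = [1.0] * len(xs)
    v = [1.0] * len(ys)
    for _ in range(n_iter):
        # IPF: rescale to match the row marginal, then the column marginal
        u = [a[i] / sum(K[i][j] * v[j] for j in range(len(ys))) for i in range(len(xs))]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(len(xs))) for j in range(len(ys))]
    return [[u[i] * K[i][j] * v[j] for j in range(len(ys))] for i in range(len(xs))]


pi = sinkhorn([0.5, 0.5], [0.25, 0.75], xs=[0.0, 1.0], ys=[0.0, 2.0])
print([round(sum(row), 4) for row in pi])        # row sums, close to a
print([round(sum(col), 4) for col in zip(*pi)])  # column sums, equal to b
```

Each pass rescales the rows and then the columns, so the column sums match `b` exactly after every iteration while the row sums match `a` only in the limit.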
Pairwise Optimal Transports for Training All-to-All Flow-Based Condition Transfer Model
In this paper, we propose a flow-based method for learning all-to-all transfer maps among conditional distributions that approximates pairwise optimal transport. The proposed method addresses the challenge of handling the case of continuous conditions, which often involve a large set of conditions with sparse empirical observations per condition. We introduce a novel cost function that enables simultaneous learning of optimal transports for all pairs of conditional distributions. Our method is supported by a theoretical guarantee that, in the limit, it converges to the pairwise optimal transports among infinitely many pairs of conditional distributions. The learned transport maps are subsequently used to couple data points in conditional flow matching. We demonstrate the effectiveness of this method on synthetic and benchmark datasets, as well as on chemical datasets in which continuous physical properties are defined as conditions.
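The final step above, using transport maps to couple data points for conditional flow matching, can be sketched under a strong simplifying assumption: in one dimension with equal-size samples, the exact OT plan for a convex cost pairs sorted points, so sorting stands in here for the paper's learned pairwise maps. The helper names are hypothetical, and the straight-line interpolant is a common conditional flow-matching choice rather than something the abstract specifies.

```python
import random


def ot_couple_1d(x0, x1):
    """Pair two equal-size 1D samples by sorting (the OT plan for convex costs in 1D)."""
    return list(zip(sorted(x0), sorted(x1)))


def flow_matching_targets(pairs, rng):
    """Build (t, x_t, target velocity) triples on straight paths between coupled points."""
    out = []
    for a, b in pairs:
        t = rng.random()
        x_t = (1 - t) * a + t * b  # point on the straight path from a to b
        out.append((t, x_t, b - a))  # conditional velocity is constant along the path
    return out


rng = random.Random(0)
pairs = ot_couple_1d([0.3, -1.0, 2.0], [5.0, 4.0, 6.0])
print(pairs)  # [(-1.0, 4.0), (0.3, 5.0), (2.0, 6.0)]
triples = flow_matching_targets(pairs, rng)
```

A velocity network would then be regressed onto the third entry of each triple at input `(t, x_t)`; a better coupling yields straighter, less crossing paths for it to learn.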
HeavyWater and SimplexWater: Distortion-free LLM Watermarks for Low-Entropy Distributions
Large language model (LLM) watermarks enable authentication of text provenance, curb misuse of machine-generated text, and promote trust in AI systems. Current watermarks operate by changing the next-token predictions output by an LLM. The updated (i.e., watermarked) predictions depend on random side information produced, for example, by hashing previously generated tokens. LLM watermarking is particularly challenging when next-token predictions are near-deterministic. In fact, over 90% of next-token distributions are low-entropy, with more than half of the probability mass on a single token.
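To illustrate why near-deterministic predictions are hard to watermark, here is a sketch of a well-known distortion-free scheme (the Gumbel-max, or exponential-minimum, trick). It is not HeavyWater or SimplexWater, whose mechanisms the abstract does not describe. Side information is derived by hashing the previous token with a secret key, as the abstract mentions; the key string, SHA-256 hashing and function names are assumptions.

```python
import hashlib
import math


def keyed_uniforms(prev_token, vocab_size, key="secret-key"):
    """Hash (key, previous token, candidate) into one pseudo-random uniform per candidate."""
    us = []
    for v in range(vocab_size):
        digest = hashlib.sha256(f"{key}|{prev_token}|{v}".encode()).digest()
        n = int.from_bytes(digest[:8], "big") >> 12  # keep 52 bits
        us.append((n + 0.5) / 2**52)  # strictly inside (0, 1)
    return us


def gumbel_max_watermark(probs, prev_token, key="secret-key"):
    """Return argmax_v r_v ** (1 / p_v); over random keys this samples from probs."""
    rs = keyed_uniforms(prev_token, len(probs), key)
    best, best_score = None, -math.inf
    for v, (p, r) in enumerate(zip(probs, rs)):
        if p <= 0:
            continue  # zero-probability tokens are never emitted
        score = math.log(r) / p  # log(r ** (1 / p)); log is monotone, argmax unchanged
        if score > best_score:
            best, best_score = v, score
    return best


# Low entropy: all mass on token 1, so the key cannot influence the choice.
print(gumbel_max_watermark([0.0, 1.0, 0.0], prev_token=7))  # always 1
print(gumbel_max_watermark([0.2, 0.5, 0.3], prev_token=7))
```

Averaged over keys, the selected token follows `probs`, so the text distribution is unchanged. When one token holds all (or nearly all) of the mass, the keyed uniforms cannot change the outcome, so little or no detectable signal is embedded at that position.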
8c2e2925e75e501088004dd685f0ae81-Paper-Conference.pdf
We study the sample complexity of Bayesian recovery for solving inverse problems with general prior, forward operator and noise distributions. We consider posterior sampling according to an approximate prior P, and establish sufficient conditions for stable and accurate recovery with high probability. Our main result is a non-asymptotic bound that shows that the sample complexity depends on (i) the intrinsic complexity of P, quantified by its approximate covering number, and (ii) concentration bounds for the forward operator and noise distributions. As a key application, we specialize to generative priors, where P is the pushforward of a latent distribution via a Deep Neural Network (DNN). We show that the sample complexity scales log-linearly with the latent dimension k, thus establishing the efficacy of DNN-based priors. Generalizing existing results on deterministic (i.e., non-Bayesian) recovery for the important problem of random sampling with an orthogonal matrix U, we show how the sample complexity is determined by the coherence of U with respect to the support of P. Hence, we establish that coherence plays a fundamental role in Bayesian recovery as well. Overall, our framework unifies and extends prior work, providing rigorous guarantees for the sample complexity of solving Bayesian inverse problems with arbitrary distributions.
Bootstrap Your Uncertainty: Adaptive Robust Classification Driven by Optimal-Transport
Distributionally Robust Optimization (DRO) offers a promising framework by optimizing worst-case performance over a set of candidate distributions, referred to as the uncertainty set. However, the efficacy of DRO heavily depends on the design of the uncertainty set, and existing methods often perform suboptimally due to an inappropriate or inflexible uncertainty set. In this work, we first propose a novel perspective that casts entropy-regularized Wasserstein DRO as a dynamic process of distributional exploration and semantic alignment, both driven by optimal transport (OT). This unified viewpoint yields two key new techniques: semantic calibration, which bootstraps semantically meaningful transport costs via inverse OT, and adaptive refinement, which adjusts the uncertainty set using OT-driven feedback. Together, these components form an exploration-and-feedback system, where the transport costs and uncertainty set evolve jointly during training, enabling the model to better adapt to potential distribution shifts. Moreover, we provide an in-depth analysis of this adaptive process and prove theoretical guarantees of convergence. Finally, we present our experimental results across diverse distribution shift scenarios, which demonstrate that our approach significantly outperforms existing methods, achieving state-of-the-art robustness.
Momentum Multi-Marginal Schrödinger Bridge Matching
Understanding complex systems by inferring trajectories from sparse sample snapshots is a fundamental challenge in a wide range of domains, e.g., single-cell biology, meteorology, and economics. Despite advancements in Bridge and Flow matching frameworks, current methodologies rely on pairwise interpolation between adjacent snapshots. This hinders their ability to capture long-range temporal dependencies and potentially affects the coherence of the inferred trajectories. To address these issues, we introduce Momentum Multi-Marginal Schrödinger Bridge Matching (3MSBM), a novel matching framework that learns smooth measure-valued splines for stochastic systems that satisfy multiple positional constraints. This is achieved by lifting the dynamics to phase space and generalizing stochastic bridges to be conditioned on several points, forming a multi-marginal conditional stochastic optimal control problem. The underlying dynamics are then learned by minimizing a variational objective, having fixed the path induced by the multi-marginal conditional bridge. As a matching approach, 3MSBM learns transport maps that preserve intermediate marginals throughout training, significantly improving convergence and scalability. Extensive experimentation in a series of real-world applications validates the superior performance of 3MSBM compared to existing methods in capturing complex dynamics with temporal dependencies, opening new avenues for training matching frameworks in multi-marginal settings.
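3MSBM conditions stochastic bridges on several points in phase space, so that positions and velocities together produce smooth, spline-like paths. As a simpler first-order point of reference (positions only, no momentum, so not the paper's construction), a Brownian motion conditioned to pass through several (time, value) pins reduces, by the Markov property, to a Brownian bridge between the two neighbouring pins. The function name and the diffusion scale `sigma` are illustrative assumptions.

```python
import bisect
import math
import random


def multi_point_brownian_bridge(pins, t, sigma=1.0, rng=None):
    """Sample X_t of a Brownian motion (scale sigma) conditioned on every (time, value) pin.

    `pins` must be sorted by strictly increasing time. By the Markov property only the
    two pins bracketing t matter, so X_t follows a single-segment Brownian bridge.
    """
    if rng is None:
        rng = random.Random()
    times = [p[0] for p in pins]
    if not times[0] <= t <= times[-1]:
        raise ValueError("t must lie within the pinned time range")
    k = bisect.bisect_right(times, t)
    if k == len(times):  # t coincides with the final pin
        return pins[-1][1]
    (ta, a), (tb, b) = pins[k - 1], pins[k]
    mean = a + (t - ta) / (tb - ta) * (b - a)  # linear interpolation between the pins
    var = sigma**2 * (t - ta) * (tb - t) / (tb - ta)  # zero at both pins
    return mean + math.sqrt(var) * rng.gauss(0.0, 1.0)


pins = [(0.0, 0.0), (1.0, 2.0), (3.0, -1.0)]
rng = random.Random(0)
# 1000 independent draws of the marginal X_2, whose theoretical mean is 0.5
samples = [multi_point_brownian_bridge(pins, 2.0, sigma=0.5, rng=rng) for _ in range(1000)]
```

The function samples the marginal at a single time, not a joint trajectory. Its conditional mean is piecewise linear with kinks at the pins, unlike the smooth measure-valued splines the abstract targets by lifting the dynamics to phase space.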
ReDi: Rectified Discrete Flow
Discrete Flow-based Models (DFMs) are powerful generative models for high-quality discrete data but typically suffer from slow sampling speeds due to their reliance on iterative decoding processes. This reliance on a multi-step process originates from the factorization approximation of DFMs, which is necessary for handling high-dimensional data. In this paper, we analyze the factorization approximation error using Conditional Total Correlation (TC), and reveal its dependence on the coupling. To address the challenge of efficient few-step generation, we propose Rectified Discrete Flow (ReDi), a novel iterative method that reduces the underlying factorization error (measured as Conditional TC) by rectifying the coupling between source and target distributions. We theoretically prove that each ReDi step guarantees a monotonic decrease in Conditional TC, ensuring its convergence. Empirically, ReDi significantly reduces Conditional TC and enables few-step generation. Moreover, we demonstrate that the rectified couplings are well-suited for training efficient one-step models on image generation. ReDi offers a simple and theoretically grounded approach for tackling the few-step challenge, providing a new perspective on efficient discrete data synthesis.
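Conditional TC can be made concrete for discrete data. For a coupling of sources x0 and multi-coordinate targets x1, TC(X1 | X0) = Σ_d H(X1_d | X0) - H(X1 | X0), which equals the expected KL divergence between the true conditional p(x1 | x0) and the product of its per-coordinate marginals, i.e. the error of a factorized prediction. The sketch below computes it from an empirical coupling; it illustrates the dependence on the coupling noted in the abstract, not the ReDi rectification step itself, and the function names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def entropy(counts):
    """Shannon entropy (nats) of an empirical distribution given as a Counter."""
    total = sum(counts.values())
    return -sum(c / total * math.log(c / total) for c in counts.values() if c > 0)


def conditional_tc(coupling):
    """TC(X1 | X0) = sum_d H(X1_d | X0) - H(X1 | X0) for equally weighted (x0, x1) pairs.

    Each x1 is a tuple of discrete coordinates; the result is in nats.
    """
    by_source = defaultdict(list)
    for x0, x1 in coupling:
        by_source[x0].append(x1)
    tc = 0.0
    for targets in by_source.values():
        weight = len(targets) / len(coupling)  # empirical p(x0)
        joint_h = entropy(Counter(targets))
        marginal_h = sum(entropy(Counter(x1[d] for x1 in targets)) for d in range(len(targets[0])))
        tc += weight * (marginal_h - joint_h)
    return tc


target = [(0, 0), (1, 1)]
independent = [(s, x1) for s in ("a", "b") for x1 in target]
deterministic = [("a", (0, 0)), ("b", (1, 1))]
print(conditional_tc(independent), conditional_tc(deterministic))
```

With the independent coupling each source still faces both correlated outcomes, giving log 2 ≈ 0.693 nats; the deterministic coupling sends each source to a single target and gives 0, a case a factorized one-step sampler can represent exactly.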
Why 1 + 1 < 1 in Visual Token Pruning: Beyond Naïve Integration via Multi-Objective Balanced Covering
Existing visual token pruning methods target prompt alignment and visual preservation with static strategies, overlooking the varying relative importance of these objectives across tasks, which leads to inconsistent performance. To address this, we derive the first closed-form error bound for visual token pruning based on the Hausdorff distance, uniformly characterizing the contributions of both objectives. Moreover, leveraging ϵ-covering theory, we reveal an intrinsic trade-off between these objectives and quantify their optimal attainment levels under a fixed budget. To practically handle this trade-off, we propose Multi-Objective Balanced Covering (MoB), which reformulates visual token pruning as a bi-objective covering problem. In this framework, the attainment trade-off reduces to budget allocation via greedy radius trading. MoB offers a provable performance bound and linear scalability with respect to the number of input visual tokens, enabling adaptation to challenging pruning scenarios. Extensive experiments show that MoB preserves 96.4% of performance for LLaVA-1.5-7B using only 11.1% of the original visual tokens and accelerates LLaVA-Next-7B by 1.3-1.5× with negligible performance loss. Additionally, evaluations on Qwen2-VL and Video-LLaVA confirm that MoB integrates seamlessly into advanced MLLMs and diverse vision-language tasks. The code is available at https://github.com/YChenL/MoB.
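The covering view can be illustrated with a deliberately simplified sketch that is not MoB's greedy radius-trading algorithm: token embeddings are 2D points, part of the budget keeps the tokens nearest a prompt embedding (prompt alignment), and the rest is spent by farthest-point greedy selection (visual preservation). The directed Hausdorff distance from all tokens to the kept set is the covering radius, the kind of quantity a Hausdorff-based error bound refers to. Point coordinates, the Euclidean metric, the budget split and the function names are assumptions.

```python
import math


def covering_radius(points, kept):
    """Directed Hausdorff distance from all tokens to the kept tokens."""
    return max(min(math.dist(p, points[k]) for k in kept) for p in points)


def balanced_prune(points, prompt, k_prompt, k_cover):
    """Keep k_prompt tokens nearest the prompt, then k_cover tokens by farthest-point greedy."""
    if k_prompt + k_cover > len(points) or k_prompt + k_cover == 0:
        raise ValueError("budget must be between 1 and the number of tokens")
    by_alignment = sorted(range(len(points)), key=lambda i: math.dist(points[i], prompt))
    kept = by_alignment[:k_prompt]
    for _ in range(k_cover):
        rest = [i for i in range(len(points)) if i not in kept]
        if not kept:
            kept.append(rest[0])  # arbitrary seed when no prompt-aligned tokens are kept
            continue
        # Add the token that is currently worst covered by the kept set
        kept.append(max(rest, key=lambda i: min(math.dist(points[i], points[k]) for k in kept)))
    return kept


tokens = [(0.0, 0.0), (0.1, 0.0), (5.0, 0.0), (5.0, 0.1), (10.0, 0.0)]
kept = balanced_prune(tokens, prompt=(0.0, 0.0), k_prompt=1, k_cover=2)
print(kept, covering_radius(tokens, kept))
```

Shifting budget between `k_prompt` and `k_cover` is a toy version of the trade-off the abstract describes: for a fixed total, spending more on prompt-aligned tokens generally leaves a larger covering radius.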
Saddle Networks: Structure-Preserving Architectures for Convex-Concave Functions
Saddle-point models arise throughout optimization, optimal transport, robust learning, and control. In many applications, the relevant function f(x,y) is convex in x and concave in y, and preserving this geometry is essential for obtaining tractable min-max formulations and reliable certificates. We introduce a structured separable decomposition that preserves the convex-concave geometry and prove a complete one-dimensional approximation theorem under a mixed Monge-type convexity condition. We then describe practical saddle network architectures that preserve convexity in x and concavity in y by construction. The proposed architectures require only convexity-preserving neural networks, together with simple output transformations enforcing sign and concavity constraints. Finally, we report numerical benchmarks in dimensions 1 and 5, showing that the proposed saddle networks achieve high accuracy on smooth, nonsmooth, and high-rank convex-concave test functions.
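A minimal sketch of how convexity in x and concavity in y can hold by construction: a convex function of x, plus a bilinear coupling term, minus a convex function of y. The convex parts are nonnegative combinations of softplus units, a standard convexity-preserving building block. This is not the paper's structured separable decomposition or its saddle network architecture, and all names and parameter values are hypothetical.

```python
import math


def softplus(z):
    """Numerically stable log(1 + exp(z)), a convex, increasing function."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def convex_1d(x, weights, biases, out_weights):
    """sum_k a_k * softplus(w_k * x + b_k); convex in x whenever every a_k >= 0."""
    if any(a < 0 for a in out_weights):
        raise ValueError("output weights must be nonnegative to preserve convexity")
    return sum(a * softplus(w * x + b) for w, b, a in zip(weights, biases, out_weights))


def saddle(x, y, cx, cy, coupling=0.7):
    """c(x) + coupling * x * y - d(y): convex in x and concave in y by construction."""
    return convex_1d(x, *cx) + coupling * x * y - convex_1d(y, *cy)


cx = ([1.0, -2.0], [0.0, 0.5], [1.0, 0.3])  # (weights, biases, nonnegative output weights)
cy = ([1.5], [0.0], [2.0])
print(saddle(0.5, -1.0, cx, cy))
```

The bilinear term is linear in x for fixed y and linear in y for fixed x, so it preserves both curvature properties; the nonnegativity check on `out_weights` is a simple example of the kind of sign constraint the abstract mentions.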
Counterfactually Fair Regression via Optimal Transport
Lince, M. Generali, Gaucher, S., Vie, J-J., Loiseau, P.
We consider the problem of learning a counterfactually fair regressor. We adopt a causal uncertainty view in which counterfactual fairness is defined with resampled noise. We focus on obtaining theoretical fairness guarantees for a new post-processing estimator. We begin by showing that counterfactual fairness is equivalent to satisfying demographic parity conditional on the latent variable. This allows us to provide a closed-form expression of the optimal fair regressor via a barycentric quantile map. In order to handle continuous latent variables, we propose a discretized post-processing method. Then, under mild regularity assumptions, we prove high-probability finite-sample fairness guarantees for our estimator, providing an unfairness decay at rate $\tilde O(n^{-1/3})$, and establishing a matching risk bound of order $\tilde O(n^{-1/3})$. We provide a matching lower bound on the excess risk of almost fair predictions. Finally, we extend our results to the setting of relaxed counterfactual fairness. We validate our approach on real-world and synthetic data.
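The closed form mentioned above is a barycentric quantile map. Its classical, unconditional version for demographic parity across discrete groups is sketched below: each score is pushed through its own group's empirical CDF and then through the weighted average of all groups' empirical quantile functions. In the abstract's setting the map would be applied conditionally on (discretized) latent-variable cells; treating groups as those cells is an assumption of this sketch, and the function name is hypothetical.

```python
import bisect


def barycentric_quantile_map(scores_by_group, weights):
    """Map each group's scores onto the weighted Wasserstein barycentre of all groups.

    fair(v) = sum_h w_h * Q_h(F_g(v)), with empirical CDFs F and quantile functions Q.
    """
    sorted_groups = {g: sorted(v) for g, v in scores_by_group.items()}
    fair = {}
    for g, vals in scores_by_group.items():
        ref = sorted_groups[g]
        n_g = len(ref)
        out = []
        for v in vals:
            k = bisect.bisect_right(ref, v)  # k / n_g = F_g(v)
            total = 0.0
            for h, other in sorted_groups.items():
                n_h = len(other)
                idx = -(-k * n_h // n_g) - 1  # Q_h(k / n_g) = other[ceil(k * n_h / n_g) - 1]
                total += weights[h] * other[idx]
            out.append(total)
        fair[g] = out
    return fair


fair = barycentric_quantile_map({"A": [0.0, 1.0, 2.0], "B": [10.0, 11.0, 12.0]}, {"A": 0.5, "B": 0.5})
print(fair)  # {'A': [5.0, 6.0, 7.0], 'B': [5.0, 6.0, 7.0]}
```

Both groups end up with the same score distribution, which is exactly demographic parity; ties are handled only through the empirical rank, with no smoothing or jitter.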