AITopics | 2-wasserstein distance

Country:

North America > Canada > Ontario > Toronto (0.14)
North America > Canada > Quebec > Montreal (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

Neural Information Processing SystemsFeb-11-2026, 00:27:51 GMT

DeepDiffusion-Invariant WassersteinDistributionalClassification

How can the stochastic properties of input data and labels be appropriately captured to handle severe perturbations? To answer this question, we represent both input data and target labels as probability measures (i.e., probability densities), denoted asµn and ˆνn, respectively, in the Wasserstein space and solve a distance-based classification problem (i.e.,

artificial intelligence, machine learning, perturbation, (15 more...)

Country:

North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
Asia > South Korea > Seoul > Seoul (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Neural Information Processing SystemsFeb-7-2026, 16:26:33 GMT

18210aa6209b9adfc97b8c17c3741d95-Supplemental-Conference.pdf

generalized loss, hyperparameter, kernel, (16 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.69)

arXiv.org Machine LearningDec-1-2025

Terminal Velocity Matching

Zhou, Linqi, Parger, Mathias, Haque, Ayaan, Song, Jiaming

We propose Terminal Velocity Matching (TVM), a generalization of flow matching that enables high-fidelity one- and few-step generative modeling. TVM models the transition between any two diffusion timesteps and regularizes its behavior at its terminal time rather than at the initial time. We prove that TVM provides an upper bound on the $2$-Wasserstein distance between data and model distributions when the model is Lipschitz continuous. However, since Diffusion Transformers lack this property, we introduce minimal architectural changes that achieve stable, single-stage training. To make TVM efficient in practice, we develop a fused attention kernel that supports backward passes on Jacobian-Vector Products, which scale well with transformer architectures. On ImageNet-256x256, TVM achieves 3.29 FID with a single function evaluation (NFE) and 1.99 FID with 4 NFEs. It similarly achieves 4.32 1-NFE FID and 2.94 4-NFE FID on ImageNet-512x512, representing state-of-the-art performance for one/few-step models from scratch.

arxiv preprint arxiv, matching, meanflow, (14 more...)

2511.19797

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Gintare Karolina Dziugaite, Daniel M. Roy

Data-dependent PAC-Bayes priors via differential privacy

Neural Information Processing SystemsNov-20-2025, 18:37:48 GMT

Neural Information Processing Systems http://nips.cc/

artificial intelligence, machine learning, pac-bayes, (18 more...)

Country:

North America > Canada > Ontario > Toronto (0.14)
North America > Canada > Quebec > Montreal (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

Bajwa, Waheed U., Gurbuzbalaban, Mert, Kutbay, Mustafa Ali, Zhu, Lingjiong, Zulqarnain, Muhammad

DIGing--SGLD: Decentralized and Scalable Langevin Sampling over Time--Varying Networks

arXiv.org Machine LearningNov-18-2025

Sampling from a target distribution induced by training data is central to Bayesian learning, with Stochastic Gradient Langevin Dynamics (SGLD) serving as a key tool for scalable posterior sampling and decentralized variants enabling learning when data are distributed across a network of agents. This paper introduces DIGing-SGLD, a decentralized SGLD algorithm designed for scalable Bayesian learning in multi-agent systems operating over time-varying networks. Existing decentralized SGLD methods are restricted to static network topologies, and many exhibit steady-state sampling bias caused by network effects, even when full batches are used. DIGing-SGLD overcomes these limitations by integrating Langevin-based sampling with the gradient-tracking mechanism of the DIGing algorithm, originally developed for decentralized optimization over time-varying networks, thereby enabling efficient and bias-free sampling without a central coordinator. To our knowledge, we provide the first finite-time non-asymptotic Wasserstein convergence guarantees for decentralized SGLD-based sampling over time-varying networks, with explicit constants. Under standard strong convexity and smoothness assumptions, DIGing-SGLD achieves geometric convergence to an $O(\sqrtη)$ neighborhood of the target distribution, where $η$ is the stepsize, with dependence on the target accuracy matching the best-known rates for centralized and static-network SGLD algorithms using constant stepsize. Numerical experiments on Bayesian linear and logistic regression validate the theoretical results and demonstrate the strong empirical performance of DIGing-SGLD under dynamically evolving network conditions.

artificial intelligence, bayesian inference, machine learning, (17 more...)

2511.12836

Country:

North America > United States > Wisconsin (0.04)
North America > United States > New Jersey > Middlesex County > Piscataway (0.04)
North America > United States > Michigan (0.04)
(2 more...)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

arXiv.org Machine LearningNov-10-2025

On Flow Matching KL Divergence

Su, Maojiang, Hu, Jerry Yao-Chieh, Pi, Sophia, Liu, Han

We derive a deterministic, non-asymptotic upper bound on the Kullback-Leibler (KL) divergence of the flow-matching distribution approximation. In particular, if the $L_2$ flow-matching loss is bounded by $ε^2 > 0$, then the KL divergence between the true data distribution and the estimated distribution is bounded by $A_1 ε+ A_2 ε^2$. Here, the constants $A_1$ and $A_2$ depend only on the regularities of the data and velocity fields. Consequently, this bound implies statistical convergence rates of Flow Matching Transformers under the Total Variation (TV) distance. We show that, flow matching achieves nearly minimax-optimal efficiency in estimating smooth distributions. Our results make the statistical efficiency of flow matching comparable to that of diffusion models under the TV distance. Numerical studies on synthetic and learned velocities corroborate our theory.

artificial intelligence, assumption 3, machine learning, (13 more...)

2511.0548

Country:

North America > United States > Illinois > Cook County > Evanston (0.04)
North America > United States > Illinois > Cook County > Chicago (0.04)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.67)

Kremling, Gitte, Iafrate, Francesco, Taheri, Mahsa, Lederer, Johannes

Non-asymptotic error bounds for probability flow ODEs under weak log-concavity

arXiv.org Machine LearningOct-21-2025

Score-based generative modeling, implemented through probability flow ODEs, has shown impressive results in numerous practical settings. However, most convergence guarantees rely on restrictive regularity assumptions on the target distribution -- such as strong log-concavity or bounded support. This work establishes non-asymptotic convergence bounds in the 2-Wasserstein distance for a general class of probability flow ODEs under considerably weaker assumptions: weak log-concavity and Lipschitz continuity of the score function. Our framework accommodates non-log-concave distributions, such as Gaussian mixtures, and explicitly accounts for initialization errors, score approximation errors, and effects of discretization via an exponential integrator scheme. Bridging a key theoretical challenge in diffusion-based generative modeling, our results extend convergence theory to more realistic data distributions and practical ODE solvers. We provide concrete guarantees for the efficiency and correctness of the sampling algorithm, complementing the empirical success of diffusion models with rigorous theory. Moreover, from a practical perspective, our explicit rates might be helpful in choosing hyperparameters, such as the step size in the discretization.

artificial intelligence, assumption, machine learning, (17 more...)

2510.17608

Country: Europe > Germany > Hamburg (0.04)

Genre: Research Report > New Finding (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

Li, Haolin, Bidkhori, Hoda

FedGTEA: Federated Class-Incremental Learning with Gaussian Task Embedding and Alignment

arXiv.org Machine LearningOct-16-2025

We introduce a novel framework for Federated Class Incremental Learning, called Federated Gaussian Task Embedding and Alignment (FedGTEA). FedGTEA is designed to capture task-specific knowledge and model uncertainty in a scalable and communication-efficient manner. At the client side, the Cardinality-Agnostic Task Encoder (CATE) produces Gaussian-distributed task embed-dings that encode task knowledge, address statistical heterogeneity, and quantify data uncertainty. Importantly, CATE maintains a fixed parameter size regardless of the number of tasks, which ensures scalability across long task sequences. On the server side, FedGTEA utilizes the 2-Wasserstein distance to measure inter-task gaps between Gaussian embeddings. We formulate the Wasserstein loss to enforce inter-task separation. This probabilistic formulation not only enhances representation learning but also preserves task-level privacy by avoiding the direct transmission of latent embed-dings, aligning with the privacy constraints in federated learning. Extensive empirical evaluations on popular datasets demonstrate that FedGTEA achieves superior classification performance and significantly mitigates forgetting, consistently outperforming strong existing baselines.

artificial intelligence, learning, machine learning, (13 more...)