AITopics | mmd

mmd

Information about AI from the News, Publications, and Conferences

Neural Tangent Kernel Maximum Mean Discrepancy

Neural Information Processing Systems, Dec-23-2025, 23:47:52 GMT

We present a novel neural network Maximum Mean Discrepancy (MMD) statistic by identifying a new connection between the neural tangent kernel (NTK) and MMD. This connection enables a computationally and memory-efficient approach to computing the MMD statistic and performing NTK-based two-sample tests. It addresses the long-standing memory and computational cost of the MMD statistic, which matters for online implementations that must assimilate new samples. Theoretically, the connection allows us to understand properties of the NTK test statistic, such as the Type-I error and the testing power of the two-sample test, by adapting existing theory for kernel MMD. Numerical experiments on synthetic and real-world datasets validate the theory and demonstrate the effectiveness of the proposed NTK-MMD statistic.

maximum mean discrepancy, name change, tangent kernel maximum mean discrepancy, (5 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.78)

A Expressing Popular Forms of Calibration as Distribution Matching

Neural Information Processing Systems, Oct-8-2025, 16:48:35 GMT

This can be written succinctly as Y, b Y | ( X) (18) A.2 Calibration in Classification ECE used for break ties. For each model and dataset, the best performing model is then re-run with 50 random seeds to gather information about standard errors and statistical significance. Kernel Bandwidth We select the RBF kernel bandwidth for training on each dataset using the aforementioned hyperparameter optimization. For each county, we track the weather sequence of each year into a few summary statistics for each month (average/maximum/minimum temperatures, precipitation, cooling/heating degree days). All other hyperparameters are held constant, including the number of training steps.

calibration, decision calibration, kernel, (14 more...)

Neural Information Processing Systems

Country: North America > United States (0.28)

Genre: Research Report (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

E-ROBOT: a dimension-free method for robust statistics and machine learning via Schrödinger bridge

La Vecchia, Davide, Liu, Hang

arXiv.org Machine Learning, Sep-16-2025

We propose the Entropic-regularized Robust Optimal Transport (E-ROBOT) framework, a novel method that combines the robustness of ROBOT with the computational and statistical benefits of entropic regularization. We show that, rooted in the theory of the Schrödinger bridge problem, E-ROBOT defines the robust Sinkhorn divergence $\overline{W}_{\varepsilon,λ}$, where the parameter $λ$ controls robustness and $\varepsilon$ governs the regularization strength. Letting $n\in \mathbb{N}$ denote the sample size, a central theoretical contribution is establishing that the sample complexity of $\overline{W}_{\varepsilon,λ}$ is $\mathcal{O}(n^{-1/2})$, thereby avoiding the curse of dimensionality that plagues standard ROBOT. This dimension-free property unlocks the use of $\overline{W}_{\varepsilon,λ}$ as a loss function in large-dimensional statistical and machine learning tasks. In this regard, we demonstrate its utility through four applications: goodness-of-fit testing; computation of barycenters for corrupted 2D and 3D shapes; definition of gradient flows; and image colour transfer. From the computational standpoint, a perk of our novel method is that it can be easily implemented by modifying existing (\texttt{Python}) routines. From the theoretical standpoint, our work opens the door to many research directions in statistics and machine learning; we discuss some of them.

e-robot, optimal transport, outlier, (14 more...)

arXiv.org Machine Learning

2509.11532

Country:

Asia > Japan > Honshū > Kantō > Kanagawa Prefecture (0.04)
Asia > China > Anhui Province > Hefei (0.04)
Europe > Switzerland > Geneva > Geneva (0.04)

Genre: Research Report (1.00)

Industry: Health & Medicine (0.34)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)

Kernel Trace Distance: Quantum Statistical Metric between Measures through RKHS Density Operators

Castellanos, Arturo, Korba, Anna, Mozharovskyi, Pavlo, Janati, Hicham

arXiv.org Machine Learning, Jul-9-2025

Distances between probability distributions are a key component of many statistical machine learning tasks, from two-sample testing to generative modeling. We introduce a novel distance between measures that compares them through a Schatten norm of their kernel covariance operators. We show that this new distance is an integral probability metric that can be framed between a Maximum Mean Discrepancy (MMD) and a Wasserstein distance. In particular, we show that it avoids some pitfalls of MMD by being more discriminative and more robust to the choice of hyperparameters. Moreover, it benefits from some compelling properties of kernel methods, which can avoid the curse of dimensionality in their sample complexity. We provide an algorithm to compute the distance in practice by introducing an extension of the kernel matrix to differences of distributions, which could be of independent interest. These advantages are illustrated by robust approximate Bayesian computation under contamination as well as particle flow simulations.

artificial intelligence, bayesian inference, machine learning, (18 more...)

arXiv.org Machine Learning

2507.06055

Genre: Research Report (0.81)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.34)

Reviews: Population Matching Discrepancy and Applications in Deep Learning

Neural Information Processing Systems, Oct-9-2024, 02:18:39 GMT

This paper presents the population matching discrepancy (PMD) as a better alternative to MMD for distribution matching applications. It is shown that PMD is a sampled version of the Wasserstein metric, or earth mover's distance, and that it has a few advantages over MMD, most notably stronger gradients, the applicability of smaller mini-batch sizes, and fewer hyperparameters. For training generative models at least, the MMD metric does suffer from weak gradients and the requirement of large mini-batches, so the proposals in this paper provide a nice solution to both of these problems. The small mini-batch claim is verified quite nicely in the empirical results. The verification of the stronger gradients claim is less satisfactory: since the MMD metric depends on the scale parameter sigma, it is essential to consider either the best sigma or a range of sigmas when making such a claim. As for having fewer hyperparameters, I feel this claim is less well supported, because PMD depends on a distance metric, and this distance metric might contain extra hyperparameters, just as in the MMD case.
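
To make the "sampled version of the Wasserstein metric" description concrete, here is a minimal sketch that computes a matching-based discrepancy between two equal-sized samples as the average cost of the cheapest one-to-one pairing. The function names, the Euclidean ground distance, and the brute-force search over permutations are illustrative assumptions, not the paper's implementation; a practical version would use a polynomial-time assignment solver.

```python
import itertools
import math


def euclidean(x, y):
    """Euclidean distance between two points given as sequences of floats."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def pmd_bruteforce(xs, ys, dist=euclidean):
    """Average cost of the minimum-weight one-to-one matching of xs to ys.

    Hypothetical helper for illustration: it tries every permutation, so it
    is only usable for very small populations (n! candidate matchings).
    """
    n = len(xs)
    if n == 0 or n != len(ys):
        raise ValueError("samples must be non-empty and of equal size")
    if n > 8:
        raise ValueError("brute force is only intended for n <= 8")
    best = min(
        sum(dist(xs[i], ys[perm[i]]) for i in range(n))
        for perm in itertools.permutations(range(n))
    )
    return best / n


if __name__ == "__main__":
    xs = [(0.0,), (1.0,), (2.0,)]
    ys = [(2.5,), (0.5,), (1.5,)]
    print(pmd_bruteforce(xs, ys))
```

Note that the ground distance `dist` is itself a modelling choice, which is the reviewer's point about hidden hyperparameters.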

deep learning, distance metric, population matching discrepancy and application, (3 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.40)

MMD GAN: Towards Deeper Understanding of Moment Matching Network

Chun-Liang Li, Wei-Cheng Chang, Yu Cheng, Yiming Yang, Barnabas Poczos

Neural Information Processing Systems, Oct-4-2024, 08:23:55 GMT

The generative moment matching network (GMMN) is a deep generative model that differs from the Generative Adversarial Network (GAN) by replacing the discriminator in GAN with a two-sample test based on kernel maximum mean discrepancy (MMD). Although some theoretical guarantees of MMD have been studied, the empirical performance of GMMN is still not as competitive as that of GAN on challenging and large benchmark datasets. The computational efficiency of GMMN is also less desirable in comparison with GAN, partially due to its requirement for a rather large batch size during training. In this paper, we propose to improve both the model expressiveness of GMMN and its computational efficiency by introducing adversarial kernel learning techniques as a replacement for the fixed Gaussian kernel in the original GMMN. The new approach combines the key ideas in both GMMN and GAN, hence we name it MMD GAN.

dataset, gan, kernel, (15 more...)

Neural Information Processing Systems

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
North America > United States > California > Los Angeles County > Long Beach (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

Population Matching Discrepancy and Applications in Deep Learning

Jianfei Chen, Chongxuan LI, Yizhong Ru, Jun Zhu

Neural Information Processing Systems, Oct-3-2024, 00:40:08 GMT

Neural Information Processing Systems http://nips.cc/

gradient, mmd, pmd, (15 more...)

Neural Information Processing Systems

Country:

North America > United States > Massachusetts > Hampshire County > Amherst (0.04)
North America > United States > California > Los Angeles County > Long Beach (0.04)
Asia > Middle East > Jordan (0.04)
Asia > China > Beijing > Beijing (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.94)

Minimax Estimation of Maximum Mean Discrepancy with Radial Kernels

Neural Information Processing Systems, Mar-12-2024, 11:14:53 GMT

Maximum Mean Discrepancy (MMD) is a distance on the space of probability measures which has found numerous applications in machine learning and nonparametric testing. This distance is based on the notion of embedding probabilities in a reproducing kernel Hilbert space. In this paper, we present the first known lower bounds for the estimation of MMD based on finite samples.
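
As a rough illustration of estimating MMD from finite samples, the sketch below computes the standard unbiased U-statistic estimate of squared MMD, using the identity MMD² = E k(x, x') + E k(y, y') − 2 E k(x, y) that follows from the kernel mean embedding. The Gaussian kernel, the `bandwidth` parameter, and the function names are assumptions chosen for illustration; this is not the estimator or the bounds analysed in the paper.

```python
import math
import random


def rbf_kernel(x, y, bandwidth=1.0):
    """Gaussian (radial) kernel between two points given as float sequences."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mmd2_unbiased(xs, ys, kernel=rbf_kernel):
    """Unbiased U-statistic estimate of squared MMD between two samples.

    The estimate can be slightly negative when the two samples come from
    the same (or very similar) distributions.
    """
    m, n = len(xs), len(ys)
    if m < 2 or n < 2:
        raise ValueError("need at least two points in each sample")
    # Within-sample terms exclude the diagonal (i == j) to remove bias.
    k_xx = sum(kernel(xs[i], xs[j]) for i in range(m) for j in range(m) if i != j)
    k_yy = sum(kernel(ys[i], ys[j]) for i in range(n) for j in range(n) if i != j)
    k_xy = sum(kernel(x, y) for x in xs for y in ys)
    return k_xx / (m * (m - 1)) + k_yy / (n * (n - 1)) - 2.0 * k_xy / (m * n)


if __name__ == "__main__":
    rng = random.Random(0)
    p_sample = [(rng.gauss(0.0, 1.0),) for _ in range(100)]
    q_sample = [(rng.gauss(3.0, 1.0),) for _ in range(100)]
    print("shifted distributions:", mmd2_unbiased(p_sample, q_sample))
```

The double loops cost O((m + n)²) kernel evaluations, which is the quadratic cost that several of the other entries on this page aim to reduce.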

estimation, estimator, kernel, (14 more...)

Neural Information Processing Systems

Country:

Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.14)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > Pennsylvania > Centre County > University Park (0.04)
(5 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.46)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.46)

Wasserstein Gradient Flows for Moreau Envelopes of f-Divergences in Reproducing Kernel Hilbert Spaces

Neumayer, Sebastian, Stein, Viktor, Steidl, Gabriele

arXiv.org Artificial Intelligence, Feb-7-2024

Most commonly used $f$-divergences of measures, e.g., the Kullback-Leibler divergence, are subject to limitations regarding the support of the involved measures. A remedy consists of regularizing the $f$-divergence by a squared maximum mean discrepancy (MMD) associated with a characteristic kernel $K$. In this paper, we use the so-called kernel mean embedding to show that the corresponding regularization can be rewritten as the Moreau envelope of some function in the reproducing kernel Hilbert space associated with $K$. Then, we exploit well-known results on Moreau envelopes in Hilbert spaces to prove properties of the MMD-regularized $f$-divergences and, in particular, their gradients. Subsequently, we use our findings to analyze Wasserstein gradient flows of MMD-regularized $f$-divergences. Finally, we consider Wasserstein gradient flows starting from empirical measures and provide proof-of-concept numerical examples with Tsallis-$\alpha$ divergences.

artificial intelligence, divergence, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2402.04613

Country:

Europe > United Kingdom (0.14)
Europe > Germany (0.14)
North America > United States > New York (0.14)

Genre: Research Report > New Finding (0.48)

Industry: Energy > Oil & Gas (0.45)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.45)

Maximum Mean Discrepancy Meets Neural Networks: The Radon-Kolmogorov-Smirnov Test

Paik, Seunghoon, Celentano, Michael, Green, Alden, Tibshirani, Ryan J.

arXiv.org Machine Learning, Nov-6-2023

Maximum mean discrepancy (MMD) refers to a general class of nonparametric two-sample tests that are based on maximizing the mean difference over samples from one distribution $P$ versus another $Q$, over all choices of data transformations $f$ living in some function space $\mathcal{F}$. Inspired by recent work that connects what are known as functions of $\textit{Radon bounded variation}$ (RBV) and neural networks (Parhi and Nowak, 2021, 2023), we study the MMD defined by taking $\mathcal{F}$ to be the unit ball in the RBV space of a given smoothness order $k \geq 0$. This test, which we refer to as the $\textit{Radon-Kolmogorov-Smirnov}$ (RKS) test, can be viewed as a generalization of the well-known and classical Kolmogorov-Smirnov (KS) test to multiple dimensions and higher orders of smoothness. It is also intimately connected to neural networks: we prove that the witness in the RKS test -- the function $f$ achieving the maximum mean difference -- is always a ridge spline of degree $k$, i.e., a single neuron in a neural network. This allows us to leverage the power of modern deep learning toolkits to (approximately) optimize the criterion that underlies the RKS test. We prove that the RKS test has asymptotically full power at distinguishing any distinct pair $P \neq Q$ of distributions, derive its asymptotic null distribution, and carry out extensive experiments to elucidate the strengths and weaknesses of the RKS test versus the more traditional kernel MMD test.
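
The verbal definition at the start of this abstract can be written out as follows. The population form restates the abstract; the empirical form and the sample notation ($x_1,\dots,x_m$ drawn from $P$, $y_1,\dots,y_n$ drawn from $Q$) are assumptions introduced here for illustration.

```latex
\mathrm{MMD}(P, Q; \mathcal{F})
  = \sup_{f \in \mathcal{F}}
    \Bigl( \mathbb{E}_{X \sim P}[f(X)] - \mathbb{E}_{Y \sim Q}[f(Y)] \Bigr),
\qquad
\widehat{\mathrm{MMD}}(\mathcal{F})
  = \sup_{f \in \mathcal{F}}
    \Bigl( \frac{1}{m} \sum_{i=1}^{m} f(x_i) - \frac{1}{n} \sum_{j=1}^{n} f(y_j) \Bigr).
```

Taking $\mathcal{F}$ to be the unit ball of a reproducing kernel Hilbert space gives the kernel MMD, while the RKS test takes $\mathcal{F}$ to be the unit ball of the RBV space of order $k$.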

artificial intelligence, machine learning, positive rate true positive rate, (15 more...)

arXiv.org Machine Learning

2309.02422

Country:

North America > United States > California (0.14)
North America > United States > Wisconsin (0.14)

Genre: Research Report (1.00)

Industry: Energy > Oil & Gas > Upstream (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
