Neural Tangent Kernel Maximum Mean Discrepancy
We present a novel neural network Maximum Mean Discrepancy (MMD) statistic by identifying a new connection between the neural tangent kernel (NTK) and MMD. This connection enables us to develop a computationally and memory-efficient approach to computing the MMD statistic and performing NTK-based two-sample tests, addressing the long-standing memory and computational bottleneck of the MMD statistic, which is essential for online implementations that assimilate new samples. Theoretically, the connection allows us to characterize properties of the NTK test statistic, such as the Type-I error and testing power of the two-sample test, by adapting existing theory for kernel MMD. Numerical experiments on synthetic and real-world datasets validate the theory and demonstrate the effectiveness of the proposed NTK-MMD statistic.
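For concreteness, here is a minimal sketch (not the authors' implementation) of the idea behind an NTK-based MMD statistic: for a one-hidden-layer ReLU network, the empirical NTK is the inner product of per-sample parameter gradients, and those gradient features can be plugged directly into the standard unbiased MMD² estimator. The tiny network, random initialization, and data below are illustrative assumptions.

```python
# Sketch: empirical NTK of a one-hidden-layer ReLU net, plugged into
# the unbiased MMD^2 estimator. Illustrative only.
import numpy as np

rng = np.random.default_rng(0)

def ntk_features(X, W, v):
    """Gradient features g(x) = d f(x)/d theta for f(x) = v^T relu(Wx)/sqrt(m)."""
    m = W.shape[0]
    pre = X @ W.T                        # (n, m) pre-activations
    act = np.maximum(pre, 0.0)           # relu(Wx)
    d_v = act / np.sqrt(m)               # gradient w.r.t. v
    mask = (pre > 0).astype(X.dtype)     # relu'(Wx)
    # gradient w.r.t. W_ij: v_i * relu'(w_i . x) * x_j / sqrt(m)
    d_W = (mask * v)[:, :, None] * X[:, None, :] / np.sqrt(m)
    return np.concatenate([d_v, d_W.reshape(len(X), -1)], axis=1)

def mmd2_unbiased(K_xx, K_yy, K_xy):
    n, m = len(K_xx), len(K_yy)
    sum_xx = (K_xx.sum() - np.trace(K_xx)) / (n * (n - 1))
    sum_yy = (K_yy.sum() - np.trace(K_yy)) / (m * (m - 1))
    return sum_xx + sum_yy - 2.0 * K_xy.mean()

d, m = 5, 256                            # input dim, hidden width
W = rng.standard_normal((m, d))          # random init defines the NTK
v = rng.standard_normal(m)
X = rng.standard_normal((200, d))        # sample from P
Y = rng.standard_normal((200, d)) + 0.5  # sample from Q (shifted)

gx, gy = ntk_features(X, W, v), ntk_features(Y, W, v)
print(mmd2_unbiased(gx @ gx.T, gy @ gy.T, gx @ gy.T))
```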
A Expressing Popular Forms of Calibration as Distribution Matching
This can be written succinctly as $Y \overset{d}{=} \hat{Y} \mid \Phi(X)$ (18).
A.2 Calibration in Classification
ECE is used to break ties. For each model and dataset, the best-performing model is then re-run with 50 random seeds to estimate standard errors and statistical significance.
Kernel Bandwidth
We select the RBF kernel bandwidth for training on each dataset using the aforementioned hyperparameter optimization. For each county, we aggregate the weather sequence of each year into a few summary statistics for each month (average/maximum/minimum temperature, precipitation, and cooling/heating degree days). All other hyperparameters are held constant, including the number of training steps.
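Since ECE is used above as a tie-breaker, a minimal sketch of expected calibration error with equal-width confidence bins follows for reference. The 10-bin scheme and max-probability confidence are common defaults assumed here, not details taken from the paper.

```python
# Sketch of expected calibration error (ECE) with equal-width confidence
# bins. The 10-bin scheme is an assumption, not the paper's exact setup.
import numpy as np

def ece(confidences, predictions, labels, n_bins=10):
    """Weighted average of |accuracy - confidence| over confidence bins."""
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    total, n = 0.0, len(labels)
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            acc = np.mean(predictions[mask] == labels[mask])
            conf = np.mean(confidences[mask])
            total += mask.sum() / n * abs(acc - conf)
    return total

probs = np.array([[0.9, 0.1], [0.6, 0.4], [0.3, 0.7]])
labels = np.array([0, 1, 1])
print(ece(probs.max(1), probs.argmax(1), labels))
```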
E-ROBOT: a dimension-free method for robust statistics and machine learning via Schrödinger bridge
We propose the Entropic-regularized Robust Optimal Transport (E-ROBOT) framework, a novel method that combines the robustness of ROBOT with the computational and statistical benefits of entropic regularization. Rooted in the theory of the Schrödinger bridge problem, E-ROBOT defines the robust Sinkhorn divergence $\overline{W}_{\varepsilon,\lambda}$, where the parameter $\lambda$ controls robustness and $\varepsilon$ governs the regularization strength. Letting $n\in \mathbb{N}$ denote the sample size, a central theoretical contribution is establishing that the sample complexity of $\overline{W}_{\varepsilon,\lambda}$ is $\mathcal{O}(n^{-1/2})$, thereby avoiding the curse of dimensionality that plagues standard ROBOT. This dimension-free property unlocks the use of $\overline{W}_{\varepsilon,\lambda}$ as a loss function in high-dimensional statistical and machine learning tasks. In this regard, we demonstrate its utility through four applications: goodness-of-fit testing; computation of barycenters for corrupted 2D and 3D shapes; definition of gradient flows; and image colour transfer. From a computational standpoint, an advantage of our method is that it can be easily implemented by modifying existing (\texttt{Python}) routines. From a theoretical standpoint, our work opens the door to many research directions in statistics and machine learning; we discuss some of them.
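As a rough illustration of how such a divergence can be built from existing Python routines, the sketch below runs plain Sinkhorn iterations on a truncated ground cost, capping the cost at $2\lambda$ in the style of ROBOT, and debiases the result into a Sinkhorn divergence. The truncation convention, cost choice, and all constants are our assumptions, not a transcription of the paper's algorithm.

```python
# Sketch of a robust Sinkhorn divergence in the spirit of E-ROBOT.
# The ground cost is truncated at 2*lam (a ROBOT-style robustification
# assumed here), then plain Sinkhorn iterations give the entropic OT cost.
import numpy as np

def sinkhorn_cost(X, Y, eps=0.1, lam=1.0, n_iter=500):
    C = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)  # squared-Euclidean cost
    C = np.minimum(C, 2.0 * lam)                        # robust truncation
    K = np.exp(-C / eps)
    a = np.ones(len(X)) / len(X)
    b = np.ones(len(Y)) / len(Y)
    u, v = np.ones_like(a), np.ones_like(b)
    for _ in range(n_iter):                             # Sinkhorn fixed point
        u = a / (K @ v)
        v = b / (K.T @ u)
    P = u[:, None] * K * v[None, :]                     # transport plan
    return (P * C).sum()

def robust_sinkhorn_divergence(X, Y, **kw):
    # Debiased divergence: S(P,Q) = W(P,Q) - (W(P,P) + W(Q,Q)) / 2
    return sinkhorn_cost(X, Y, **kw) - 0.5 * (
        sinkhorn_cost(X, X, **kw) + sinkhorn_cost(Y, Y, **kw))

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 2))
Y = np.vstack([rng.standard_normal((95, 2)) + 1.0,
               rng.standard_normal((5, 2)) * 10.0])     # 5% gross outliers
print(robust_sinkhorn_divergence(X, Y, eps=0.1, lam=1.0))
```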
- Asia > Japan > Honshū > Kantō > Kanagawa Prefecture (0.04)
- Asia > China > Anhui Province > Hefei (0.04)
- Europe > Switzerland > Geneva > Geneva (0.04)
- North America > Canada (0.14)
- Europe (0.14)
Kernel Trace Distance: Quantum Statistical Metric between Measures through RKHS Density Operators
Castellanos, Arturo, Korba, Anna, Mozharovskyi, Pavlo, Janati, Hicham
Distances between probability distributions are a key component of many statistical machine learning tasks, from two-sample testing to generative modeling. We introduce a novel distance between measures that compares them through a Schatten norm of their kernel covariance operators. We show that this new distance is an integral probability metric that can be framed between the Maximum Mean Discrepancy (MMD) and the Wasserstein distance. In particular, we show that it avoids some pitfalls of MMD by being more discriminative and more robust to the choice of hyperparameters. Moreover, it inherits some compelling properties of kernel methods, which can avoid the curse of dimensionality in sample complexity. We provide an algorithm to compute the distance in practice by introducing an extension of the kernel matrix to differences of distributions, which could be of independent interest. These advantages are illustrated through robust approximate Bayesian computation under contamination as well as particle-flow simulations.
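The paper computes the distance through an extended kernel matrix; as an alternative sketch of the same quantity, one can approximate an RBF kernel with random Fourier features so that each covariance operator becomes an explicit finite matrix, then take the Schatten-1 (nuclear) norm of the difference directly. The feature dimension, bandwidth, and data below are illustrative assumptions, not the authors' algorithm.

```python
# Sketch of a trace (Schatten-1) distance between kernel covariance
# operators via random Fourier features: each operator becomes an
# explicit D x D matrix, and the distance is a nuclear norm via SVD.
import numpy as np

rng = np.random.default_rng(0)

def rff(X, Omega, bias):
    """Random Fourier features: phi(x) = sqrt(2/D) cos(Omega x + b)."""
    D = Omega.shape[0]
    return np.sqrt(2.0 / D) * np.cos(X @ Omega.T + bias)

def trace_distance(X, Y, D=512, sigma=1.0):
    d = X.shape[1]
    Omega = rng.standard_normal((D, d)) / sigma  # spectral samples of RBF kernel
    bias = rng.uniform(0, 2 * np.pi, D)
    Phi_x, Phi_y = rff(X, Omega, bias), rff(Y, Omega, bias)
    Sigma_x = Phi_x.T @ Phi_x / len(X)           # empirical covariance operators
    Sigma_y = Phi_y.T @ Phi_y / len(Y)
    return np.linalg.svd(Sigma_x - Sigma_y, compute_uv=False).sum()  # nuclear norm

X = rng.standard_normal((300, 2))
Y = rng.standard_normal((300, 2)) * 1.5
print(trace_distance(X, Y))
```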
Quality Measures for Dynamic Graph Generative Models
Hosseini, Ryien, Simini, Filippo, Vishwanath, Venkatram, Willett, Rebecca, Hoffmann, Henry
Deep generative models have recently achieved significant success in modeling graph data, including dynamic graphs, where topology and features evolve over time. However, unlike in the vision and natural language domains, evaluating generative models for dynamic graphs is challenging because their output is difficult to visualize, making quantitative metrics essential. In this work, we develop a new quality metric for evaluating generative models of dynamic graphs. Current metrics for dynamic graphs typically discretize the continuous evolution of a graph into static snapshots and then apply conventional graph similarity measures. This approach has several limitations: (a) it models temporally related events as i.i.d. samples, failing to capture the non-uniform evolution of dynamic graphs; (b) it lacks a unified measure that is sensitive to both features and topology; (c) it fails to provide a single scalar metric, forcing comparisons across multiple metrics with no clear ranking; and (d) it requires explicitly instantiating each static snapshot, leading to impractical runtime demands that hinder evaluation at scale. We propose a novel metric based on the \textit{Johnson-Lindenstrauss} lemma, applying random projections directly to dynamic graph data. This yields an expressive, scalar, and application-agnostic measure of dynamic graph similarity that overcomes the limitations of traditional methods. We also provide a comprehensive empirical evaluation of metrics for continuous-time dynamic graphs, demonstrating the effectiveness of our approach compared to existing methods. Our implementation is available at https://github.com/ryienh/jl-metric.
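The repository above contains the actual metric; the toy sketch below only illustrates the underlying idea of applying Johnson-Lindenstrauss-style random projections directly to an event stream: each timestamped event is projected with a shared Gaussian random matrix and aggregated into a fixed-length signature. The event encoding (source, destination, timestamp, and features in one vector) and the mean aggregation are our own simplifying assumptions.

```python
# Toy illustration of JL-style random projections on event streams;
# see the authors' repository for the actual metric.
import numpy as np

rng = np.random.default_rng(0)

def signature(events, R):
    """Project each event with a fixed JL matrix R and average."""
    Z = np.asarray(events) @ R.T           # (n_events, k) projections
    return Z.mean(axis=0)                  # fixed-length stream signature

def make_stream(n, drift):
    # Each event: (src_id, dst_id, timestamp, 2 edge features)
    return np.column_stack([
        rng.integers(0, 50, n), rng.integers(0, 50, n),
        np.sort(rng.uniform(0, 1, n)),
        rng.standard_normal((n, 2)) + drift])

k, d = 16, 5
R = rng.standard_normal((k, d)) / np.sqrt(k)   # shared JL projection
real, generated = make_stream(1000, 0.0), make_stream(1000, 0.3)
print(np.linalg.norm(signature(real, R) - signature(generated, R)))
```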
- Energy (0.67)
- Education > Educational Setting > Online (0.46)
- Government > Regional Government (0.46)
- Information Technology > Data Science > Data Mining (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Generation (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.88)
Reviews: Population Matching Discrepancy and Applications in Deep Learning
This paper presents the population matching discrepancy (PMD) as a better alternative to MMD for distribution-matching applications. It is shown that PMD is a sampled version of the Wasserstein metric, or earth mover's distance, and that it has several advantages over MMD, most notably stronger gradients, applicability with smaller mini-batch sizes, and fewer hyperparameters. For training generative models at least, the MMD metric does suffer from weak gradients and the requirement of large mini-batches, so the proposals in this paper provide a nice solution to both of these problems. The small mini-batch claim is verified quite convincingly in the empirical results. The verification of the stronger-gradients claim is less satisfactory: since the MMD metric depends on the scale parameter sigma, it is essential to consider either the best sigma or a range of sigmas when making such a claim. As for having fewer hyperparameters, I feel this claim is less well-supported, because PMD depends on a distance metric, and that distance metric may contain extra hyperparameters, just as in the MMD case.
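For readers unfamiliar with PMD, a minimal sketch of the statistic as the review describes it, a sampled earth mover's distance obtained from a minimum-weight perfect matching between two equal-size minibatches, might look as follows. The Euclidean ground metric is one possible choice, and it is exactly the kind of hyperparameter-bearing ingredient the review points out.

```python
# Sketch of the population matching discrepancy as described in the
# review: average pairwise distance under an optimal 1-to-1 matching
# between two equal-size minibatches.
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist

def pmd(X, Y, metric="euclidean"):
    """Sampled earth mover's distance between two equal-size minibatches."""
    C = cdist(X, Y, metric=metric)          # pairwise ground distances
    rows, cols = linear_sum_assignment(C)   # minimum-weight perfect matching
    return C[rows, cols].mean()

rng = np.random.default_rng(0)
X = rng.standard_normal((64, 10))           # minibatch from P
Y = rng.standard_normal((64, 10)) + 0.5     # minibatch from Q
print(pmd(X, Y))
```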
MMD GAN: Towards Deeper Understanding of Moment Matching Network
Chun-Liang Li, Wei-Cheng Chang, Yu Cheng, Yiming Yang, Barnabas Poczos
Generative moment matching network (GMMN) is a deep generative model that differs from a Generative Adversarial Network (GAN) by replacing the discriminator with a two-sample test based on the kernel maximum mean discrepancy (MMD). Although some theoretical guarantees of MMD have been studied, the empirical performance of GMMN is still not as competitive as that of GAN on challenging and large benchmark datasets. The computational efficiency of GMMN is also less desirable, partially due to its requirement for a rather large batch size during training. In this paper, we propose to improve both the model expressiveness and the computational efficiency of GMMN by introducing adversarial kernel learning techniques as a replacement for the fixed Gaussian kernel in the original GMMN. The new approach combines the key ideas of both GMMN and GAN, hence we name it MMD GAN.
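A toy rendering of the MMD GAN objective (our own sketch, not the authors' code, and omitting their additional autoencoder and Lipschitz-style constraints on the critic): the Gaussian kernel is composed with a learned embedding, here called h, which is trained to maximize the MMD that the generator minimizes.

```python
# Toy MMD GAN objective: critic h maximizes MMD between h(real) and
# h(fake); generator G minimizes it. Architectures are illustrative.
import torch

def gaussian_mmd2(x, y, sigma=1.0):
    xy = torch.cat([x, y])
    K = torch.exp(-torch.cdist(xy, xy) ** 2 / (2 * sigma ** 2))
    n = len(x)
    kxx, kyy, kxy = K[:n, :n], K[n:, n:], K[:n, n:]
    return kxx.mean() + kyy.mean() - 2 * kxy.mean()   # biased V-statistic

z_dim, x_dim, h_dim = 8, 2, 4
G = torch.nn.Sequential(torch.nn.Linear(z_dim, 32), torch.nn.ReLU(),
                        torch.nn.Linear(32, x_dim))
h = torch.nn.Sequential(torch.nn.Linear(x_dim, 32), torch.nn.ReLU(),
                        torch.nn.Linear(32, h_dim))
opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_h = torch.optim.Adam(h.parameters(), lr=1e-3)

real = torch.randn(128, x_dim) + 2.0
for step in range(200):
    fake = G(torch.randn(128, z_dim))
    loss_h = -gaussian_mmd2(h(real), h(fake.detach()))  # critic: maximize MMD
    opt_h.zero_grad(); loss_h.backward(); opt_h.step()
    loss_g = gaussian_mmd2(h(real), h(G(torch.randn(128, z_dim))))
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()  # generator: minimize
```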
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
- North America > United States > California > Los Angeles County > Long Beach (0.04)
- North America > United States > Massachusetts > Hampshire County > Amherst (0.04)
- Asia > Middle East > Jordan (0.04)
- Asia > China > Beijing > Beijing (0.04)
Minimax Estimation of Maximum Mean Discrepancy with Radial Kernels
Maximum Mean Discrepancy (MMD) is a distance on the space of probability measures which has found numerous applications in machine learning and nonparametric testing. This distance is based on the notion of embedding probabilities in a reproducing kernel Hilbert space. In this paper, we present the first known lower bounds for the estimation of MMD based on finite samples.
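For reference, the standard finite-sample estimator against which such lower bounds are naturally compared is the unbiased U-statistic with a radial kernel. A minimal sketch with a Gaussian kernel follows; the median-heuristic bandwidth is a common default we assume here, not one the paper prescribes.

```python
# Standard unbiased finite-sample MMD^2 estimator with a radial
# (Gaussian) kernel and median-heuristic bandwidth.
import numpy as np
from scipy.spatial.distance import cdist

def mmd2_unbiased_rbf(X, Y):
    d2_xx, d2_yy, d2_xy = (cdist(A, B, "sqeuclidean")
                           for A, B in [(X, X), (Y, Y), (X, Y)])
    sigma2 = np.median(d2_xy)               # median heuristic bandwidth
    k = lambda d2: np.exp(-d2 / (2 * sigma2))
    n, m = len(X), len(Y)
    off_xx = (k(d2_xx).sum() - n) / (n * (n - 1))   # drop diagonal (k = 1)
    off_yy = (k(d2_yy).sum() - m) / (m * (m - 1))
    return off_xx + off_yy - 2 * k(d2_xy).mean()

rng = np.random.default_rng(0)
print(mmd2_unbiased_rbf(rng.standard_normal((200, 3)),
                        rng.standard_normal((200, 3)) + 0.2))
```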
- Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.14)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- North America > United States > Pennsylvania > Centre County > University Park (0.04)
- (5 more...)