AITopics | fisher divergence

fisher divergence

Gradient Information for Representation and Modeling

Jie Ding, Robert Calderbank, Vahid Tarokh

Neural Information Processing Systems, Feb-12-2026, 12:46:11 GMT

Neural Information Processing Systems http://nips.cc/

artificial intelligence, information, machine learning, (17 more...)

Neural Information Processing Systems

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.28)
North America > United States > North Carolina > Durham County > Durham (0.04)
North America > United States > Hawaii (0.04)
(3 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.47)

Bi-level Score Matching for Learning Energy-based Latent Variable Models

Neural Information Processing Systems, Feb-10-2026, 12:44:27 GMT

However, it remains largely open to learn energy-based latent variable models (EBLVMs), except some special cases. This paper presents a bi-level score matching (BiSM) method to learn EBLVMs with general structures by reformulating SM as a bilevel optimization problem. The higher level introduces a variational posterior of the latent variables and optimizes a modified SM objective, and the lower level optimizes the variational posterior to fit the true posterior.
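
For orientation, the standard score matching (SM) objective that this abstract refers to can be written as below. This is the textbook form for fully observed models, not the paper's bi-level objective; the symbols $p_{\mathrm{data}}$, $p_\theta$, and $s_\theta$ are notation assumed here, and the second line holds only under the usual regularity and boundary conditions.

```latex
\begin{aligned}
D_F(p_{\mathrm{data}} \,\|\, p_\theta)
  &= \tfrac{1}{2}\,\mathbb{E}_{x \sim p_{\mathrm{data}}}
     \big[\lVert \nabla_x \log p_{\mathrm{data}}(x) - s_\theta(x) \rVert^2\big],
     \qquad s_\theta(x) = \nabla_x \log p_\theta(x), \\
  &= \mathbb{E}_{x \sim p_{\mathrm{data}}}
     \big[\tfrac{1}{2}\lVert s_\theta(x) \rVert^2
          + \operatorname{tr}\big(\nabla_x s_\theta(x)\big)\big] + \text{const}.
\end{aligned}
```

Because $s_\theta$ is a gradient in $x$, the normalizing constant of $p_\theta$ drops out, which is why this objective suits energy-based models.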

artificial intelligence, deep learning, machine learning, (18 more...)

Neural Information Processing Systems

Country:

North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
Asia > China > Beijing > Beijing (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

1646e34971facbcda3727d1dc28ab635-Paper-Conference.pdf

Neural Information Processing Systems, Feb-8-2026, 06:57:04 GMT

implicit sampler, sampler, target distribution, (14 more...)

Neural Information Processing Systems

Country:

Asia > Middle East > Jordan (0.04)
Asia > Middle East > Israel (0.04)
Asia > Japan > Honshū > Kantō > Kanagawa Prefecture (0.04)
Asia > China > Beijing > Beijing (0.04)

Genre: Research Report > Promising Solution (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (0.88)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.46)

Gaussian Process Tilted Nonparametric Density Estimation using Fisher Divergence Score Matching

Paisley, John, Zhang, Wei, Barr, Brian

arXiv.org Artificial Intelligence, Nov-17-2025

We propose a nonparametric density estimator based on the Gaussian process (GP) and derive three novel closed-form learning algorithms based on Fisher divergence (FD) score matching. The density estimator is formed by multiplying a base multivariate normal distribution with an exponentiated GP refinement, and so we refer to it as a GP-tilted nonparametric density. By representing the GP part of the score as a linear function using the random Fourier feature (RFF) approximation, we show that optimization can be solved in closed form for the three FD-based objectives considered. This includes the basic and noise conditional versions of the Fisher divergence, as well as an alternative to noise conditional FD models based on variational inference (VI) that we propose in this paper. For this novel learning approach, we propose an ELBO-like optimization to approximate the posterior distribution, with which we then derive a Fisher variational predictive distribution. The RFF representation of the GP, which is functionally equivalent to a single-layer neural network score model with cosine activation, provides a useful linear representation of the GP for which all expectations can be solved. The Gaussian base distribution also helps with tractability of the VI approximation and ensures that our proposed density is well-defined. We demonstrate our three learning algorithms, as well as a MAP baseline algorithm, on several low-dimensional density estimation problems. The closed-form nature of the learning problem removes the reliance on iterative learning algorithms, making this technique particularly well-suited to big data sets, since only sufficient statistics collected from a single pass through the data are needed.
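
The closed-form idea can be illustrated with a much simpler model than the paper's. The sketch below is not the GP-tilted estimator: it swaps the RFF features for the polynomial features $(x, x^2)$ in one dimension and uses no base distribution, so that $\log q(x) = \theta_1 x + \theta_2 x^2 + \text{const}$. The function name and the test data are hypothetical. With a score that is linear in $\theta$, the score matching objective is quadratic, and its minimizer solves a small linear system built from moments gathered in one pass over the data.

```python
import numpy as np

def fit_quadratic_score_matching(x):
    """Closed-form score matching for log q(x) = t1*x + t2*x**2 + const.

    Objective: 0.5 * theta^T A theta + theta^T c, minimized at A theta = -c.
    """
    x = np.asarray(x, dtype=float)
    m1 = x.mean()          # first moment, a sufficient statistic
    m2 = (x ** 2).mean()   # second moment, a sufficient statistic
    # A = E[J J^T], where J = d(phi)/dx = (1, 2x) for phi(x) = (x, x^2)
    A = np.array([[1.0, 2.0 * m1],
                  [2.0 * m1, 4.0 * m2]])
    # c = E[d^2(phi)/dx^2] = (0, 2)
    c = np.array([0.0, 2.0])
    return np.linalg.solve(A, -c)

# Hypothetical data: samples from a Gaussian with mean 1.5 and std 0.7.
rng = np.random.default_rng(0)
data = rng.normal(loc=1.5, scale=0.7, size=5000)

theta = fit_quadratic_score_matching(data)
var_hat = -1.0 / (2.0 * theta[1])   # since t2 = -1 / (2 * variance)
mean_hat = theta[0] * var_hat       # since t1 = mean / variance
```

For this feature map the solution reproduces the sample mean and the (biased) sample variance, so no iterative optimizer is involved.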

artificial intelligence, fisher divergence, machine learning, (14 more...)

arXiv.org Artificial Intelligence

2504.03485

Genre: Research Report (0.50)

Industry: Education (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Robust inference using density-powered Stein operators

Eguchi, Shinto

arXiv.org Machine Learning, Nov-7-2025

We introduce a density-power weighted variant of the Stein operator, called the $γ$-Stein operator. This is a novel class of operators derived from the $γ$-divergence, designed to build robust inference methods for unnormalized probability models. The operator's construction (weighting by the model density raised to a positive power $γ$) inherently down-weights the influence of outliers, providing a principled mechanism for robustness. Applying this operator yields a robust generalization of score matching that retains the crucial property of being independent of the model's normalizing constant. We extend this framework to develop two key applications: the $γ$-kernelized Stein discrepancy for robust goodness-of-fit testing, and $γ$-Stein variational gradient descent for robust Bayesian posterior approximation. Empirical results on contaminated Gaussian and quartic potential models show our methods significantly outperform standard baselines in both robustness and statistical efficiency.

artificial intelligence, machine learning, operator, (19 more...)

arXiv.org Machine Learning

2511.03963

Country:

Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Fisher meets Feynman: score-based variational inference with a product of experts

Cai, Diana, Gower, Robert M., Blei, David M., Saul, Lawrence K.

arXiv.org Machine Learning, Oct-27-2025

We introduce a highly expressive yet distinctly tractable family for black-box variational inference (BBVI). Each member of this family is a weighted product of experts (PoE), and each weighted expert in the product is proportional to a multivariate $t$-distribution. These products of experts can model distributions with skew, heavy tails, and multiple modes, but to use them for BBVI, we must be able to sample from their densities. We show how to do this by reformulating these products of experts as latent variable models with auxiliary Dirichlet random variables. These Dirichlet variables emerge from a Feynman identity, originally developed for loop integrals in quantum field theory, that expresses the product of multiple fractions (or in our case, $t$-distributions) as an integral over the simplex. We leverage this simplicial latent space to draw weighted samples from these products of experts -- samples which BBVI then uses to find the PoE that best approximates a target density. Given a collection of experts, we derive an iterative procedure to optimize the exponents that determine their geometric weighting in the PoE. At each iteration, this procedure minimizes a regularized Fisher divergence to match the scores of the variational and target densities at a batch of samples drawn from the current approximation. This minimization reduces to a convex quadratic program, and we prove under general conditions that these updates converge exponentially fast to a near-optimal weighting of experts. We conclude by evaluating this approach on a variety of synthetic and real-world target distributions.
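
The Feynman identity mentioned in the abstract can be checked numerically. In its general form it states that $\prod_i a_i^{-\alpha_i}$ equals the expectation of $(\sum_i u_i a_i)^{-\sum_i \alpha_i}$ with $u$ drawn from a Dirichlet distribution with parameters $\alpha$, for positive $a_i$ and $\alpha_i$. The sketch below is only a Monte Carlo sanity check of that identity on made-up numbers; the function name and the values of `a` and `alpha` are hypothetical, and it does not implement the paper's sampler or weight updates.

```python
import numpy as np

def feynman_rhs_monte_carlo(a, alpha, n_samples=200_000, seed=0):
    """Estimate E_{u ~ Dirichlet(alpha)}[(sum_i u_i a_i) ** (-sum_i alpha_i)]."""
    rng = np.random.default_rng(seed)
    a = np.asarray(a, dtype=float)
    alpha = np.asarray(alpha, dtype=float)
    u = rng.dirichlet(alpha, size=n_samples)   # rows lie on the simplex
    return np.mean((u @ a) ** (-alpha.sum()))

# Hypothetical positive "denominators" and exponents.
a = np.array([2.0, 3.0, 5.0])
alpha = np.array([1.0, 2.0, 1.5])

lhs = np.prod(a ** (-alpha))            # product of fractions
rhs = feynman_rhs_monte_carlo(a, alpha) # integral over the simplex
```

The two values agree up to Monte Carlo error, which is the sense in which auxiliary Dirichlet variables can turn a product of fractions into a mixture over the simplex.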

artificial intelligence, machine learning, variational inference, (15 more...)

arXiv.org Machine Learning

2510.21598

Country: Asia > Middle East > Jordan (0.04)

Genre: Research Report > New Finding (0.46)

Industry: Transportation (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Fast and Scalable Score-Based Kernel Calibration Tests

Glaser, Pierre, Widmann, David, Lindsten, Fredrik, Gretton, Arthur

arXiv.org Machine Learning, Oct-17-2025

We introduce the Kernel Calibration Conditional Stein Discrepancy test (KCCSD test), a non-parametric, kernel-based test for assessing the calibration of probabilistic models with well-defined scores. In contrast to previous methods, our test avoids the need for possibly expensive expectation approximations while providing control over its type-I error. We achieve these improvements by using a new family of kernels for score-based probabilities that can be estimated without probability density samples, and by using a conditional goodness-of-fit criterion for the KCCSD test's U-statistic. We demonstrate the properties of our test on various synthetic settings.

artificial intelligence, kernel, machine learning, (13 more...)

arXiv.org Machine Learning

2510.14711

Country:

Asia > Japan > Honshū > Kantō > Kanagawa Prefecture (0.04)
Europe > Sweden > Östergötland County > Linköping (0.04)
Europe > Sweden > Uppsala County > Uppsala (0.04)
(2 more...)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

ef72fa6579401ffff9da246a5014f055-Paper-Conference.pdf

Neural Information Processing Systems, Oct-10-2025, 21:00:02 GMT

basis function, eigenvi, expansion, (14 more...)

Neural Information Processing Systems

Country:

Asia > Middle East > Jordan (0.04)
Asia > Bangladesh > Dhaka Division > Dhaka District > Dhaka (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report > Experimental Study (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)

Entropy-based Training Methods for Scalable Neural Implicit Sampler

Weijian Luo

Neural Information Processing Systems, Oct-8-2025, 04:40:59 GMT

Efficiently sampling from un-normalized target distributions is a fundamental problem in scientific computing and machine learning. Traditional approaches such as Markov Chain Monte Carlo (MCMC) guarantee asymptotically unbiased samples from such distributions but suffer from computational inefficiency, particularly when dealing with high-dimensional targets, as they require numerous iterations to generate a batch of samples. In this paper, we introduce an efficient and scalable neural implicit sampler that overcomes these limitations. The implicit sampler can generate large batches of samples with low computational costs by leveraging a neural transformation that directly maps easily sampled latent vectors to target samples without the need for iterative procedures. To train the neural implicit samplers, we introduce two novel methods: the KL training method and the Fisher training method.
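
The sampling side of an implicit sampler can be sketched in a few lines. The example below assumes a small two-layer network with random, untrained weights; the class name, layer sizes, and `tanh` activation are hypothetical choices, and neither the KL nor the Fisher training method from the paper is implemented. It only shows that a whole batch is produced by one forward pass over latent draws, with no iterative chain.

```python
import numpy as np

class ImplicitSampler:
    """Maps Gaussian latent vectors to samples with one forward pass."""

    def __init__(self, latent_dim, hidden_dim, out_dim, seed=0):
        self.rng = np.random.default_rng(seed)
        self.latent_dim = latent_dim
        # Untrained weights; a real sampler would learn these.
        self.W1 = self.rng.normal(scale=latent_dim ** -0.5,
                                  size=(latent_dim, hidden_dim))
        self.b1 = np.zeros(hidden_dim)
        self.W2 = self.rng.normal(scale=hidden_dim ** -0.5,
                                  size=(hidden_dim, out_dim))
        self.b2 = np.zeros(out_dim)

    def sample(self, n):
        z = self.rng.standard_normal((n, self.latent_dim))  # easy latent draws
        h = np.tanh(z @ self.W1 + self.b1)                   # hidden layer
        return h @ self.W2 + self.b2                         # batch of samples

sampler = ImplicitSampler(latent_dim=4, hidden_dim=16, out_dim=2)
batch = sampler.sample(1000)   # array of shape (1000, 2)
```

The density of the output is never evaluated, which is why such samplers are called implicit and why training them needs objectives that work from samples and the target's score.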

artificial intelligence, machine learning, sampler, (17 more...)

Neural Information Processing Systems

Country:

Asia > Middle East > Jordan (0.04)
Asia > Middle East > Israel (0.04)
Asia > Japan > Honshū > Kantō > Kanagawa Prefecture (0.04)
Asia > China > Beijing > Beijing (0.04)

Genre: Research Report > Promising Solution (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (0.88)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.46)
