Wasserstein Training of Restricted Boltzmann Machines
Boltzmann machines are able to learn highly complex, multimodal, structured and multiscale real-world data distributions. Parameters of the model are usually learned by minimizing the Kullback-Leibler (KL) divergence from training samples to the learned model. We propose in this work a novel approach for Boltzmann machine training which assumes that a meaningful metric between observations is known. This metric between observations can then be used to define the Wasserstein distance between the distribution induced by the Boltzmann machine on the one hand, and that given by the training sample on the other hand. We derive a gradient of that distance with respect to the model parameters. Minimization of this new objective leads to generative models with different statistical properties. We demonstrate their practical potential on data completion and denoising, for which the metric between observations plays a crucial role.
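The smoothed Wasserstein distance underlying the proposed objective can be sketched concretely. The following is a minimal Sinkhorn-style computation between two empirical samples, assuming entropic smoothing, a Euclidean ground metric, uniform weights on both samples, and an illustrative regularization strength `gamma`; the paper's actual estimator and hyperparameters may differ.

```python
import numpy as np

def sinkhorn_distance(x, y, gamma=0.1, n_iter=200):
    """Entropy-smoothed Wasserstein distance between two empirical
    distributions, given the Euclidean metric between observations.
    Uniform weights on both samples; gamma is an illustrative
    regularization strength, not the paper's setting."""
    n, m = len(x), len(y)
    # Pairwise ground-metric costs between observations.
    M = np.linalg.norm(x[:, None, :] - y[None, :, :], axis=-1)
    K = np.exp(-M / gamma)
    u = np.ones(n) / n
    for _ in range(n_iter):
        # Alternate scaling to satisfy the two marginal constraints.
        v = (np.ones(m) / m) / (K.T @ u)
        u = (np.ones(n) / n) / (K @ v)
    P = u[:, None] * K * v[None, :]   # smoothed transport plan
    return float(np.sum(P * M))

rng = np.random.default_rng(0)
a = rng.normal(0.0, 1.0, size=(50, 2))
b = rng.normal(0.0, 1.0, size=(60, 2))
d = sinkhorn_distance(a, b)
```

Because the distance is built from the ground metric between observations, translating one sample far away inflates it, whereas the KL divergence between non-overlapping samples carries no such geometric signal.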
Reviews: Wasserstein Training of Restricted Boltzmann Machines
The paper refers to [2] and says that those authors proved statistical consistency. However, I am then surprised to see in Section 4.3 that non-zero shrinkage is obtained (including for gamma = 0) in the very simple case of modelling a N(0, I) distribution with N(0, sigma^2 I). What is going on here? A failure of consistency would be a serious flaw in the formulation of a statistical learning criterion. Also, in Section 3 (Stability and KL regularization) the authors say that, at least for learning based on samples (\hat{p}_\theta), some regularization with respect to the KL divergence is required. This clearly weakens the "purity" of the smoothed Wasserstein objective function.
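The shrinkage the reviewer questions can be probed numerically. The sketch below is not the paper's experiment: it scans the scale sigma of a N(0, sigma^2 I) model against samples from N(0, I) and keeps the sigma minimizing an entropy-smoothed Wasserstein distance; the gamma value, sample sizes, and grid are illustrative assumptions.

```python
import numpy as np

def smoothed_w(x, y, gamma=0.5, n_iter=300):
    """Entropy-smoothed Wasserstein distance (Sinkhorn iterations),
    uniform weights, Euclidean ground metric; gamma is illustrative."""
    n, m = len(x), len(y)
    M = np.linalg.norm(x[:, None, :] - y[None, :, :], axis=-1)
    K = np.exp(-M / gamma)
    u = np.ones(n) / n
    for _ in range(n_iter):
        v = (np.ones(m) / m) / (K.T @ u)
        u = (np.ones(n) / n) / (K @ v)
    return float(np.sum(u[:, None] * K * v[None, :] * M))

rng = np.random.default_rng(1)
data = rng.normal(size=(100, 2))   # sample from N(0, I)
base = rng.normal(size=(100, 2))   # reused noise: s * base ~ N(0, s^2 I)

# Scan the model scale and keep the distance-minimizing sigma.
sigmas = np.linspace(0.5, 1.5, 21)
dists = [smoothed_w(s * base, data) for s in sigmas]
sigma_best = float(sigmas[int(np.argmin(dists))])
```

With entropic smoothing, the minimizer may land below 1, which would be consistent with the shrinkage reported in Section 4.3; a careful check would vary gamma and the sample size as the paper does.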
Montavon, Grégoire, Müller, Klaus-Robert, Cuturi, Marco
Wasserstein Training of Boltzmann Machines
Montavon, Grégoire, Müller, Klaus-Robert, Cuturi, Marco
The Boltzmann machine provides a useful framework to learn highly complex, multimodal and multiscale data distributions that occur in the real world. The default method to learn its parameters consists of minimizing the Kullback-Leibler (KL) divergence from training samples to the Boltzmann model. We propose in this work a novel approach for Boltzmann training which assumes that a meaningful metric between observations is given. This metric can be represented by the Wasserstein distance between distributions, for which we derive a gradient with respect to the model parameters. Minimization of this new Wasserstein objective leads to generative models that are better when considering the metric and that have a cluster-like structure. We demonstrate the practical potential of these models for data completion and denoising, for which the metric between observations plays a crucial role.