AITopics | Frogner, Charlie

Learning Embeddings into Entropic Wasserstein Spaces

Frogner, Charlie, Mirzazadeh, Farzaneh, Solomon, Justin

arXiv.org Machine Learning, May-8-2019

Euclidean embeddings of data are fundamentally limited in their ability to capture latent semantic structures, which need not conform to Euclidean spatial assumptions. Here we consider an alternative, which embeds data as discrete probability distributions in a Wasserstein space, endowed with an optimal transport metric. Wasserstein spaces are much larger and more flexible than Euclidean spaces, in that they can successfully embed a wider variety of metric structures. We exploit this flexibility by learning an embedding that captures semantic information in the Wasserstein distance between embedded distributions. We examine empirically the representational capacity of our learned Wasserstein embeddings, showing that they can embed a wide variety of metric structures with smaller distortion than an equivalent Euclidean embedding. We also investigate an application to word embedding, demonstrating a unique advantage of Wasserstein embeddings: We can visualize the high-dimensional embedding directly, since it is a probability distribution on a low-dimensional space.
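The "distortion" comparison mentioned in the abstract can be made concrete with a small sketch. The definition used here (mean relative error over all pairs) is one common choice and is an assumption, not necessarily the paper's exact measure; the function name and the toy distance matrices are hypothetical.

```python
import numpy as np

def mean_relative_distortion(d_source, d_embedded):
    """Mean of |d_embedded - d_source| / d_source over all distinct pairs.

    Assumed definition for illustration; both inputs are symmetric
    pairwise-distance matrices with positive off-diagonal entries.
    """
    d_source = np.asarray(d_source, dtype=float)
    d_embedded = np.asarray(d_embedded, dtype=float)
    # Use only the strict upper triangle so each pair is counted once.
    i, j = np.triu_indices_from(d_source, k=1)
    rel_err = np.abs(d_embedded[i, j] - d_source[i, j]) / d_source[i, j]
    return float(np.mean(rel_err))

# Hypothetical source metric: shortest-path distances on a 3-node path graph.
d_source = np.array([[0.0, 1.0, 2.0],
                     [1.0, 0.0, 1.0],
                     [2.0, 1.0, 0.0]])

# A perfect embedding reproduces every distance; a uniformly stretched one does not.
distortion_exact = mean_relative_distortion(d_source, d_source)
distortion_scaled = mean_relative_distortion(d_source, 1.5 * d_source)
```

Under this definition a lower value means the embedded distances track the source metric more closely, which is the sense in which two embeddings of the same data can be compared.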

artificial intelligence, neural network, wasserstein space

arXiv: 1905.03329

Country:

North America > United States (0.46)
North America > Canada > Alberta (0.14)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)

Approximate inference with Wasserstein gradient flows

Frogner, Charlie, Poggio, Tomaso

arXiv.org Machine Learning, Jun-12-2018

We present a novel approximate inference method for diffusion processes, based on the Wasserstein gradient flow formulation of the diffusion. In this formulation, the time-dependent density of the diffusion is derived as the limit of implicit Euler steps that follow the gradients of a particular free energy functional. Existing methods for computing Wasserstein gradient flows rely on discretization of the domain of the diffusion, prohibiting their application to domains in more than several dimensions. We propose instead a discretization-free inference method that computes the Wasserstein gradient flow directly in a space of continuous functions. We characterize approximation properties of the proposed method and evaluate it on a nonlinear filtering task, finding performance comparable to the state-of-the-art for filtering diffusions.
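The "implicit Euler steps" in the abstract correspond to what is usually called the Jordan-Kinderlehrer-Otto (JKO) scheme. As a sketch in standard notation (the symbols $\tau$, $\Psi$, and $\beta$ are assumptions here and do not come from the abstract), for a diffusion $dX_t = -\nabla\Psi(X_t)\,dt + \sqrt{2\beta^{-1}}\,dW_t$ the step and the free energy read:

```latex
% One implicit Euler (JKO) step of size \tau in Wasserstein space
\rho_{k+1} = \operatorname*{arg\,min}_{\rho}
  \left\{ \frac{1}{2\tau}\, W_2^2(\rho, \rho_k) + F(\rho) \right\},
\qquad
% Free energy: potential energy plus scaled negative entropy
F(\rho) = \int \Psi(x)\,\rho(x)\,dx + \beta^{-1} \int \rho(x)\log\rho(x)\,dx .
```

As $\tau \to 0$, the sequence $\rho_k$ interpolates to the solution of the Fokker-Planck equation $\partial_t \rho = \nabla\cdot(\rho\,\nabla\Psi) + \beta^{-1}\Delta\rho$, which is the time-dependent density of the diffusion.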

artificial intelligence, diffusion, upstream oil & gas

arXiv: 1806.04542

Genre: Research Report (0.82)

Industry: Energy > Oil & Gas > Upstream (0.46)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.46)

Learning with a Wasserstein Loss

Frogner, Charlie, Zhang, Chiyuan, Mobahi, Hossein, Araya, Mauricio, Poggio, Tomaso A.

Neural Information Processing Systems, Dec-31-2015

Learning to predict multi-label outputs is challenging, but in many problems there is a natural metric on the outputs that can be used to improve predictions. In this paper we develop a loss function for multi-label learning, based on the Wasserstein distance. The Wasserstein distance provides a natural notion of dissimilarity for probability measures. Although optimizing with respect to the exact Wasserstein distance is costly, recent work has described a regularized approximation that is efficiently computed. We describe an efficient learning algorithm based on this regularization, as well as a novel extension of the Wasserstein distance from probability measures to unnormalized measures. We also describe a statistical learning bound for the loss. The Wasserstein loss can encourage smoothness of the predictions with respect to a chosen metric on the output space. We demonstrate this property on a real-data tag prediction problem, using the Yahoo Flickr Creative Commons dataset, outperforming a baseline that doesn't use the metric.
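The "regularized approximation that is efficiently computed" is typically an entropy-regularized transport problem solved by Sinkhorn matrix scaling. The following is a minimal sketch of that iteration under stated assumptions, not the paper's implementation: the function name, the regularization strength `reg`, the iteration count, and the three-label toy problem are all hypothetical.

```python
import numpy as np

def sinkhorn_cost(a, b, cost, reg=0.5, n_iters=500):
    """Transport cost of the entropy-regularized plan between histograms a and b.

    a, b: strictly positive histograms summing to 1.
    cost: ground-metric matrix between the bins of a and the bins of b.
    """
    kernel = np.exp(-cost / reg)      # Gibbs kernel of the ground metric
    u = np.ones_like(a)
    for _ in range(n_iters):
        v = b / (kernel.T @ u)        # rescale to match the column marginal b
        u = a / (kernel @ v)          # rescale to match the row marginal a
    plan = u[:, None] * kernel * v[None, :]
    return float(np.sum(plan * cost)), plan

# Hypothetical output space: three labels on a line, ground metric |i - j|.
positions = np.arange(3, dtype=float)
cost = np.abs(positions[:, None] - positions[None, :])

target = np.array([0.8, 0.1, 0.1])
pred_near = np.array([0.1, 0.8, 0.1])   # mass one step away from the target
pred_far = np.array([0.1, 0.1, 0.8])    # mass two steps away from the target

loss_near, plan_near = sinkhorn_cost(pred_near, target, cost)
loss_far, plan_far = sinkhorn_cost(pred_far, target, cost)
```

In this toy case the prediction whose mass sits closer to the target under the ground metric receives the smaller loss, which illustrates how the loss can encourage smoothness with respect to the output metric. A loss that ignores the metric, such as a per-label divergence, would not distinguish the two predictions in this way.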

artificial intelligence, natural language, wasserstein loss

Country: North America > United States > Massachusetts (0.15)

Industry: Information Technology (0.55)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.94)

Learning with a Wasserstein Loss

Frogner, Charlie, Zhang, Chiyuan, Mobahi, Hossein, Araya-Polo, Mauricio, Poggio, Tomaso

arXiv.org Machine Learning, Dec-29-2015

Learning to predict multi-label outputs is challenging, but in many problems there is a natural metric on the outputs that can be used to improve predictions. In this paper we develop a loss function for multi-label learning, based on the Wasserstein distance. The Wasserstein distance provides a natural notion of dissimilarity for probability measures. Although optimizing with respect to the exact Wasserstein distance is costly, recent work has described a regularized approximation that is efficiently computed. We describe an efficient learning algorithm based on this regularization, as well as a novel extension of the Wasserstein distance from probability measures to unnormalized measures. We also describe a statistical learning bound for the loss. The Wasserstein loss can encourage smoothness of the predictions with respect to a chosen metric on the output space. We demonstrate this property on a real-data tag prediction problem, using the Yahoo Flickr Creative Commons dataset, outperforming a baseline that doesn't use the metric.

artificial intelligence, natural language, wasserstein loss

arXiv: 1506.05439

Country: North America > United States > Massachusetts (0.15)

Genre: Research Report (0.65)

Industry: Information Technology (0.56)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.47)

Discovering Weakly-Interacting Factors in a Complex Stochastic Process

Frogner, Charlie, Pfeffer, Avi

Neural Information Processing Systems, Dec-31-2008

Dynamic Bayesian networks (DBNs) are structured representations of stochastic processes. Despite their structure, exact inference in DBNs is generally intractable. One approach to approximate inference involves grouping the variables in the process into smaller factors and keeping independent beliefs over these factors.

artificial intelligence, factorization, health & medicine

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.46)

Technology: