AITopics | Marzouk, Youssef

An adaptive transport framework for joint and conditional density estimation

Baptista, Ricardo, Zahm, Olivier, Marzouk, Youssef

arXiv.org Machine LearningSep-21-2020

We propose a general framework to robustly characterize joint and conditional probability distributions via transport maps. Transport maps or "flows" deterministically couple two distributions via an expressive monotone transformation. Yet, learning the parameters of such transformations in high dimensions is challenging given few samples from the unknown target distribution, and structural choices for these transformations can have a significant impact on performance. Here we formulate a systematic framework for representing and learning monotone maps, via invertible transformations of smooth functions, and demonstrate that the associated minimization problem has a unique global optimum. Given a hierarchical basis for the appropriate function space, we propose a sample-efficient adaptive algorithm that estimates a sparse approximation for the map. We demonstrate how this framework can learn densities with stable generalization performance across a wide range of sample sizes on real-world datasets.

artificial intelligence, bayesian inference, density estimation, (15 more...)

arXiv.org Machine Learning

2009.10303

Country: North America > United States > Massachusetts (0.28)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.34)

Add feedback

Conditional Sampling With Monotone GANs

Kovachki, Nikola, Baptista, Ricardo, Hosseini, Bamdad, Marzouk, Youssef

arXiv.org Machine LearningJun-11-2020

We present a new approach for sampling conditional measures that enables uncertainty quantification in supervised learning tasks. We construct a mapping that transforms a reference measure to the probability measure of the output conditioned on new inputs. The mapping is trained via a modification of generative adversarial networks (GANs), called monotone GANs, that imposes monotonicity constraints and a block triangular structure. We present theoretical results, in an idealized setting, that support our proposed method as well as numerical experiments demonstrating the ability of our method to sample the correct conditional measures in applications ranging from inverse problems to image in-painting.

bayesian inference, constraint, neural network, (18 more...)

arXiv.org Machine Learning

2006.06755

Country: North America > United States > Massachusetts (0.28)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.67)

Add feedback

Coupling techniques for nonlinear ensemble filtering

Spantini, Alessio, Baptista, Ricardo, Marzouk, Youssef

arXiv.org Machine LearningJun-30-2019

We consider filtering in high-dimensional non-Gaussian state-space models with intractable transition kernels, nonlinear and possibly chaotic dynamics, and sparse observations in space and time. We propose a novel filtering methodology that harnesses transportation of measures, convex optimization, and ideas from probabilistic graphical models to yield robust ensemble approximations of the filtering distribution in high dimensions. Our approach can be understood as the natural generalization of the ensemble Kalman filter (EnKF) to nonlinear updates, using stochastic or deterministic couplings. The use of nonlinear updates can reduce the intrinsic bias of the EnKF at a marginal increase in computational cost. We avoid any form of importance sampling and introduce non-Gaussian localization approaches for dimension scalability. Our framework achieves state-of-the-art tracking performance on challenging configurations of the Lorenz-96 model in the chaotic regime.

bayesian inference, coupling, upstream oil & gas, (20 more...)

arXiv.org Machine Learning

1907.00389

Country: North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)

Genre: Research Report > New Finding (0.46)

Industry:

Energy > Oil & Gas > Upstream (0.45)
Government > Regional Government (0.45)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

Add feedback

Greedy inference with layers of lazy maps

Bigoni, Daniele, Zahm, Olivier, Spantini, Alessio, Marzouk, Youssef

arXiv.org Machine LearningMay-31-2019

We propose a framework for the greedy approximation of high-dimensional Bayesian inference problems, through the composition of multiple \emph{low-dimensional} transport maps or flows. Our framework operates recursively on a sequence of ``residual'' distributions, given by pulling back the posterior through the previously computed transport maps. The action of each map is confined to a low-dimensional subspace that we identify by minimizing an error bound. At each step, our approach thus identifies (i) a relevant subspace of the residual distribution, and (ii) a low-dimensional transformation between a restriction of the residual onto this subspace and a standard Gaussian. We prove weak convergence of the approach to the posterior distribution, and we demonstrate the algorithm on a range of challenging inference problems in differential equations and spatial statistics.

algorithm, bayesian inference, survey article, (19 more...)

arXiv.org Machine Learning

1906.00031

Country: North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)

Genre: Research Report (0.64)

Industry: Energy (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.89)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.34)

Add feedback

A Stein variational Newton method

Detommaso, Gianluca, Cui, Tiangang, Marzouk, Youssef, Spantini, Alessio, Scheichl, Robert

Neural Information Processing SystemsDec-31-2018

Stein variational gradient descent (SVGD) was recently proposed as a general purpose nonparametric variational inference algorithm [Liu & Wang, NIPS 2016]: it minimizes the Kullback-Leibler divergence between the target distribution and its approximation by implementing a form of functional gradient descent on a reproducing kernel Hilbert space. In this paper, we accelerate and generalize the SVGD algorithm by including second-order information, thereby approximating a Newton-like iteration in function space. We also show how second-order information can lead to more effective choices of kernel. We observe significant computational gains over the original SVGD algorithm in multiple test cases.

algorithm, artificial intelligence, machine learning, (15 more...)

Neural Information Processing Systems

Country:

North America > United States > Massachusetts (0.14)
North America > Canada (0.14)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.72)

Add feedback

A Stein variational Newton method

Detommaso, Gianluca, Cui, Tiangang, Marzouk, Youssef, Spantini, Alessio, Scheichl, Robert

Neural Information Processing SystemsDec-31-2018

Stein variational gradient descent (SVGD) was recently proposed as a general purpose nonparametric variational inference algorithm: it minimizes the Kullback–Leibler divergence between the target distribution and its approximation by implementing a form of functional gradient descent on a reproducing kernel Hilbert space [Liu & Wang, NIPS 2016]. In this paper, we accelerate and generalize the SVGD algorithm by including second-order information, thereby approximating a Newton-like iteration in function space. We also show how second-order information can lead to more effective choices of kernel. We observe significant computational gains over the original SVGD algorithm in multiple test cases.

algorithm, artificial intelligence, machine learning, (15 more...)

Neural Information Processing Systems

Country:

North America > United States > Massachusetts (0.14)
North America > Canada (0.14)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.72)

Add feedback

A Stein variational Newton method

Detommaso, Gianluca, Cui, Tiangang, Marzouk, Youssef, Scheichl, Robert, Spantini, Alessio

arXiv.org Machine LearningJun-8-2018

Stein variational gradient descent (SVGD) was recently proposed as a general purpose nonparametric variational inference algorithm [Liu & Wang, NIPS 2016]: it minimizes the Kullback-Leibler divergence between the target distribution and its approximation by implementing a form of functional gradient descent on a reproducing kernel Hilbert space. In this paper, we accelerate and generalize the SVGD algorithm by including second-order information, thereby approximating a Newton-like iteration in function space. We also show how second-order information can lead to more effective choices of kernel. We observe significant computational gains over the original SVGD algorithm in multiple test cases.

algorithm, artificial intelligence, machine learning, (16 more...)

arXiv.org Machine Learning

1806.03085

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.89)

Add feedback

Beyond normality: Learning sparse probabilistic graphical models in the non-Gaussian setting

Morrison, Rebecca, Baptista, Ricardo, Marzouk, Youssef

Neural Information Processing SystemsDec-31-2017

We present an algorithm to identify sparse dependence structure in continuous and non-Gaussian probability distributions, given a corresponding set of data. The conditional independence structure of an arbitrary distribution can be represented as an undirected graph (or Markov random field), but most algorithms for learning this structure are restricted to the discrete or Gaussian cases. Our new approach allows for more realistic and accurate descriptions of the distribution in question, and in turn better estimates of its sparse Markov structure. Sparsity in the graph is of interest as it can accelerate inference, improve sampling methods, and reveal important dependencies between variables. The algorithm relies on exploiting the connection between the sparsity of the graph and the sparsity of transport maps, which deterministically couple one probability measure to another.

artificial intelligence, graph, machine learning, (18 more...)

Neural Information Processing Systems

Country: North America > United States (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.34)

Add feedback

Inference via low-dimensional couplings

Spantini, Alessio, Bigoni, Daniele, Marzouk, Youssef

arXiv.org Machine LearningNov-24-2017

We investigate the low-dimensional structure of deterministic transformations between random variables, i.e., transport maps between probability measures. In the context of statistics and machine learning, these transformations can be used to couple a tractable "reference" measure (e.g., a standard Gaussian) with a target measure of interest. Direct simulation from the desired measure can then be achieved by pushing forward reference samples through the map. Yet characterizing such a map---e.g., representing and evaluating it---grows challenging in high dimensions. The central contribution of this paper is to establish a link between the Markov properties of the target measure and the existence of low-dimensional couplings, induced by transport maps that are sparse and/or decomposable. Our analysis not only facilitates the construction of transformations in high-dimensional settings, but also suggests new inference methodologies for continuous non-Gaussian graphical models. For instance, in the context of nonlinear state-space models, we describe new variational algorithms for filtering, smoothing, and sequential parameter inference. These algorithms can be understood as the natural generalization---to the non-Gaussian case---of the square-root Rauch-Tung-Striebel Gaussian smoother.

optimization problem, survey article, transport map, (18 more...)

arXiv.org Machine Learning

1703.06131

Country: North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)

Genre: Research Report (1.00)

Industry: Government > Regional Government (0.45)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Add feedback

Beyond normality: Learning sparse probabilistic graphical models in the non-Gaussian setting

Morrison, Rebecca E., Baptista, Ricardo, Marzouk, Youssef

arXiv.org Machine LearningNov-6-2017

We present an algorithm to identify sparse dependence structure in continuous and non-Gaussian probability distributions, given a corresponding set of data. The conditional independence structure of an arbitrary distribution can be represented as an undirected graph (or Markov random field), but most algorithms for learning this structure are restricted to the discrete or Gaussian cases. Our new approach allows for more realistic and accurate descriptions of the distribution in question, and in turn better estimates of its sparse Markov structure. Sparsity in the graph is of interest as it can accelerate inference, improve sampling methods, and reveal important dependencies between variables. The algorithm relies on exploiting the connection between the sparsity of the graph and the sparsity of transport maps, which deterministically couple one probability measure to another.

artificial intelligence, graph, machine learning, (18 more...)

arXiv.org Machine Learning

1711.0095

Country: North America > United States (0.46)

Genre: Research Report (0.64)

Technology: