
Collaborating Authors

 Tannenbaum, Allen


Distributed Nonlinear Filtering using Triangular Transport Maps

arXiv.org Artificial Intelligence

Multi-agent systems are commonplace in today's technological landscape, and many problems that were once cast in a centralized setting have been recast in a distributed manner [1]. With the introduction of multiple agents, various considerations must be made due to information flow, changes ... One attractive instance of measure transport for Bayesian inference is through the approximation of the Knothe-Rosenblatt (KR) rearrangement [13], [14]. This transformation can be easily approximated given only samples of a distribution and has been applied for various higher-dimensional and nonlinear filtering problems [15], [16].
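In one dimension the KR rearrangement reduces to the monotone (increasing) rearrangement, which can be estimated directly from samples by matching empirical quantiles. Below is a minimal sketch of that one-dimensional building block, assuming NumPy; the full triangular map composes such maps conditionally, component by component, which this sketch does not attempt. The function name kr_map_1d is illustrative, not from the paper.

```python
import numpy as np

def kr_map_1d(x_src, x_tgt):
    """One-dimensional KR (monotone) rearrangement estimated from samples:
    T = F_tgt^{-1} o F_src, approximated by matching empirical quantiles."""
    xs = np.sort(np.asarray(x_src))
    xt = np.sort(np.asarray(x_tgt))
    # empirical CDF levels of the source and target samples
    u = (np.arange(1, xs.size + 1) - 0.5) / xs.size
    v = (np.arange(1, xt.size + 1) - 0.5) / xt.size

    def T(x):
        # F_src(x) by interpolation, then F_tgt^{-1} of that level
        levels = np.interp(x, xs, u)
        return np.interp(levels, v, xt)

    return T

# example: map standard-normal samples toward an exponential target
rng = np.random.default_rng(0)
T = kr_map_1d(rng.normal(size=2000), rng.exponential(size=2000))
print(T(rng.normal(size=5)))
```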


Optimal Transport for Kernel Gaussian Mixture Models

arXiv.org Machine Learning

The Wasserstein distance from optimal mass transport (OMT) is a powerful mathematical tool with numerous applications that provides a natural measure of the distance between two probability distributions. Several methods to incorporate OMT into widely used probabilistic models, such as Gaussian or Gaussian mixture models, have been developed to enhance the capability of modeling complex multimodal densities of real datasets. However, very few studies have explored OMT problems in a reproducing kernel Hilbert space (RKHS), wherein the kernel trick is utilized to avoid the need to explicitly map input data into a high-dimensional feature space. In the current study, we propose a Wasserstein-type metric to compute the distance between two Gaussian mixtures in an RKHS via the kernel trick, i.e., between kernel Gaussian mixture models.
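The building block beneath such mixture metrics is the closed-form 2-Wasserstein distance between two Gaussians, W2^2 = ||m1 - m2||^2 + tr(S1 + S2 - 2(S1^{1/2} S2 S1^{1/2})^{1/2}). The following is a minimal sketch of that formula in the ordinary Euclidean setting; the kernelized version of the paper would replace means and covariances with Gram-matrix quantities, which this sketch does not attempt.

```python
import numpy as np
from scipy.linalg import sqrtm

def w2sq_gaussian(m1, S1, m2, S2):
    """Squared 2-Wasserstein distance between N(m1, S1) and N(m2, S2):
    ||m1 - m2||^2 + tr(S1 + S2 - 2 (S1^{1/2} S2 S1^{1/2})^{1/2})."""
    r1 = sqrtm(S1)
    cross = np.real(sqrtm(r1 @ S2 @ r1))  # discard tiny imaginary round-off
    return float(np.sum((m1 - m2) ** 2) + np.trace(S1 + S2 - 2 * cross))

m1, S1 = np.zeros(2), np.eye(2)
m2, S2 = np.ones(2), 2 * np.eye(2)
print(w2sq_gaussian(m1, S1, m2, S2))
```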


Data Assimilation for Sign-indefinite Priors: A generalization of Sinkhorn's algorithm

arXiv.org Machine Learning

The purpose of this work is to develop a framework to calibrate signed datasets so as to be consistent with specified marginals by suitably extending the Schrödinger-Fortet-Sinkhorn paradigm. Specifically, we seek to revise sign-indefinite multi-dimensional arrays in a way that the updated values agree with specified marginals. Our approach follows the rationale in Schrödinger's problem, aimed at updating a "prior" probability measure to agree with marginal distributions. The celebrated Sinkhorn's algorithm (established earlier by R. Fortet) that solves Schrödinger's problem found early applications in calibrating contingency tables in statistics and, more recently, multi-marginal problems in machine learning and optimal transport. Herein, we postulate a sign-indefinite prior in the form of a multi-dimensional array, and propose an optimization problem to suitably update this prior to ensure consistency with given marginals. The resulting algorithm generalizes the Sinkhorn algorithm in that it amounts to iterative scaling of the entries of the array along different coordinate directions. The scaling is multiplicative but also, in contrast to Sinkhorn, inverse-multiplicative depending on the sign of the entries. Our algorithm reduces to the classical Sinkhorn algorithm when the entries of the prior are positive.
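For reference, the classical Sinkhorn/Fortet iteration that the proposed algorithm generalizes alternates row and column scalings of a positive array until both marginals match. A minimal sketch, assuming a strictly positive kernel matrix K and NumPy; variable names are illustrative:

```python
import numpy as np

def sinkhorn(K, r, c, n_iter=500):
    """Classical Sinkhorn iteration: find diagonal scalings u, v so that
    diag(u) K diag(v) has row sums r and column sums c.
    Assumes K has strictly positive entries."""
    u = np.ones_like(r)
    v = np.ones_like(c)
    for _ in range(n_iter):
        u = r / (K @ v)
        v = c / (K.T @ u)
    return u[:, None] * K * v[None, :]

K = np.array([[1.0, 2.0], [3.0, 1.0]])
P = sinkhorn(K, r=np.array([0.5, 0.5]), c=np.array([0.4, 0.6]))
print(P.sum(axis=1), P.sum(axis=0))  # approximately r and c
```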


Promotion/Inhibition Effects in Networks: A Model with Negative Probabilities

arXiv.org Artificial Intelligence

Biological networks often encapsulate promotion/inhibition as signed edge-weights of a graph. Nodes may correspond to genes assigned expression levels (mass) of respective proteins. The promotion/inhibition nature of co-expression between nodes is encoded in the sign of the corresponding entry of a sign-indefinite adjacency matrix, though the strength of such co-expression (i.e., the precise value of edge weights) cannot typically be directly measured. Herein we address the inverse problem to determine network edge-weights based on a sign-indefinite adjacency and expression levels at the nodes. While our motivation originates in gene networks, the framework applies to networks where promotion/inhibition dictates a stationary mass distribution at the nodes. In order to identify suitable edge-weights we adopt a framework of "negative probabilities," advocated by P. Dirac and R. Feynman, and we set up a likelihood formalism to obtain values for the sought edge-weights. The proposed optimization problem can be solved via a generalization of the well-known Sinkhorn algorithm; in our setting the Sinkhorn-type "diagonal scalings" are multiplicative or inverse-multiplicative, depending on the sign of the respective entries in the adjacency matrix, with value computed as the positive root of a quadratic polynomial.
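A plausible sign-aware analogue of a single row-scaling sweep, consistent with the description above, is sketched below: positive entries of a row scale by x while negative entries scale by 1/x, so the row-sum matching condition becomes a quadratic in x with a guaranteed positive root. This is an illustration of the stated mechanism under stated assumptions, not the paper's exact update; alternating such sweeps over rows and columns would give the Sinkhorn-like iteration.

```python
import numpy as np

def signed_row_scale(A, r):
    """One Sinkhorn-type sweep over the rows of a sign-indefinite array A:
    positive entries of row i are multiplied by x_i, negative entries divided
    by x_i, so the row sums match r. With P_i and N_i the positive and negative
    parts of the row sum, x_i is the positive root of P_i x^2 - r_i x + N_i = 0
    (real since N_i <= 0). Assumes every row has at least one positive entry."""
    pos = np.where(A > 0, A, 0.0)
    neg = np.where(A < 0, A, 0.0)
    P, N = pos.sum(axis=1), neg.sum(axis=1)
    x = (r + np.sqrt(r ** 2 - 4 * P * N)) / (2 * P)
    return pos * x[:, None] + neg / x[:, None]

A = np.array([[2.0, -1.0], [1.5, 0.5]])
B = signed_row_scale(A, r=np.array([1.0, 1.0]))
print(B.sum(axis=1))  # [1. 1.]
```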


Optimal transport for vector Gaussian mixture models

arXiv.org Machine Learning

Finite mixture models can describe a wide range of statistical phenomena. They have been successfully applied to numerous fields including biology, economics, engineering, and the social sciences [15]. The first major use and analysis of mixture models is perhaps due to the mathematician and biostatistician Karl Pearson, who, over 120 years ago, explicitly decomposed a distribution into two normal components to characterize non-normal attributes of forehead-to-body-length ratios in female shore crab populations [16]. The literature on analyzing and applying mixture models continues to grow owing to their simplicity, versatility, and flexibility. One of the most commonly used mixture models is the Gaussian mixture model (GMM), a weighted sum of Gaussian distributions.
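In the GMM optimal-transport construction described in the authors' related work, the distance between two mixtures is a discrete optimal transport problem over the component weights, with the closed-form Gaussian W2 distance as ground cost. A minimal sketch under those assumptions, solving the small linear program with scipy; the helper names are illustrative:

```python
import numpy as np
from scipy.linalg import sqrtm
from scipy.optimize import linprog

def w2sq_gaussian(m1, S1, m2, S2):
    r1 = sqrtm(S1)
    cross = np.real(sqrtm(r1 @ S2 @ r1))
    return float(np.sum((m1 - m2) ** 2) + np.trace(S1 + S2 - 2 * cross))

def gmm_ot(weights1, comps1, weights2, comps2):
    """Discrete OT between mixture weights with the squared Gaussian
    W2 distance between components as ground cost."""
    m, n = len(weights1), len(weights2)
    C = np.array([[w2sq_gaussian(*c1, *c2) for c2 in comps2] for c1 in comps1])
    # equality constraints: plan rows sum to weights1, columns to weights2
    A_eq = np.zeros((m + n, m * n))
    for i in range(m):
        A_eq[i, i * n:(i + 1) * n] = 1.0
    for j in range(n):
        A_eq[m + j, j::n] = 1.0
    b_eq = np.concatenate([weights1, weights2])
    res = linprog(C.ravel(), A_eq=A_eq, b_eq=b_eq, bounds=(0, None))
    return res.fun  # squared mixture distance

w1, w2 = np.array([0.5, 0.5]), np.array([1.0])
comps1 = [(np.zeros(1), np.eye(1)), (2 * np.ones(1), np.eye(1))]
comps2 = [(np.ones(1), np.eye(1))]
print(gmm_ot(w1, comps1, w2, comps2))
```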


Kernel Wasserstein Distance

arXiv.org Machine Learning

The Wasserstein distance is a powerful metric based on the theory of optimal transport. It gives a natural measure of the distance between two distributions and has a wide range of applications. In contrast to a number of common divergences between distributions, such as the Kullback-Leibler or Jensen-Shannon divergence, it is (weakly) continuous, and thus well suited to analyzing corrupted data. To date, however, no kernel methods for dealing with nonlinear data have been proposed for the Wasserstein distance. In this work, we develop a novel method to compute the L2-Wasserstein distance in a kernel space, implemented using the kernel trick, a general method in machine learning for handling data in a nonlinear manner. We evaluate the proposed approach by identifying computerized tomography (CT) slices with dental artifacts in head and neck cancer, performing unsupervised hierarchical clustering on the resulting Wasserstein distance matrix computed on imaging texture features extracted from each CT slice. Our experiments show that the kernel approach outperforms classical non-kernel approaches in identifying CT slices with artifacts.
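The clustering step described above can be reproduced with standard tooling: given a symmetric distance matrix, scipy's hierarchical clustering operates on its condensed form. A minimal sketch with a synthetic stand-in for the kernel Wasserstein matrix (the real matrix would come from the paper's method; here D is Euclidean for illustration only):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

# D stands in for a precomputed kernel Wasserstein distance matrix
# (symmetric, zero diagonal); synthetic data for illustration only.
rng = np.random.default_rng(0)
X = rng.normal(size=(10, 4))
D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))

Z = linkage(squareform(D), method="average")       # condensed distances in, tree out
labels = fcluster(Z, t=2, criterion="maxclust")    # cut the tree into two clusters
print(labels)
```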