transportation distance
Sinkhorn Distances: Lightspeed Computation of Optimal Transport
Optimal transportation distances are a fundamental family of parameterized distances for histograms in the probability simplex. Despite their appealing theoretical properties, excellent performance and intuitive formulation, their computation involves the resolution of a linear program whose cost is prohibitive whenever the histograms' dimension exceeds a few hundred. We propose in this work a new family of optimal transportation distances that look at transportation problems from a maximum-entropy perspective. We smooth the classical optimal transportation problem with an entropic regularization term, and show that the resulting optimum is also a distance which can be computed through Sinkhorn's matrix scaling algorithm at a speed that is several orders of magnitude faster than that of transportation solvers. We also report improved performance on the MNIST benchmark problem over competing distances.
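The matrix scaling scheme the abstract refers to alternately rescales the rows and columns of the kernel exp(-λM) until the transport plan's marginals match the two histograms. A minimal sketch of that iteration (variable names and the fixed regularization strength are our illustration, not the paper's code):

```python
import numpy as np

def sinkhorn_distance(r, c, M, lam=50.0, n_iter=500):
    """Entropy-regularized optimal transport between histograms r and c.

    K = exp(-lam * M) is rescaled by positive diagonal vectors u, v
    (Sinkhorn's algorithm) so the plan P = diag(u) K diag(v) has
    marginals r and c; the returned value is the transport cost of P.
    """
    K = np.exp(-lam * M)
    u = np.ones_like(r)
    v = np.ones_like(c)
    for _ in range(n_iter):
        v = c / (K.T @ u)            # scale columns to match c
        u = r / (K @ v)              # scale rows to match r
    P = u[:, None] * K * v[None, :]  # regularized transport plan
    return float(np.sum(P * M))      # its transportation cost
```

For large lam the value approaches the unregularized transportation distance; smaller lam trades accuracy for a smoother, faster-converging problem.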
Reviews: Lower Bounds on Adversarial Robustness from Optimal Transport
The paper proposes a classifier-independent lower bound for binary classification in the adversarial setting. More precisely, Theorem 1 connects the "Bayes optimal" adversarial robustness error to a notion of separability, namely the transportation distance between the positive and negative points in the feature space, induced by moving points around according to the attack model (i.e., the constraints on the attacker). The idea of using the Kantorovich-Rubinstein transportation distance (also known as the Wasserstein distance) to increase robustness is in the air at present, and this paper shows how it can be used. It is also worth pointing out that the authors show that their lower bound can be efficiently computed by convex optimization. The contribution is clearly related to learning theory, but also has an interesting empirical validation. The paper is very well written and organised, containing conceptual examples related to multivariate Gaussians, for which concrete computations are carried out.
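To make the separability notion concrete: with a 0-1 ground cost that charges nothing when the attacker can drive two points to a common perturbation, the transportation distance between two equal-size empirical clouds reduces to a minimum-cost matching. A hypothetical sketch (the l-infinity attack ball of radius eps and all names are our illustration, not the paper's exact construction):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def empirical_transport_distance(pos, neg, eps):
    """0-1 transportation distance between equal-size point clouds.

    A pair (x, y) costs 0 if an l-inf attacker of budget eps can move
    x and y to a common point (their l-inf distance is at most 2*eps),
    and 1 otherwise.
    """
    C = np.array([[0.0 if np.max(np.abs(x - y)) <= 2 * eps else 1.0
                   for y in neg] for x in pos])
    rows, cols = linear_sum_assignment(C)  # minimum-cost perfect matching
    return C[rows, cols].mean()
```

For equal-size uniform empirical measures the optimal-transport linear program is solved by an assignment, so the matching above computes the distance exactly.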
Identification of Non-causal Graphical Models
The paper considers the problem of estimating non-causal graphical models whose edges encode smoothing relations among the variables. The authors propose a new covariance extension problem and show that the solution minimizing the transportation distance with respect to a white noise process is a double-sided autoregressive non-causal graphical model. They then generalize the paradigm to a class of graphical autoregressive moving-average models, and finally test the performance of the proposed method through numerical experiments.
Identification of Recurrent Patterns in the Activation of Brain Networks
Identifying patterns from the neuroimaging recordings of brain activity related to the unobservable psychological or mental state of an individual can be treated as an unsupervised pattern recognition problem. The main challenges, however, for such an analysis of fMRI data are: a) defining a physiologically meaningful feature-space for representing the spatial patterns across time; b) dealing with the high-dimensionality of the data; and c) robustness to the various artifacts and confounds in the fMRI time-series. In this paper, we present a network-aware feature-space to represent the states of a general network, which enables comparing and clustering such states in a manner that is a) meaningful in terms of the network connectivity structure; b) computationally efficient; c) low-dimensional; and d) relatively robust to structured and random noise artifacts. This feature-space is obtained from a spherical relaxation of the transportation distance metric which measures the cost of transporting "mass" over the network to transform one function into another. Through theoretical and empirical assessments, we demonstrate the accuracy and efficiency of the approximation, especially for large problems.
- Health & Medicine > Therapeutic Area > Neurology (1.00)
- Health & Medicine > Health Care Technology (1.00)
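The exact quantity that the paper relaxes, transporting "mass" over a network, can be written as a small linear program whose ground cost is the shortest-path distance between nodes. A sketch of that exact baseline (not the paper's spherical relaxation), assuming SciPy:

```python
import numpy as np
from scipy.optimize import linprog
from scipy.sparse.csgraph import shortest_path

def network_transport_distance(a, b, adjacency):
    """Exact cost of transporting node distribution a into b over a graph."""
    D = shortest_path(adjacency, directed=False)  # ground cost = path length
    n = len(a)
    A = np.zeros((2 * n, n * n))                  # marginal constraints on plan P
    for i in range(n):
        A[i, i * n:(i + 1) * n] = 1.0             # row sums: sum_j P[i, j] = a[i]
        A[n + i, i::n] = 1.0                      # col sums: sum_i P[i, j] = b[i]
    res = linprog(D.ravel(), A_eq=A, b_eq=np.concatenate([a, b]),
                  bounds=(0, None), method="highs")
    return res.fun
```

The LP has n^2 variables, which is exactly the scaling bottleneck motivating the low-dimensional relaxation the paper proposes.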
Fast Dual Subgradient Optimization of the Integrated Transportation Distance Between Stochastic Kernels
Lin, Zhengqi, Ruszczynski, Andrzej
A generalization of the Wasserstein metric, the integrated transportation distance, establishes a novel distance between probability kernels of Markov systems. This metric serves as the foundation for an efficient approximation technique, enabling the replacement of the original system's kernel with a kernel with a discrete support of limited cardinality. To facilitate practical implementation, we present a specialized dual algorithm capable of constructing these approximate kernels quickly and efficiently, without requiring computationally expensive matrix operations. Finally, we demonstrate the efficacy of our method through several illustrative examples, showcasing its utility in practical scenarios. This advancement offers new possibilities for the streamlined analysis and manipulation of stochastic systems represented by kernels.
- North America > United States > New Jersey > Middlesex County > Piscataway (0.04)
- Asia > Middle East > Jordan (0.04)
- Asia > China (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.47)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.47)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.46)
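Conceptually, the integrated transportation distance averages an ordinary Wasserstein distance between the two kernels' conditional distributions over a distribution on states. A toy one-dimensional sketch of that reading (the names are ours and the sorted-sample formula for W1 between equal-size samples is standard; this is not the paper's dual subgradient algorithm):

```python
import numpy as np

def w1_1d(x, y):
    """W1 between equal-size 1-d empirical samples: match in sorted order."""
    return float(np.mean(np.abs(np.sort(x) - np.sort(y))))

def integrated_distance(kernel_a, kernel_b, states, weights):
    """Weighted average of per-state W1 between two kernels' conditionals.

    kernel_a(s) and kernel_b(s) return equal-size samples from the
    conditional distribution at state s; weights is a distribution
    over the listed states.
    """
    return sum(w * w1_1d(kernel_a(s), kernel_b(s))
               for s, w in zip(states, weights))
```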
Efficient Algorithms for Sparse Moment Problems without Separation
We consider the sparse moment problem of learning a $k$-spike mixture in high-dimensional space from its noisy moment information in any dimension. We measure the accuracy of the learned mixtures using transportation distance. Previous algorithms either assume certain separation assumptions, use more recovery moments, or run in (super) exponential time. Our algorithm for the one-dimensional problem (also called the sparse Hausdorff moment problem) is a robust version of the classic Prony's method, and our contribution mainly lies in the analysis. We adopt a global and much tighter analysis than previous work (which analyzes the perturbation of the intermediate results of Prony's method). A useful technical ingredient is a connection between the linear system defined by the Vandermonde matrix and the Schur polynomial, which allows us to provide a tight perturbation bound independent of the separation and may be useful in other contexts. To tackle the high-dimensional problem, we first solve the two-dimensional problem by extending the one-dimensional algorithm and analysis to complex numbers. Our algorithm for the high-dimensional case determines the coordinates of each spike by aligning a 1d projection of the mixture to a random vector and a set of 2d projections of the mixture. Our results have applications to learning topic models and Gaussian mixtures, implying improved sample complexity results or running time over prior work.
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- North America > United States > Virginia (0.04)
- Asia > Middle East > Jordan (0.04)
- Asia > China (0.04)
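For intuition, the classic (non-robust) Prony's method that the paper robustifies recovers a k-spike mixture exactly from its first 2k noiseless moments: a Hankel solve gives the coefficients of the polynomial whose roots are the spike locations, and a Vandermonde solve then recovers the weights. A sketch for exact moments (the paper's contribution is the perturbation analysis, not this recovery step):

```python
import numpy as np

def prony(moments, k):
    """Recover spikes x_j and weights w_j from m_t = sum_j w_j * x_j**t, t < 2k."""
    m = np.asarray(moments, dtype=float)
    H = np.array([[m[i + j] for j in range(k)] for i in range(k)])  # Hankel matrix
    c = np.linalg.solve(H, -m[k:2 * k])          # Prony polynomial coefficients
    roots = np.roots(np.concatenate(([1.0], c[::-1]))).real  # spike locations
    V = np.vander(roots, k, increasing=True).T   # V[t, j] = roots[j] ** t
    w = np.linalg.solve(V, m[:k])                # weights from first k moments
    order = np.argsort(roots)
    return roots[order], w[order]
```

The Hankel and Vandermonde systems become ill-conditioned when spikes are close, which is exactly why the separation-independent bounds in the paper are nontrivial.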
New Algorithms And Fast Implementations To Approximate Stochastic Processes
Kirui, Kipngeno Benard, Pflug, Georg Ch., Pichler, Alois
We present new algorithms and fast implementations to find efficient approximations for modelling stochastic processes. For many numerical computations it is essential to develop finite approximations for stochastic processes. While the goal is always to find a finite model that represents the given knowledge about the real data process as accurately as possible, the ways of estimating the discrete approximating model may be quite different: (i) if the stochastic model is known as a solution of a stochastic differential equation, e.g., one may generate the scenario tree directly from the specified model; (ii) if a simulation algorithm is available, which allows simulating trajectories from all conditional distributions, a scenario tree can be generated by stochastic approximation; (iii) if only some observed trajectories of the scenario process are available, the construction of the approximating process can be based on non-parametric conditional density estimates.
- North America > United States > New York (0.04)
- Europe > Germany (0.04)
- North America > United States > Rhode Island > Providence County > Providence (0.04)
- Europe > Austria > Vienna (0.04)
- Energy (1.00)
- Banking & Finance > Trading (1.00)
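Route (iii), building the approximation from observed data, can be illustrated in its simplest single-stage form: quantizing an empirical sample into an m-point discrete distribution by Lloyd's iteration, each center carrying the probability mass of its cell. A hypothetical minimal sketch (the paper's algorithms handle full scenario trees, not just this one-stage case):

```python
import numpy as np

def quantize(samples, m, iters=100, seed=0):
    """Approximate a 1-d sample by m points, each with its cell's mass."""
    samples = np.asarray(samples, dtype=float)
    rng = np.random.default_rng(seed)
    centers = rng.choice(samples, size=m, replace=False)  # random initial centers
    labels = np.zeros(len(samples), dtype=int)
    for _ in range(iters):
        # assign each sample to its nearest center, then re-center each cell
        labels = np.argmin(np.abs(samples[:, None] - centers[None, :]), axis=1)
        for j in range(m):
            if np.any(labels == j):
                centers[j] = samples[labels == j].mean()  # Lloyd update
    probs = np.bincount(labels, minlength=m) / len(samples)
    return centers, probs
```

The returned pair (centers, probs) is a finite model of the sample's distribution, the basic building block that scenario-tree constructions apply stage by stage.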
Probabilistic Multilevel Clustering via Composite Transportation Distance
Ho, Nhat, Huynh, Viet, Phung, Dinh, Jordan, Michael I.
Clustering is a classic and fundamental problem in machine learning. Popular clustering methods such as K-means and the EM algorithm have been the workhorses of exploratory data analysis. However, the underlying model for such methods is a simple flat partition or a mixture model, which does not capture multilevel structures (e.g., words are grouped into documents, documents are grouped into corpora) that arise in many applications in the physical, biological or cognitive sciences. The clustering of multilevel structured data calls for novel methodologies beyond classical clustering. One natural approach for capturing multilevel structures is to use a hierarchy in which data are clustered locally into groups, and those groups are partitioned in a "global clustering." Attempts to develop algorithms of this kind can be roughly classified into two categories. The first category makes use of probabilistic models, often based on Dirichlet process priors.
- Asia > Middle East > Jordan (0.04)
- Oceania > Australia > New South Wales > Sydney (0.04)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- (3 more...)
Sinkhorn Distances: Lightspeed Computation of Optimal Transportation Distances
Optimal transportation distances are a fundamental family of parameterized distances for histograms. Despite their appealing theoretical properties, excellent performance in retrieval tasks and intuitive formulation, their computation involves the resolution of a linear program whose cost is prohibitive whenever the histograms' dimension exceeds a few hundred. We propose in this work a new family of optimal transportation distances that look at transportation problems from a maximum-entropy perspective. We smooth the classical optimal transportation problem with an entropic regularization term, and show that the resulting optimum is also a distance which can be computed through Sinkhorn-Knopp's matrix scaling algorithm at a speed that is several orders of magnitude faster than that of transportation solvers. We also report improved performance over classical optimal transportation distances on the MNIST benchmark problem.
- Asia > Japan > Honshū > Kansai > Kyoto Prefecture > Kyoto (0.04)
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)