computed efficiently
Bisimulation Metrics are Optimal Transport Distances, and Can be Computed Efficiently
We propose a new framework for formulating optimal transport distances between Markov chains. Previously known formulations studied couplings between the entire joint distribution induced by the chains, and derived solutions via a reduction to dynamic programming (DP) in an appropriately defined Markov decision process. This formulation has, however, not led to particularly efficient algorithms so far, since computing the associated DP operators requires fully solving a static optimal transport problem, and these operators need to be applied numerous times during the overall optimization process. In this work, we develop an alternative perspective by considering couplings between a flattened'' version of the joint distributions that we call discounted occupancy couplings, and show that calculating optimal transport distances in the full space of joint distributions can be equivalently formulated as solving a linear program (LP) in this reduced space. This LP formulation formulation allows us to port several algorithmic ideas from other areas of optimal transport theory.
Injective Flows for parametric hypersurfaces
Negri, Marcello Massimo, Aellen, Jonathan, Roth, Volker
Normalizing Flows (NFs) are powerful and efficient models for density estimation. When modeling densities on manifolds, NFs can be generalized to injective flows but the Jacobian determinant becomes computationally prohibitive. Current approaches either consider bounds on the log-likelihood or rely on some approximations of the Jacobian determinant. In contrast, we propose injective flows for parametric hypersurfaces and show that for such manifolds we can compute the Jacobian determinant exactly and efficiently, with the same cost as NFs. Furthermore, we show that for the subclass of star-like manifolds we can extend the proposed framework to always allow for a Cartesian representation of the density. We showcase the relevance of modeling densities on hypersurfaces in two settings. Firstly, we introduce a novel Objective Bayesian approach to penalized likelihood models by interpreting level-sets of the penalty as star-like manifolds. Secondly, we consider Bayesian mixture models and introduce a general method for variational inference by defining the posterior of mixture weights on the probability simplex.
- Europe > Switzerland > Basel-City > Basel (0.04)
- North America > United States > California (0.04)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.66)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.34)
Learning Efficient Random Maximum A-Posteriori Predictors with Non-Decomposable Loss Functions
In this work we develop efficient methods for learning random MAP predictors for structured label problems. In particular, we construct posterior distributions over perturbations that can be adjusted via stochastic gradient methods. We show that any smooth posterior distribution would suffice to define a smooth PAC-Bayesian risk bound suitable for gradient methods. In addition, we relate the posterior distributions to computational properties of the MAP predictors. We suggest multiplicative posteriors to learn super-modular potential functions that accompany specialized MAP predictors such as graph-cuts. We also describe label-augmented posterior models that can use efficient MAP approximations, such as those arising from linear program relaxations.
- North America > United States > New York (0.04)
- North America > United States > Illinois > Cook County > Chicago (0.04)
- Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
- Asia > Middle East > Israel > Haifa District > Haifa (0.04)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.49)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.47)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.47)
Learning Efficient Random Maximum A-Posteriori Predictors with Non-Decomposable Loss Functions
Hazan, Tamir, Maji, Subhransu, Keshet, Joseph, Jaakkola, Tommi
In this work we develop efficient methods for learning random MAP predictors for structured label problems. In particular, we construct posterior distributions over perturbations that can be adjusted via stochastic gradient methods. We show that every smooth posterior distribution would suffice to define a smooth PAC-Bayesian risk bound suitable for gradient methods. In addition, we relate the posterior distributions to computational properties of the MAP predictors. We suggest multiplicative posteriors to learn super-modular potential functions that accompany specialized MAP predictors such as graph-cuts. We also describe label-augmented posterior models that can use efficient MAP approximations, such as those arising from linear program relaxations.
- North America > United States > New York (0.04)
- North America > United States > Illinois > Cook County > Chicago (0.04)
- Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
- Asia > Middle East > Israel > Haifa District > Haifa (0.04)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.49)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.47)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.47)
Coding Time-Varying Signals Using Sparse, Shift-Invariant Representations
Lewicki, Michael S., Sejnowski, Terrence J.
A common way to represent a time series is to divide it into shortduration blocks, each of which is then represented by a set of basis functions. A limitation of this approach, however, is that the temporal alignment of the basis functions with the underlying structure in the time series is arbitrary. We present an algorithm for encoding a time series that does not require blocking the data. The algorithm finds an efficient representation by inferring the best temporal positions for functions in a kernel basis. These can have arbitrary temporal extent and are not constrained to be orthogonal.
- North America > United States > California > Santa Clara County > Stanford (0.04)
- North America > United States > California > San Diego County > La Jolla (0.04)
Coding Time-Varying Signals Using Sparse, Shift-Invariant Representations
Lewicki, Michael S., Sejnowski, Terrence J.
A common way to represent a time series is to divide it into shortduration blocks, each of which is then represented by a set of basis functions. A limitation of this approach, however, is that the temporal alignment of the basis functions with the underlying structure in the time series is arbitrary. We present an algorithm for encoding a time series that does not require blocking the data. The algorithm finds an efficient representation by inferring the best temporal positions for functions in a kernel basis. These can have arbitrary temporal extent and are not constrained to be orthogonal.
- North America > United States > California > Santa Clara County > Stanford (0.04)
- North America > United States > California > San Diego County > La Jolla (0.04)
Coding Time-Varying Signals Using Sparse, Shift-Invariant Representations
Lewicki, Michael S., Sejnowski, Terrence J.
A common way to represent a time series is to divide it into shortduration blocks,each of which is then represented by a set of basis functions. A limitation of this approach, however, is that the temporal alignmentof the basis functions with the underlying structure in the time series is arbitrary. We present an algorithm for encoding a time series that does not require blocking the data. The algorithm finds an efficient representation by inferring the best temporal positions forfunctions in a kernel basis. These can have arbitrary temporal extent and are not constrained to be orthogonal.
- North America > United States > California > Santa Clara County > Stanford (0.04)
- North America > United States > California > San Diego County > La Jolla (0.04)