canonical path
- Europe > Switzerland > Zürich > Zürich (0.14)
- North America > United States > California > Santa Clara County > Mountain View (0.04)
- North America > United States > Utah > Salt Lake County > Salt Lake City (0.04)
- (5 more...)
Sampling from Probabilistic Submodular Models
Alkis Gotovos, Hamed Hassani, Andreas Krause
Submodular and supermodular functions have found wide applicability in machine learning, capturing notions such as diversity and regularity, respectively. These notions have deep consequences for optimization, and the problem of (approximately) optimizing submodular functions has received much attention. However, beyond optimization, these notions allow specifying expressive probabilistic models that can be used to quantify predictive uncertainty via marginal inference. Prominent, well-studied special cases include Ising models and determinantal point processes, but the general class of log-submodular and log-supermodular models is much richer and little studied. In this paper, we investigate the use of Markov chain Monte Carlo sampling to perform approximate inference in general log-submodular and log-supermodular models. In particular, we consider a simple Gibbs sampling procedure, and establish two sufficient conditions, the first guaranteeing polynomial-time, and the second fast (O(n log n)) mixing. We also evaluate the efficiency of the Gibbs sampler on three examples of such models, and compare against a recently proposed variational approach.
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.36)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.35)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.35)
Export Reviews, Discussions, Author Feedback and Meta-Reviews
We would like to thank all reviewers for their helpful feedback.

Assigned_Reviewer_1
-------------------
We agree that investigating empirical convergence behavior as a function of problem size is an interesting direction for future work. This is, however, a difficult undertaking, since (1) selecting a suitable class of instances that can naturally be scaled in size (n) is not always straightforward, and (2) diagnosing convergence of an MCMC algorithm in practice is notoriously difficult. For these reasons, in this work we chose to evaluate the accuracy of the sampler, and compare it against the variational baseline, on previously proposed practical instances.

Assigned_Reviewer_2
-------------------
* As with other probabilistic models, a probabilistic facility-location model allows us to quantify uncertainty about conclusions we draw from the model, and provides a principled way to learn such a model from data, with applications including document [1] and image [2] summarization.
On Mixing Rates for Bayesian CART
Kim, Jungeum, Rockova, Veronika
The success of Bayesian inference with MCMC depends critically on Markov chains rapidly reaching the posterior distribution. Despite the plenitude of inferential theory for posteriors in Bayesian non-parametrics, the convergence properties of MCMC algorithms that simulate from such ideal inferential targets are not thoroughly understood. This work focuses on the Bayesian CART algorithm, which forms a building block of Bayesian Additive Regression Trees (BART). We derive upper bounds on mixing times for typical posteriors under various proposal distributions. Exploiting the wavelet representation of trees, we provide sufficient conditions for Bayesian CART to mix well (polynomially) under certain hierarchical connectivity restrictions on the signal. We also derive a negative result showing that Bayesian CART (based on simple grow and prune steps) cannot reach deep isolated signals in less than exponential mixing time. To remedy myopic tree exploration, we propose Twiggy Bayesian CART, which attaches/detaches entire twigs (not just single nodes) in the proposal distribution. We show polynomial mixing of Twiggy Bayesian CART without assuming that the signal is connected on a tree. Going further, we show that informed variants achieve even faster mixing. A thorough simulation study highlights discrepancies between spike-and-slab priors and Bayesian CART under a variety of proposals.
- North America > United States > Illinois > Cook County > Chicago (0.04)
- Asia > Middle East > Jordan (0.04)
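As a rough illustration of the "simple grow and prune steps" the abstract refers to, here is a sketch of one Metropolis step over binary tree shapes. The function log_post is a hypothetical stand-in for the Bayesian CART log posterior, and the proposal-ratio correction is omitted for brevity, so this is a sketch of the move structure rather than the paper's exact sampler.

```python
import math
import random

def grow_prune_step(internal, log_post, rng=random):
    """One Metropolis step of a grow/prune proposal on binary tree shapes.

    internal: set of heap-indexed internal nodes (root = 0; children of node k
    are 2k+1 and 2k+2).  log_post: hypothetical log posterior over tree shapes.
    """
    def leaves(I):
        if not I:
            return {0}                           # a single-leaf tree
        L = set()
        for k in I:
            for c in (2 * k + 1, 2 * k + 2):
                if c not in I:
                    L.add(c)
        return L

    def prunable(I):
        # internal nodes whose two children are both leaves
        return {k for k in I if 2 * k + 1 not in I and 2 * k + 2 not in I}

    if not internal or rng.random() < 0.5:       # propose GROW: split a leaf
        new = set(internal)
        new.add(rng.choice(sorted(leaves(internal))))
    else:                                        # propose PRUNE: collapse a bottom node
        new = set(internal)
        new.discard(rng.choice(sorted(prunable(internal))))
    # Metropolis accept/reject (proposal asymmetry omitted in this sketch)
    if math.log(rng.random()) < log_post(new) - log_post(internal):
        return new
    return internal

# Example usage with a toy "posterior" that mildly prefers small trees:
tree = set()
for _ in range(1000):
    tree = grow_prune_step(tree, lambda I: -0.5 * len(I))
print(sorted(tree))
```

The Twiggy variant in the abstract would replace the single-node grow/prune moves with moves that attach or detach entire twigs in one step.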
The 2^k Neighborhoods for Grid Path Planning
Rivera, Nicolás (University of Cambridge) | Hernández, Carlos (Universidad Andrés Bello) | Hormazábal, Nicolás (Universidad Andrés Bello) | Baier, Jorge A (Pontificia Universidad Católica de Chile)
Grid path planning is an important problem in AI. Its understanding has been key for the development of autonomous navigation systems. An interesting and rather surprising fact about the vast literature on this problem is that only a few neighborhoods have been used when evaluating these algorithms. Indeed, only the 4- and 8-neighborhoods are usually considered, and rarely the 16-neighborhood. This paper describes three contributions that enable the construction of effective grid path planners for extended 2^k-neighborhoods; that is, neighborhoods that admit 2^k neighbors per state, where k is a parameter. First, we provide a simple recursive definition of the 2^k-neighborhood in terms of the 2^(k-1)-neighborhood. Second, we derive distance functions, for any k ≥ 2, which allow us to propose admissible heuristics that are perfect for obstacle-free grids and generalize the well-known Manhattan and Octile distances. Third, we define the notion of canonical path for the 2^k-neighborhood; this allows us to incorporate our neighborhoods into two versions of A*, namely Canonical A* and Jump Point Search (JPS), whose performance, we show, scales well as k increases. Our empirical evaluation shows that increasing k substantially improves the cost of the solutions found. Used with the 2^k-neighborhood, Canonical A* and JPS are, in many configurations, also superior to the any-angle path planner Theta* in terms of both solution quality and runtime, and our planner is competitive with an implementation of the any-angle path planner ANYA in some configurations. Our main practical conclusion is that standard, well-understood grid path planning technology may provide an effective approach to any-angle grid path planning.
- North America > United States > California > San Francisco County > San Francisco (0.14)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
- South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
- (7 more...)
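The recursive definition in the paper's first contribution can be read as follows: order the moves of the 2^(k-1)-neighborhood by angle and insert, between each pair of angularly adjacent moves, their vector sum. Below is a minimal sketch under that (assumed) reading, with the 4-neighborhood as the base case.

```python
from math import atan2

def neighborhood(k):
    """Moves of the 2^k-neighborhood on a grid, built recursively from the
    2^(k-1)-neighborhood by inserting the vector sum of each pair of
    angularly adjacent moves (base case k = 2: the 4-neighborhood)."""
    moves = [(1, 0), (0, 1), (-1, 0), (0, -1)]       # 2^2 = 4-neighborhood
    for _ in range(k - 2):
        moves.sort(key=lambda m: atan2(m[1], m[0]))  # order moves by angle
        nxt = []
        for a, b in zip(moves, moves[1:] + moves[:1]):
            nxt.append(a)
            nxt.append((a[0] + b[0], a[1] + b[1]))   # insert the sum move
        moves = nxt
    return moves

print(len(neighborhood(2)), len(neighborhood(3)), len(neighborhood(4)))  # 4 8 16
```

Running it reproduces the familiar sizes: 4, 8, and 16 moves for k = 2, 3, 4, with k = 4 adding knight-like moves such as (2, 1) and (1, 2).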
Fast mixing for Latent Dirichlet allocation
Markov chain Monte Carlo (MCMC) algorithms are ubiquitous in probability theory in general and in machine learning in particular. A Markov chain is devised so that its stationary distribution is some probability distribution of interest. One then samples from the given distribution by running the Markov chain for a "long time" until it appears to be stationary, and then collects the sample. However, these chains are often very complex, and there are no theoretical guarantees that stationarity is actually reached. In this paper we study the Gibbs sampler of the posterior distribution of a very simple case of Latent Dirichlet Allocation, arguably the best-known Bayesian unsupervised learning model for text generation and text classification. It is shown that when the corpus consists of two long documents of equal length $m$ and the vocabulary consists of only two different words, the mixing time is at most of order $m^2\log m$ (which corresponds to $m\log m$ rounds over the corpus). It will be apparent from our analysis that the mixing time is very likely not much worse in the more relevant case where the number of documents and the size of the vocabulary are also large, as long as each word appears a large number of times in each document, even though the computations involved may be intractable.
- Europe > Sweden > Vaestra Goetaland > Gothenburg (0.04)
- Asia > Middle East > Jordan (0.04)
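For context, the Gibbs sampler studied here targets the LDA posterior. A standard collapsed Gibbs sweep (shown below as a sketch; not necessarily the exact chain analyzed in the paper) resamples each token's topic from its full conditional, with a toy instance echoing the two-document, two-word setting.

```python
import random

def collapsed_gibbs_lda(docs, K, alpha, beta, sweeps, rng=random):
    """Collapsed Gibbs sampler for LDA topic assignments (standard form).
    docs: list of documents, each a list of word ids."""
    V = 1 + max(w for d in docs for w in d)
    ndk = [[0] * K for _ in docs]            # topic counts per document
    nkw = [[0] * V for _ in range(K)]        # word counts per topic
    nk = [0] * K                             # total tokens per topic
    z = [[rng.randrange(K) for _ in d] for d in docs]
    for d, doc in enumerate(docs):           # initialize counts
        for i, w in enumerate(doc):
            k = z[d][i]; ndk[d][k] += 1; nkw[k][w] += 1; nk[k] += 1
    for _ in range(sweeps):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]                  # remove token's current assignment
                ndk[d][k] -= 1; nkw[k][w] -= 1; nk[k] -= 1
                # full conditional: p(k) ∝ (ndk + α)(nkw + β)/(nk + Vβ)
                p = [(ndk[d][t] + alpha) * (nkw[t][w] + beta) / (nk[t] + V * beta)
                     for t in range(K)]
                k = rng.choices(range(K), weights=p)[0]
                z[d][i] = k                  # add the token back with its new topic
                ndk[d][k] += 1; nkw[k][w] += 1; nk[k] += 1
    return z

# Toy instance in the spirit of the paper: two documents, two-word vocabulary.
z = collapsed_gibbs_lda([[0] * 8 + [1] * 2, [1] * 8 + [0] * 2], K=2,
                        alpha=1.0, beta=1.0, sweeps=100)
print(z)
```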
On the Computational Complexity of High-Dimensional Bayesian Variable Selection
Yang, Yun, Wainwright, Martin J., Jordan, Michael I.
We study the computational complexity of Markov chain Monte Carlo (MCMC) methods for high-dimensional Bayesian linear regression under sparsity constraints. We first show that a Bayesian approach can achieve variable-selection consistency under relatively mild conditions on the design matrix. We then demonstrate that the statistical criterion of posterior concentration need not imply the computational desideratum of rapid mixing of the MCMC algorithm. By introducing a truncated sparsity prior for variable selection, we provide a set of conditions that guarantee both variable-selection consistency and rapid mixing of a particular Metropolis-Hastings algorithm. The mixing time is linear in the number of covariates up to a logarithmic factor. Our proof controls the spectral gap of the Markov chain by constructing a canonical path ensemble that is inspired by the steps taken by greedy algorithms for variable selection.
- North America > United States > Rhode Island > Providence County > Providence (0.04)
- North America > United States > California > Alameda County > Berkeley (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Asia > Middle East > Jordan (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.88)
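The abstract's Metropolis-Hastings algorithm walks over sparse subsets of covariates under a truncated sparsity prior. Here is a minimal sketch of such a walk, assuming a generic log-posterior oracle; it uses single-coordinate flips (add/delete) only, whereas the paper's sampler may also use other moves such as swaps.

```python
import math
import random

def mh_variable_selection(log_post, p, s_max, steps, rng=random):
    """Metropolis-Hastings over subsets gamma ⊆ {0,...,p-1} with |gamma| ≤ s_max
    (a truncated sparsity constraint).  log_post is a hypothetical oracle for
    the log marginal posterior of a subset of covariates."""
    gamma = set()                                # start from the empty model
    for _ in range(steps):
        j = rng.randrange(p)                     # propose flipping coordinate j
        prop = gamma ^ {j}                       # symmetric difference = flip j
        if len(prop) > s_max:
            continue                             # truncation: reject oversized models
        # the flip proposal is symmetric, so the acceptance ratio is the posterior ratio
        if math.log(rng.random()) < log_post(prop) - log_post(gamma):
            gamma = prop
    return gamma

# Example usage with a toy log posterior favoring covariates 0 and 1:
good = {0, 1}
gamma = mh_variable_selection(lambda g: 2.0 * len(g & good) - len(g - good),
                              p=20, s_max=5, steps=5000)
print(sorted(gamma))
```

The paper's canonical-path analysis bounds the spectral gap of exactly this kind of chain by routing flow between states along paths mimicking the steps of a greedy variable-selection algorithm.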