Genre
Bayesian Nonparametric Hidden Semi-Markov Models
Johnson, Matthew J., Willsky, Alan S.
There is much interest in the Hierarchical Dirichlet Process Hidden Markov Model (HDP-HMM) as a natural Bayesian nonparametric extension of the ubiquitous Hidden Markov Model for learning from sequential and time-series data. However, in many settings the HDP-HMM's strict Markovian constraints are undesirable, particularly if we wish to learn or encode non-geometric state durations. We can extend the HDP-HMM to capture such structure by drawing upon explicit-duration semi-Markov modeling, which has been developed mainly in the parametric non-Bayesian setting, to allow construction of highly interpretable models that admit natural prior information on state durations. In this paper we introduce the explicit-duration Hierarchical Dirichlet Process Hidden semi-Markov Model (HDP-HSMM) and develop sampling algorithms for efficient posterior inference. The methods we introduce also provide new methods for sampling inference in the finite Bayesian HSMM. Our modular Gibbs sampling methods can be embedded in samplers for larger hierarchical Bayesian models, adding semi-Markov chain modeling as another tool in the Bayesian inference toolbox. We demonstrate the utility of the HDP-HSMM and our inference methods on both synthetic and real experiments.
Estimating Densities with Non-Parametric Exponential Families
Yuan, Lin, Kirshner, Sergey, Givan, Robert
We propose a novel approach for density estimation with exponential families for the case when the true density may not fall within the chosen family. Our approach augments the sufficient statistics with features designed to accumulate probability mass in the neighborhood of the observed points, resulting in a non-parametric model similar to kernel density estimators. We show that under mild conditions, the resulting model uses only the sufficient statistics if the density is within the chosen exponential family, and asymptotically, it approximates densities outside of the chosen exponential family. Using the proposed approach, we modify the exponential random graph model, commonly used for modeling small-size graph distributions, to address the well-known issue of model degeneracy.
Distance Optimal Formation Control on Graphs with a Tight Convergence Time Guarantee
Yu, Jingjin, LaValle, Steven M.
In this paper, we study the problem of controlling a group of indistinguishable agents with non-negligible sizes to take arbitrary desired formations. The agents, confined to an arbitrary connected graph, are capable of moving from one vertex to an adjacent vertex in one time step. The control policy must ensure that no collisions occur, which may happen when two agents attempt to move to the same vertex or move along the same edge. Counting each edge as having unit distance, we show that a (centralized) policy/schedule exists that moves the agents to the desired formation along paths having shortest total distance. The control policy also guarantees that a convergence time (the time when the formation is complete) of no more than n l 1, in which n is the number of agents,lis the maximum (shortest) distance between any two initial and goal vertices.
Optimal measures and Markov transition kernels
We study optimal solutions to an abstract optimization problem for measures, which is a generalization of classical variational problems in information theory and statistical physics. In the classical problems, information and relative entropy are defined using the Kullback-Leibler divergence, and for this reason optimal measures belong to a one-parameter exponential family. Measures within such a family have the property of mutual absolute continuity. Here we show that this property characterizes other families of optimal positive measures if a functional representing information has a strictly convex dual. Mutual absolute continuity of optimal probability measures allows us to strictly separate deterministic and non-deterministic Markov transition kernels, which play an important role in theories of decisions, estimation, control, communication and computation. We show that deterministic transitions are strictly sub-optimal, unless information resource with a strictly convex dual is unconstrained. For illustration, we construct an example where, unlike non-deterministic, any deterministic kernel either has negatively infinite expected utility (unbounded expected error) or communicates infinite information.
Restricting exchangeable nonparametric distributions
Williamson, Sinead, Ghahramani, Zoubin, MacEachern, Steven N., Xing, Eric P.
Distributions over exchangeable matrices with infinitely many columns, such as the Indian buffet process, are useful in constructing nonparametric latent variable models. However, the distribution implied by such models over the number of features exhibited by each data point may be poorly- suited for many modeling tasks. In this paper, we propose a class of exchangeable nonparametric priors obtained by restricting the domain of existing models. Such models allow us to specify the distribution over the number of features per data point, and can achieve better performance on data sets where the number of features is not well-modeled by the original distribution.
Learning Probability Measures with respect to Optimal Transport Metrics
Canas, Guillermo D., Rosasco, Lorenzo
We study the problem of estimating, in the sense of optimal transport metrics, a measure which is assumed supported on a manifold embedded in a Hilbert space. By establishing a precise connection between optimal transport metrics, optimal quantization, and learning theory, we derive new probabilistic bounds for the performance of a classic algorithm in unsupervised learning (k-means), when used to produce a probability measure derived from the data. In the course of the analysis, we arrive at new lower bounds, as well as probabilistic upper bounds on the convergence rate of the empirical law of large numbers, which, unlike existing bounds, are applicable to a wide class of measures.
Statistically adaptive learning for a general class of cost functions (SA L-BFGS)
Purpura, Stephen, Hillard, Dustin, Hubenthal, Mark, Walsh, Jim, Golder, Scott, Smith, Scott
We present a system that enables rapid model experimentation for tera-scale machine learning with trillions of non-zero features, billions of training examples, and millions of parameters. Our contribution to the literature is a new method (SA L-BFGS) for changing batch L-BFGS to perform in near real-time by using statistical tools to balance the contributions of previous weights, old training examples, and new training examples to achieve fast convergence with few iterations. The result is, to our knowledge, the most scalable and flexible linear learning system reported in the literature, beating standard practice with the current best system (Vowpal Wabbit and AllReduce). Using the KDD Cup 2012 data set from Tencent, Inc. we provide experimental results to verify the performance of this method.
A Method of Moments for Mixture Models and Hidden Markov Models
Anandkumar, Animashree, Hsu, Daniel, Kakade, Sham M.
Mixture models are a fundamental tool in applied statistics and machine learning for treating data taken from multiple subpopulations. The current practice for estimating the parameters of such models relies on local search heuristics (e.g., the EM algorithm) which are prone to failure, and existing consistent methods are unfavorable due to their high computational and sample complexity which typically scale exponentially with the number of mixture components. This work develops an efficient method of moments approach to parameter estimation for a broad class of high-dimensional mixture models with many components, including multi-view mixtures of Gaussians (such as mixtures of axis-aligned Gaussians) and hidden Markov models. The new method leads to rigorous unsupervised learning results for mixture models that were not achieved by previous works; and, because of its simplicity, it offers a viable alternative to EM for practical deployment.
Sparse Reward Processes
We introduce a class of learning problems where the agent is presented with a series of tasks. Intuitively, if there is relation among those tasks, then the information gained during execution of one task has value for the execution of another task. Consequently, the agent is intrinsically motivated to explore its environment beyond the degree necessary to solve the current task it has at hand. We develop a decision theoretic setting that generalises standard reinforcement learning tasks and captures this intuition. More precisely, we consider a multi-stage stochastic game between a learning agent and an opponent. We posit that the setting is a good model for the problem of life-long learning in uncertain environments, where while resources must be spent learning about currently important tasks, there is also the need to allocate effort towards learning about aspects of the world which are not relevant at the moment. This is due to the fact that unpredictable future events may lead to a change of priorities for the decision maker. Thus, in some sense, the model "explains" the necessity of curiosity. Apart from introducing the general formalism, the paper provides algorithms. These are evaluated experimentally in some exemplary domains. In addition, performance bounds are proven for some cases of this problem.
Contextually Guided Semantic Labeling and Search for 3D Point Clouds
Anand, Abhishek, Koppula, Hema Swetha, Joachims, Thorsten, Saxena, Ashutosh
RGB-D cameras, which give an RGB image to- gether with depths, are becoming increasingly popular for robotic perception. In this paper, we address the task of detecting commonly found objects in the 3D point cloud of indoor scenes obtained from such cameras. Our method uses a graphical model that captures various features and contextual relations, including the local visual appearance and shape cues, object co-occurence relationships and geometric relationships. With a large number of object classes and relations, the model's parsimony becomes important and we address that by using multiple types of edge potentials. We train the model using a maximum-margin learning approach. In our experiments over a total of 52 3D scenes of homes and offices (composed from about 550 views), we get a performance of 84.06% and 73.38% in labeling office and home scenes respectively for 17 object classes each. We also present a method for a robot to search for an object using the learned model and the contextual information available from the current labelings of the scene. We applied this algorithm successfully on a mobile robot for the task of finding 12 object classes in 10 different offices and achieved a precision of 97.56% with 78.43% recall.