Goto

Collaborating Authors

 Undirected Networks


Stochastic Annealing for Variational Inference

arXiv.org Machine Learning

Machine learning has produced a wide variety of useful tools for addressing a number of practical problems, often for those which involve large-scale datasets. Indeed, a number of disciplines ranging from recommender systems to bioinformatics rely on machine intelligence to extract useful information from their datasets in an efficient manner. One of the core machine learning approaches to such tasks is to define a prior over a model on data and infer the model parameters through posterior inference (Blei, 2014). The gold-standard in this direction is Markov chain Monte Carlo (MCMC), which gives a means for collecting samples from this posterior distribution in an asymptotically correct way (Robert & Casella, 2004). A frequent criticism of MCMC is that it is not scalable to large data sets--though recent work has begun to address this (e.g., Welling & Teh (2011); Maclaurin & Adams (2014)).


PLEASE: Palm Leaf Search for POMDPs with Large Observation Spaces

AAAI Conferences

This paper provides a novel POMDP planning method, called Palm LEAf SEarch (PLEASE), which allows the selection of more than one outcome when their potential impacts are close to the highest one during its forward exploration. Compared with existing trial-based algorithms, PLEASE can save considerable time to propagate the bound improvements of beliefs in deep levels of the search tree to the root belief because of fewer backup operations. Experiments showed that PLEASE scales up SARSOP, one of the fastest algorithms, by orders of magnitude on some POMDP tasks with large observation spaces.


Maximum a Posteriori Estimation by Search in Probabilistic Programs

AAAI Conferences

We introduce an approximate search algorithm for fast maximum a posteriori probability estimation in probabilistic programs, which we call Bayesian ascent Monte Carlo (BaMC). Probabilistic programs represent probabilistic models with varying number of mutually dependent finite, countable, and continuous random variables. BaMC is an anytime MAP search algorithm applicable to any combination of random variables and dependencies. We compare BaMC to other MAP estimation algorithms and show that BaMC is faster and more robust on a range of probabilistic models.


Confidence Backup Updates for Aggregating MDP State Values in Monte-Carlo Tree Search

AAAI Conferences

Monte-Carlo Tree Search (MCTS) algorithms estimate the value of MDP states based on rewards received by performing multiple random simulations. MCTS algorithms can use different strategies to aggregate these rewards and provide an estimation for the statesโ€™ values. The most common aggregation method is to store the mean reward of all simulations. Another common approach stores the best observed reward from each state. Both of these methods have complementary benefits and drawbacks. In this paper, we show that both of these methods are biased estimators for the real expected value of MDP states. We propose an hybrid approach that uses the best reward for states with low noise, and otherwise uses the mean. Experimental results on the Sailing MDP domain show that our method has a considerable advantage when the rewards are drawn from a noisy distribution.


Search Problems in the Domain of Multiplication: Case Study on Anomaly Detection Using Markov Chains

AAAI Conferences

Most work in heuristic search focused on path finding problems in which the cost of a path in the state space is the sum of its edges' weights. This paper addresses a different class of path finding problems in which the cost of a path is the product of its weights. We present reductions from different classes of multiplicative path finding problems to suitable classes of additive path finding problems. As a case study, we consider the problem of finding least and most probable paths in a Markov Chain, where path cost corresponds to the probability of traversing it. The importance of this problem is demonstrated in an anomaly detection application for cyberspace security. Three novel anomaly detection metrics for Markov Chains are presented, where computing these metrics require finding least and most probable paths. The underlying Markov Chain is dynamically changing, and so fast methods for computing least and most probable paths are needed. We propose such methods based on the proposed reductions and using heuristic search algorithms.


Vector-Space Markov Random Fields via Exponential Families

arXiv.org Machine Learning

We present Vector-Space Markov Random Fields (VS-MRFs), a novel class of undirected graphical models where each variable can belong to an arbitrary vector space. VS-MRFs generalize a recent line of work on scalar-valued, uni-parameter exponential family and mixed graphical models, thereby greatly broadening the class of exponential families available (e.g., allowing multinomial and Dirichlet distributions). Specifically, VS-MRFs are the joint graphical model distributions where the node-conditional distributions belong to generic exponential families with general vector space domains. We also present a sparsistent $M$-estimator for learning our class of MRFs that recovers the correct set of edges with high probability. We validate our approach via a set of synthetic data experiments as well as a real-world case study of over four million foods from the popular diet tracking app MyFitnessPal. Our results demonstrate that our algorithm performs well empirically and that VS-MRFs are capable of capturing and highlighting interesting structure in complex, real-world data. All code for our algorithm is open source and publicly available.


Markov Chain Monte Carlo and Variational Inference: Bridging the Gap

arXiv.org Machine Learning

Recent advances in stochastic gradient variational inference have made it possible to perform variational Bayesian inference with posterior approximations containing auxiliary random variables. This enables us to explore a new synthesis of variational inference and Monte Carlo methods where we incorporate one or more steps of MCMC into our variational approximation. By doing so we obtain a rich class of inference algorithms bridging the gap between variational methods and MCMC, and offering the best of both worlds: fast posterior approximation through the maximization of an explicit objective, with the option of trading off additional computation for additional accuracy. We describe the theoretical foundations that make this possible and show some promising first results.


Contextual Analysis for Middle Eastern Languages with Hidden Markov Models

arXiv.org Artificial Intelligence

Displaying a document in Middle Eastern languages requires contextual analysis due to different presentational forms for each character of the alphabet. The words of the document will be formed by the joining of the correct positional glyphs representing corresponding presentational forms of the characters. A set of rules defines the joining of the glyphs. As usual, these rules vary from language to language and are subject to interpretation by the software developers. In this paper, we propose a machine learning approach for contextual analysis based on the first order Hidden Markov Model. We will design and build a model for the Farsi language to exhibit this technology. The Farsi model achieves 94 \% accuracy with the training based on a short list of 89 Farsi vocabularies consisting of 2780 Farsi characters. The experiment can be easily extended to many languages including Arabic, Urdu, and Sindhi. Furthermore, the advantage of this approach is that the same software can be used to perform contextual analysis without coding complex rules for each specific language. Of particular interest is that the languages with fewer speakers can have greater representation on the web, since they are typically ignored by software developers due to lack of financial incentives.


Advanced Mean Field Theory of Restricted Boltzmann Machine

arXiv.org Machine Learning

Learning in restricted Boltzmann machine is typically hard due to the computation of gradients of log-likelihood function. To describe the network state statistics of the restricted Boltzmann machine, we develop an advanced mean field theory based on the Bethe approximation. Our theory provides an efficient message passing based method that evaluates not only the partition function (free energy) but also its gradients without requiring statistical sampling. The results are compared with those obtained by the computationally expensive sampling based method.


A Compositional Framework for Grounding Language Inference, Generation, and Acquisition in Video

Journal of Artificial Intelligence Research

We present an approach to simultaneously reasoning about a video clip and an entire natural-language sentence. The compositional nature of language is exploited to construct models which represent the meanings of entire sentences composed out of the meanings of the words in those sentences mediated by a grammar that encodes the predicate-argument relations. We demonstrate that these models faithfully represent the meanings of sentences and are sensitive to how the roles played by participants (nouns), their characteristics (adjectives), the actions performed (verbs), the manner of such actions (adverbs), and changing spatial relations between participants (prepositions) affect the meaning of a sentence and how it is grounded in video. We exploit this methodology in three ways. In the first, a video clip along with a sentence are taken as input and the participants in the event described by the sentence are highlighted, even when the clip depicts multiple similar simultaneous events. In the second, a video clip is taken as input without a sentence and a sentence is generated that describes an event in that clip. In the third, a corpus of video clips is paired with sentences which describe some of the events in those clips and the meanings of the words in those sentences are learned. We learn these meanings without needing to specify which attribute of the video clips each word in a given sentence refers to. The learned meaning representations are shown to be intelligible to humans.