Goto

Collaborating Authors

 Bayesian Inference


The Kernel Kalman Rule โ€” Efficient Nonparametric Inference with Recursive Least Squares

AAAI Conferences

Nonparametric inference techniques provide promising tools for probabilistic reasoning in high-dimensional nonlinear systems.Most of these techniques embed distributions into reproducing kernel Hilbert spaces (RKHS) and rely on the kernel Bayes' rule (KBR) to manipulate the embeddings. However, the computational demands of the KBR scale poorly with the number of samples and the KBR often suffers from numerical instabilities. In this paper, we present the kernel Kalman rule (KKR) as an alternative to the KBR.The derivation of the KKR is based on recursive least squares, inspired by the derivation of the Kalman innovation update.We apply the KKR to filtering tasks where we use RKHS embeddings to represent the belief state, resulting in the kernel Kalman filter (KKF).We show on a nonlinear state estimation task with high dimensional observations that our approach provides a significantly improved estimation accuracy while the computational demands are significantly decreased.


Latent Dependency Forest Models

AAAI Conferences

Probabilistic modeling is one of the foundations of modern Learning the structure of a probabilistic model resembles machine learning and artificial intelligence, which aims to learning the set of production rules of a grammar, while compactly represent the joint probability distribution of random learning model parameters resembles learning grammar rule variables. The most widely used approach for probabilistic probabilities. From the unsupervised grammar learning literature, modeling is probabilistic graphical models. A probabilistic one can see that learning approaches based on PCFGs graphical model represents a probability distribution with a have not been very successful, while the state-of-the-art performance directed or undirected graph. It represents random variables has mostly been achieved based on less expressive with the nodes in the graph and uses the edges in the graph to models such as dependency grammars (DGs) (Klein and encode the probabilistic relationships between random variables.


Open-Universe Weighted Model Counting

AAAI Conferences

Weighted model counting (WMC) has recently emerged as an effective and general approach to probabilistic inference, offering a computational framework for encoding a variety of formalisms, such as factor graphs and Bayesian networks.The advent of large-scale probabilistic knowledge bases has generated further interest in relational probabilistic representations, obtained by according weights to first-order formulas, whose semantics is given in terms of the ground theory, and solved by WMC. A fundamental limitation is that the domain of quantification, by construction and design, is assumed to be finite, which is at odds with areas such as vision and language understanding, where the existence of objects must be inferred from raw data. Dropping the finite-domain assumption has been known to improve the expressiveness of a first-order language for open-universe purposes, but these languages, so far, have eluded WMC approaches. In this paper, we revisit relational probabilistic models over an infinite domain, and establish a number of results that permit effective algorithms. We demonstrate this language on a number of examples, including a parameterized version of Pearl's Burglary-Earthquake-Alarm Bayesian network.


Efficiently Mining High Quality Phrases from Texts

AAAI Conferences

Phrase mining is a key research problem for semantic analysis and text-based information retrieval. The existing approaches based on NLP, frequency, and statistics cannot extract high quality phrases and the processing is also time consuming, which are not suitable for dynamic on-line applications. In this paper, we propose an efficient high-quality phrase mining approach (EQPM). To the best of our knowledge, our work is the first effort that considers both intra-cohesion and inter-isolation in mining phrases, which is able to guarantee appropriateness. We also propose a strategy to eliminate order sensitiveness, and ensure the completeness of phrases. We further design efficient algorithms to make the proposed model and strategy feasible. The empirical evaluations on four real data sets demonstrate that our approach achieved a considerable quality improvement and the processing time was 2.3X - 29X faster than the state-of-the-art works.


Recurrent Attentional Topic Model

AAAI Conferences

In a document, the topic distribution of a sentence depends on both the topics of preceding sentences and its own content, and it is usually affected by the topics of the preceding sentences with different weights. It is natural that a document can be treated as a sequence of sentences. Most existing works for Bayesian document modeling do not take these points into consideration. To fill this gap, we propose a Recurrent Attentional Topic Model (RATM) for document embedding. The RATM not only takes advantage of the sequential orders among sentence but also use the attention mechanism to model the relations among successive sentences. In RATM, we propose a Recurrent Attentional Bayesian Process (RABP) to handle the sequences. Based on the RABP, RATM fully utilizes the sequential information of the sentences in a document. Experiments on two copora show that our model outperforms state-of-the-art methods on document modeling and classification.


Maximum Reconstruction Estimation for Generative Latent-Variable Models

AAAI Conferences

Generative latent-variable models are important for natural language processing due to their capability of providing compact representations of data. As conventional maximum likelihood estimation (MLE) is prone to focus on explaining irrelevant but common correlations in data, we apply maximum reconstruction estimation (MRE) to learning generative latent-variable models alternatively, which aims to find model parameters that maximize the probability of reconstructing the observed data. We develop tractable algorithms to directly learn hidden Markov models and IBM translation models using the MRE criterion, without the need to introduce a separate reconstruction model to facilitate efficient inference. Experiments on unsupervised part-of-speech induction and unsupervised word alignment show that our approach enables generative latent-variable models to better discover intended correlations in data and outperforms maximum likelihood estimators significantly.


Variable Kernel Density Estimation in High-Dimensional Feature Spaces

AAAI Conferences

Estimating the joint probability density function of a dataset is a central task in many machine learning applications. In this work we address the fundamental problem of kernel bandwidth estimation for variable kernel density estimation in high-dimensional feature spaces. We derive a variable kernel bandwidth estimator by minimizing the leave-one-out entropy objective function and show that this estimator is capable of performing estimation in high-dimensional feature spaces with great success. We compare the performance of this estimator to state-of-the art maximum-likelihood estimators on a number of representative high-dimensional machine learning tasks and show that the newly introduced minimum leave-one-out entropy estimator performs optimally on a number of high-dimensional datasets considered.


Cross-Domain Ranking via Latent Space Learning

AAAI Conferences

We study the problem of cross-domain ranking, which addresses learning to rank objects from multiple interrelated domains. In many applications, we may have multiple interrelated domains, some of them with a large amount of training data and others with very little. We often wish to utilize the training data from all these related domains to help improve ranking performance. In this paper, we present a unified model: BayCDR for cross-domain ranking. BayCDR uses a latent space to measure the correlation between different domains, and learns the ranking functions from the interrelated domains via the latent space by a Bayesian model, where each ranking function is based on a weighted average model. An efficient learning algorithm based on variational inference and a generalization bound has been developed. To scale up to handle real large data, we also present a learning algorithm under the Map-Reduce programming model. Finally, we demonstrate the effectiveness and efficiency of BayCDR on large datasets.


Non-Negative Inductive Matrix Completion for Discrete Dyadic Data

AAAI Conferences

We present a non-negative inductive latent factor model for binary- and count-valued matrices containing dyadic data, with side information along the rows and/or the columns of the matrix. The side information is incorporated by conditioning the row and column latent factors on the available side information via a regression model. Our model can not only perform matrix factorization and completion with side-information, but also infers interpretable latent topics that explain/summarize the data. An appealing aspect of our model is in the full local conjugacy of all parts of the model, including the main latent factor model, as well as for the regression model that leverages the side information. This enables us to design scalable and simple to implement Gibbs sampling and Expectation Maximization algorithms for doing inference in the model. Inference cost in our model scales in the number of nonzeros in the data matrix, which makes it particularly attractive for massive, sparse matrices. We demonstrate the effectiveness of our model on several real-world data sets, comparing it with state-of-the-art baselines.


The Multivariate Generalised von Mises Distribution: Inference and Applications

AAAI Conferences

Circular variables arise in a multitude of data-modelling contexts ranging from robotics to the social sciences, but they have been largely overlooked by the machine learning community. This paper partially redresses this imbalance by extending some standard probabilistic modelling tools to the circular domain. First we introduce a new multivariate distribution over circular variables, called the multivariate Generalised von Mises (mGvM) distribution. This distribution can be constructed by restricting and renormalising a general multivariate Gaussian distribution to the unit hyper-torus. Previously proposed multivariate circular distributions are shown to be special cases of this construction. Second, we introduce a new probabilistic model for circular regression inspired by Gaussian Processes, and a method for probabilistic Principal Component Analysis with circular hidden variables. These models can leverage standard modelling tools (e.g. kernel functions and automatic relevance determination). Third, we show that the posterior distribution in these models is a mGvM distribution which enables development of an efficient variational free-energy scheme for performing approximate inference and approximate maximum-likelihood learning.