Uncertainty
The Kernel Kalman Rule — Efficient Nonparametric Inference with Recursive Least Squares
Gebhardt, Gregor H. W. (Technische Universität Darmstadt) | Kupcsik, Andras (National University of Singapore) | Neumann, Gerhard ( University of Lincoln )
Nonparametric inference techniques provide promising tools for probabilistic reasoning in high-dimensional nonlinear systems.Most of these techniques embed distributions into reproducing kernel Hilbert spaces (RKHS) and rely on the kernel Bayes' rule (KBR) to manipulate the embeddings. However, the computational demands of the KBR scale poorly with the number of samples and the KBR often suffers from numerical instabilities. In this paper, we present the kernel Kalman rule (KKR) as an alternative to the KBR.The derivation of the KKR is based on recursive least squares, inspired by the derivation of the Kalman innovation update.We apply the KKR to filtering tasks where we use RKHS embeddings to represent the belief state, resulting in the kernel Kalman filter (KKF).We show on a nonlinear state estimation task with high dimensional observations that our approach provides a significantly improved estimation accuracy while the computational demands are significantly decreased.
Latent Dependency Forest Models
Chu, Shanbo (ShanghaiTech University) | Jiang, Yong (ShanghaiTech University) | Tu, Kewei (ShanghaiTech University)
Probabilistic modeling is one of the foundations of modern Learning the structure of a probabilistic model resembles machine learning and artificial intelligence, which aims to learning the set of production rules of a grammar, while compactly represent the joint probability distribution of random learning model parameters resembles learning grammar rule variables. The most widely used approach for probabilistic probabilities. From the unsupervised grammar learning literature, modeling is probabilistic graphical models. A probabilistic one can see that learning approaches based on PCFGs graphical model represents a probability distribution with a have not been very successful, while the state-of-the-art performance directed or undirected graph. It represents random variables has mostly been achieved based on less expressive with the nodes in the graph and uses the edges in the graph to models such as dependency grammars (DGs) (Klein and encode the probabilistic relationships between random variables.
Open-Universe Weighted Model Counting
Belle, Vaishak (University of Edinburgh)
Weighted model counting (WMC) has recently emerged as an effective and general approach to probabilistic inference, offering a computational framework for encoding a variety of formalisms, such as factor graphs and Bayesian networks.The advent of large-scale probabilistic knowledge bases has generated further interest in relational probabilistic representations, obtained by according weights to first-order formulas, whose semantics is given in terms of the ground theory, and solved by WMC. A fundamental limitation is that the domain of quantification, by construction and design, is assumed to be finite, which is at odds with areas such as vision and language understanding, where the existence of objects must be inferred from raw data. Dropping the finite-domain assumption has been known to improve the expressiveness of a first-order language for open-universe purposes, but these languages, so far, have eluded WMC approaches. In this paper, we revisit relational probabilistic models over an infinite domain, and establish a number of results that permit effective algorithms. We demonstrate this language on a number of examples, including a parameterized version of Pearl's Burglary-Earthquake-Alarm Bayesian network.
Minimal Undefinedness for Fuzzy Answer Sets
Alviano, Mario (University of Calabria) | Amendola, Giovanni (University of Calabria) | Peñaloza, Rafael (Free University of Bozen-Bolzano )
Fuzzy Answer Set Programming (FASP) combines the non-monotonic reasoning typical of Answer Set Programming with the capability of Fuzzy Logic to deal with imprecise information and paraconsistent reasoning. In the context of paraconsistent reasoning, the fundamental principle of minimal undefinedness states that truth degrees close to 0 and 1 should be preferred to those close to 0.5, to minimize the ambiguity of the scenario. The aim of this paper is to enforce such a principle in FASP through the minimization of a measure of undefinedness. Algorithms that minimize undefinedness of fuzzy answer sets are presented, and implemented.
Efficiently Mining High Quality Phrases from Texts
Li, Bing (Northeastern University, Shenyang) | Yang, Xiaochun (Northeastern University, Shenyang) | Wang, Bin (Northeastern University, Shenyang) | Cui, Wei (Northeastern University, Shenyang)
Phrase mining is a key research problem for semantic analysis and text-based information retrieval. The existing approaches based on NLP, frequency, and statistics cannot extract high quality phrases and the processing is also time consuming, which are not suitable for dynamic on-line applications. In this paper, we propose an efficient high-quality phrase mining approach (EQPM). To the best of our knowledge, our work is the first effort that considers both intra-cohesion and inter-isolation in mining phrases, which is able to guarantee appropriateness. We also propose a strategy to eliminate order sensitiveness, and ensure the completeness of phrases. We further design efficient algorithms to make the proposed model and strategy feasible. The empirical evaluations on four real data sets demonstrate that our approach achieved a considerable quality improvement and the processing time was 2.3X - 29X faster than the state-of-the-art works.
Recurrent Attentional Topic Model
Li, Shuangyin (Hong Kong University of Science and Technology) | Zhang, Yu (Hong Kong University of Science and Technology) | Pan, Rong (Sun Yat-sen University) | Mao, Mingzhi (Sun Yat-sen University) | Yang, Yang (iPIN, Shen Zhen)
In a document, the topic distribution of a sentence depends on both the topics of preceding sentences and its own content, and it is usually affected by the topics of the preceding sentences with different weights. It is natural that a document can be treated as a sequence of sentences. Most existing works for Bayesian document modeling do not take these points into consideration. To fill this gap, we propose a Recurrent Attentional Topic Model (RATM) for document embedding. The RATM not only takes advantage of the sequential orders among sentence but also use the attention mechanism to model the relations among successive sentences. In RATM, we propose a Recurrent Attentional Bayesian Process (RABP) to handle the sequences. Based on the RABP, RATM fully utilizes the sequential information of the sentences in a document. Experiments on two copora show that our model outperforms state-of-the-art methods on document modeling and classification.
Maximum Reconstruction Estimation for Generative Latent-Variable Models
Cheng, Yong (Tsinghua University) | Liu, Yang (Tsinghua University) | Xu, Wei (Tsinghua University)
Generative latent-variable models are important for natural language processing due to their capability of providing compact representations of data. As conventional maximum likelihood estimation (MLE) is prone to focus on explaining irrelevant but common correlations in data, we apply maximum reconstruction estimation (MRE) to learning generative latent-variable models alternatively, which aims to find model parameters that maximize the probability of reconstructing the observed data. We develop tractable algorithms to directly learn hidden Markov models and IBM translation models using the MRE criterion, without the need to introduce a separate reconstruction model to facilitate efficient inference. Experiments on unsupervised part-of-speech induction and unsupervised word alignment show that our approach enables generative latent-variable models to better discover intended correlations in data and outperforms maximum likelihood estimators significantly.
Efficient Ordered Combinatorial Semi-Bandits for Whole-Page Recommendation
Wang, Yingfei (Princeton University) | Ouyang, Hua (Apple Inc.) | Wang, Chu (Nokia Bell Labs) | Chen, Jianhui (Yahoo Research) | Asamov, Tsvetan (Princeton University) | Chang, Yi (Huawei Research America)
Multi-Armed Bandit (MAB) framework has been successfully applied in many web applications. However, many complex real-world applications that involve multiple content recommendations cannot fit into the traditional MAB setting. To address this issue, we consider an ordered combinatorial semi-bandit problem where the learner recommends S actions from a base set of K actions, and displays the results in S (out of M ) different positions. The aim is to maximize the cumulative reward with respect to the best possible subset and positions in hindsight. By the adaptation of a minimum-cost maximum-flow network, a practical algorithm based on Thompson sampling is derived for the (contextual) combinatorial problem, thus resolving the problem of computational intractability.With its potential to work with whole-page recommendation and any probabilistic models, to illustrate the effectiveness of our method, we focus on Gaussian process optimization and a contextual setting where click-through rate is predicted using logistic regression. We demonstrate the algorithms’ performance on synthetic Gaussian process problems and on large-scale news article recommendation datasets from Yahoo! Front Page Today Module.
Variable Kernel Density Estimation in High-Dimensional Feature Spaces
Walt, Christiaan Maarten van der (Council for Scientific and Industrial Research, Modelling and Digital Science) | Barnard, Etienne (North-West University)
Estimating the joint probability density function of a dataset is a central task in many machine learning applications. In this work we address the fundamental problem of kernel bandwidth estimation for variable kernel density estimation in high-dimensional feature spaces. We derive a variable kernel bandwidth estimator by minimizing the leave-one-out entropy objective function and show that this estimator is capable of performing estimation in high-dimensional feature spaces with great success. We compare the performance of this estimator to state-of-the art maximum-likelihood estimators on a number of representative high-dimensional machine learning tasks and show that the newly introduced minimum leave-one-out entropy estimator performs optimally on a number of high-dimensional datasets considered.
Cross-Domain Ranking via Latent Space Learning
Tang, Jie (Tsinghua University) | Hall, Wendy (University of Southampton)
We study the problem of cross-domain ranking, which addresses learning to rank objects from multiple interrelated domains. In many applications, we may have multiple interrelated domains, some of them with a large amount of training data and others with very little. We often wish to utilize the training data from all these related domains to help improve ranking performance. In this paper, we present a unified model: BayCDR for cross-domain ranking. BayCDR uses a latent space to measure the correlation between different domains, and learns the ranking functions from the interrelated domains via the latent space by a Bayesian model, where each ranking function is based on a weighted average model. An efficient learning algorithm based on variational inference and a generalization bound has been developed. To scale up to handle real large data, we also present a learning algorithm under the Map-Reduce programming model. Finally, we demonstrate the effectiveness and efficiency of BayCDR on large datasets.