Primal View on Belief Propagation
It is known that fixed points of loopy belief propagation (BP) correspond to stationary points of the Bethe variational problem, in which the Bethe free energy is minimized subject to normalization and marginalization constraints. Unfortunately, this does not entirely explain BP, because BP is a dual rather than a primal algorithm for the Bethe variational problem -- its beliefs are infeasible before convergence. Thus, we have no better understanding of BP than as an algorithm that seeks a common zero of a system of non-linear functions with no explicit relation to one another. In this theoretical paper, we show that these functions are in fact explicitly related -- they are the partial derivatives of a single function of reparameterizations. This means that BP seeks a stationary point of a single function, without any constraints. This function has a very natural form: it is a linear combination of local log-partition functions, exactly as the Bethe entropy is the same linear combination of local entropies.
Modeling Multiple Annotator Expertise in the Semi-Supervised Learning Scenario
Yan, Yan, Rosales, Romer, Fung, Glenn, Dy, Jennifer
Learning algorithms normally assume that there is at most one annotation or label per data point. However, in some scenarios, such as medical diagnosis and on-line collaboration, multiple annotations may be available. In either case, obtaining labels for data points can be expensive and time-consuming (and in some circumstances ground truth may not exist). Semi-supervised learning approaches have shown that utilizing the unlabeled data is often beneficial in these cases. This paper presents a probabilistic semi-supervised model and algorithm that allows for learning from both unlabeled and labeled data in the presence of multiple annotators. We assume it is known which annotator labeled which data points. The proposed approach produces annotator models that allow us to provide estimates of (1) the true label and (2) the annotators' variable expertise, for both labeled and unlabeled data. We provide numerical comparisons under various scenarios and with respect to standard semi-supervised learning. Experiments showed that the presented approach provides clear advantages over multi-annotator methods that do not use the unlabeled data and over methods that do not use multi-labeler information.
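As a toy illustration of the multi-annotator idea (not the paper's semi-supervised probabilistic model), the sketch below estimates each annotator's reliability by agreement with a per-item majority vote and then re-estimates labels by accuracy-weighted voting. The two-pass heuristic, function name, and data layout are all illustrative assumptions:

```python
from collections import Counter

def annotator_weighted_labels(annotations):
    """annotations: dict item -> dict annotator -> binary label (0/1).

    Toy baseline: (1) majority vote per item, (2) score each annotator
    by agreement with the majority, (3) re-estimate labels by
    accuracy-weighted voting.
    """
    # Pass 1: majority vote per item
    majority = {}
    for item, labels in annotations.items():
        majority[item] = Counter(labels.values()).most_common(1)[0][0]
    # Pass 2: annotator accuracy against the majority vote
    acc = {}
    for item, labels in annotations.items():
        for a, y in labels.items():
            hit, tot = acc.get(a, (0, 0))
            acc[a] = (hit + (y == majority[item]), tot + 1)
    weight = {a: h / t for a, (h, t) in acc.items()}
    # Pass 3: accuracy-weighted vote (labels mapped to +/-1)
    est = {}
    for item, labels in annotations.items():
        score = sum((2 * y - 1) * weight[a] for a, y in labels.items())
        est[item] = 1 if score > 0 else 0
    return est, weight

# Annotators A and B are consistent; C often disagrees
annotations = {
    'x1': {'A': 1, 'B': 1, 'C': 0},
    'x2': {'A': 0, 'B': 0, 'C': 1},
    'x3': {'A': 1, 'B': 1, 'C': 1},
}
est, weight = annotator_weighted_labels(annotations)
```

Unlike the paper's model, this heuristic ignores unlabeled data entirely; it only shows why modeling per-annotator expertise changes the label estimates.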
Learning Structural Changes of Gaussian Graphical Models in Controlled Experiments
Graphical models are widely used in scientific and engineering research to represent conditional independence structures between random variables. In many controlled experiments, environmental changes or external stimuli can often alter the conditional dependence between the random variables, and potentially produce significant structural changes in the corresponding graphical models. Therefore, it is of great importance to be able to detect such structural changes from data, so as to gain novel insights into where and how the structural changes take place and help the system adapt to the new environment. Here we report an effective learning strategy to extract structural changes in Gaussian graphical models using l1-regularization-based convex optimization. We discuss the properties of the problem formulation and introduce an efficient implementation based on the block coordinate descent algorithm. We demonstrate the principle of the approach on a numerical simulation experiment, and we then apply the algorithm to the modeling of gene regulatory networks under different conditions and obtain promising and biologically plausible results.
Negative Tree Reweighted Belief Propagation
Liu, Qiang, Ihler, Alexander T.
We introduce a new class of lower bounds on the log partition function of a Markov random field which makes use of a reversed Jensen's inequality. In particular, our method approximates the intractable distribution using a linear combination of spanning trees with negative weights. This technique is a lower-bound counterpart to the tree-reweighted belief propagation algorithm, which uses a convex combination of spanning trees with positive weights to provide corresponding upper bounds. We develop algorithms to optimize and tighten the lower bounds over the non-convex set of valid parameter values. Our algorithm generalizes mean field approaches (including naive and structured mean field approximations), which it includes as a limiting case.
Speeding up the binary Gaussian process classification
Vanhatalo, Jarno, Vehtari, Aki
Gaussian processes (GP) are attractive building blocks for many probabilistic models. Their drawbacks, however, are the rapidly increasing inference time and memory requirements as the amount of data grows. The problem can be alleviated with compactly supported (CS) covariance functions, which produce sparse covariance matrices that are fast in computations and cheap to store. CS functions have previously been used in GP regression, but here the focus is on a classification problem. This brings new challenges, since the posterior inference has to be done approximately. We utilize the expectation propagation algorithm and show how its standard implementation has to be modified to obtain computational benefits from the sparse covariance matrices. We study four CS covariance functions and show that they may lead to substantial speed-ups in inference time compared to globally supported functions.
Bayesian Model Averaging Using the k-best Bayesian Network Structures
Tian, Jin, He, Ru, Ram, Lavanya
We study the problem of learning Bayesian network structures from data. We develop an algorithm for finding the k-best Bayesian network structures. We propose to compute the posterior probabilities of hypotheses of interest by Bayesian model averaging over the k-best Bayesian networks. We present empirical results on structural discovery over several real and synthetic data sets and show that the method outperforms the model selection method and state-of-the-art MCMC methods.
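The model-averaging step described above can be sketched as follows. Assuming the k-best structures come with unnormalized log joint scores log P(D, G_i) (the names `log_scores` and `feature_indicators` are illustrative), the posterior of a structural hypothesis, such as the presence of a particular edge, is a normalized score-weighted average over the k networks:

```python
import math

def bma_posterior(log_scores, feature_indicators):
    """Approximate P(hypothesis | D) by averaging over the k best structures.

    log_scores[i]         : log P(D, G_i) for the i-th best network
    feature_indicators[i] : 1 if G_i satisfies the hypothesis of interest
                            (e.g. contains a particular edge), else 0
    """
    m = max(log_scores)                       # log-sum-exp shift for stability
    weights = [math.exp(s - m) for s in log_scores]
    z = sum(weights)
    return sum(w * f for w, f in zip(weights, feature_indicators)) / z

# Three networks with toy log-scores; the edge of interest
# appears in the two highest-scoring structures
p = bma_posterior([-10.0, -10.7, -13.0], [1, 1, 0])
```

Because the weights are renormalized over only the k retained networks, the quality of the approximation depends on how much posterior mass the k-best structures capture.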
A Bayesian Matrix Factorization Model for Relational Data
Singh, Ajit P., Gordon, Geoffrey
Relational learning can be used to augment one data source with other correlated sources of information, to improve predictive accuracy. We frame a large class of relational learning problems as matrix factorization problems, and propose a hierarchical Bayesian model. Training our Bayesian model using random-walk Metropolis-Hastings is impractically slow, and so we develop a block Metropolis-Hastings sampler which uses the gradient and Hessian of the likelihood to dynamically tune the proposal. We demonstrate that a predictive model of brain response to stimuli can be improved by augmenting it with side information about the stimuli.
Parametric Return Density Estimation for Reinforcement Learning
Morimura, Tetsuro, Sugiyama, Masashi, Kashima, Hisashi, Hachiya, Hirotaka, Tanaka, Toshiyuki
Most conventional Reinforcement Learning (RL) algorithms aim to optimize decision-making rules in terms of the expected returns. However, especially for risk management purposes, other risk-sensitive criteria such as the value-at-risk or the expected shortfall are sometimes preferred in real applications. Here, we describe a parametric method for estimating the density of the returns, which allows us to handle various criteria in a unified manner. We first extend the Bellman equation for the conditional expected return to cover a conditional probability density of the returns. Then we derive an extension of the TD-learning algorithm for estimating the return densities in an unknown environment. As test instances, several parametric density estimation algorithms are presented for the Gaussian, Laplace, and skewed Laplace distributions. We show through numerical experiments that these algorithms lead to risk-sensitive as well as robust RL paradigms.
Super-Samples from Kernel Herding
Chen, Yutian, Welling, Max, Smola, Alex
We extend the herding algorithm to continuous spaces by using the kernel trick. The resulting "kernel herding" algorithm is an infinite-memory deterministic process that learns to approximate a PDF with a collection of samples. We show that kernel herding decreases the error of expectations of functions in the Hilbert space at a rate of O(1/T), which is much faster than the usual O(1/√T) for iid random samples. We illustrate kernel herding by approximating Bayesian predictive distributions.
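A minimal sketch of greedy kernel herding over a finite candidate pool, assuming a scalar RBF kernel and the empirical mean embedding of the pool (the paper's algorithm operates in a general RKHS; the function names here are illustrative). Each step picks the candidate most similar to the target distribution on average, penalized by similarity to the points already chosen:

```python
import math

def rbf(x, y, gamma=1.0):
    # Gaussian (RBF) kernel between two scalars
    return math.exp(-gamma * (x - y) ** 2)

def kernel_herding(pool, T, kernel=rbf):
    """Greedily select T 'super-samples' from a finite candidate pool."""
    n = len(pool)
    # Empirical mean embedding: mu(x) = (1/n) * sum_j k(x, X_j)
    mu = [sum(kernel(x, xj) for xj in pool) / n for x in pool]
    chosen = []
    for t in range(T):
        def score(i):
            # Similarity to the target minus similarity to points picked so far
            penalty = sum(kernel(pool[i], s) for s in chosen) / (t + 1)
            return mu[i] - penalty
        best = max(range(n), key=score)
        chosen.append(pool[best])
    return chosen

# Toy pool drawn from a bimodal distribution; herding should cover both modes
pool = [-2.1, -1.9, -2.0, -1.8, 1.9, 2.0, 2.1, 1.8, 0.0]
samples = kernel_herding(pool, 4)
```

The penalty term is what drives the fast O(1/T) behavior: having selected a point, the algorithm is pushed toward still-underrepresented regions rather than re-sampling dense ones.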
Robust Metric Learning by Smooth Optimization
Huang, Kaizhu, Jin, Rong, Xu, Zenglin, Liu, Cheng-Lin
Most existing distance metric learning methods assume perfect side information, usually given as pairwise or triplet constraints. In many real-world applications, however, the constraints are derived from side information such as users' implicit feedback and citations among articles. As a result, these constraints are usually noisy and contain many mistakes. In this work, we aim to learn a distance metric from noisy constraints by robust optimization in a worst-case scenario, which we refer to as robust metric learning. We formulate the learning task initially as a combinatorial optimization problem, and show that it can be elegantly transformed into a convex programming problem. We present an efficient learning algorithm based on smooth optimization [7]. It has a worst-case convergence rate of O(1/√ε) for smooth optimization problems, where ε is the desired error of the approximate solution. Finally, our empirical study with UCI data sets demonstrates the effectiveness of the proposed method in comparison to state-of-the-art methods.