Goto

Collaborating Authors

 Bayesian Learning


Text Classification with Heterogeneous Information Network Kernels

AAAI Conferences

Text classification is an important problem with many applications. Traditional approaches represent text as a bag-of-words and build classifiers based on this representation. Rather than words, entity phrases, the relations between the entities, as well as the types of the entities and relations carry much more information to represent the texts. This paper presents a novel text as network classification framework, which introduces 1) a structured and typed heterogeneous information networks (HINs) representation of texts, and 2) a meta-path based approach to link texts. We show that with the new representation and links of texts, the structured and typed information of entities and relations can be incorporated into kernels. Particularly, we develop both simple linear kernel and indefinite kernel based on meta-paths in the HIN representation of texts, where we call them HIN-kernels. Using Freebase, a well-known world knowledge base, to construct HIN for texts, our experiments on two benchmark datasets show that the indefinite HIN kernel based on weighted meta-paths outperforms the state-of-the-art methods and other HIN-kernels.


Marginalized Continuous Time Bayesian Networks for Network Reconstruction from Incomplete Observations

AAAI Conferences

Continuous Time Bayesian Networks (CTBNs) provide a powerful means to model complex network dynamics. How- ever, their inference is computationally demanding โ€” especially if one considers incomplete and noisy time-series data. The latter gives rise to a joint state- and parameter estimation problem, which can only be solved numerically. Yet, finding the exact parameterization of the CTBN has often only secondary importance in practical scenarios. We therefore focus on the structure learning problem and present a way to analytically marginalize the Markov chain underlying the CTBN model with respect its parameters. Since the resulting stochastic process is parameter-free, its inference reduces to an optimal filtering problem. We solve the latter using an efficient parallel implementation of a sequential Monte Carlo scheme. Our framework enables CTBN inference to be applied to incomplete noisy time-series data frequently found in molecular biology and other disciplines.


Bayesian Matrix Completion via Adaptive Relaxed Spectral Regularization

AAAI Conferences

Bayesian matrix completion has been studied based on a low-rank matrix factorization formulation with promising results. However, little work has been done on Bayesian matrix completion based on the more direct spectral regularization formulation. We fill this gap by presenting a novel Bayesian matrix completion method based on spectral regularization. In order to circumvent the difficulties of dealing with the orthonormality constraints of singular vectors, we derive a new equivalent form with relaxed constraints, which then leads us to design an adaptive version of spectral regularization feasible for Bayesian inference. Our Bayesian method requires no parameter tuning and can infer the number of latent factors automatically. Experiments on synthetic and real datasets demonstrate encouraging results on rank recovery and collaborative filtering, with notably good results for very sparse matrices.


Metric Learning for Ordinal Data

AAAI Conferences

A large amount of ordinal-valued data exist in many domains, including medical and health science, social science, economics, political science, etc. Unlike image and speech datasets of real-valued data, learning with ordinal variables (i.e., features) presents unique challenges. In particular, the nominal differences between those feature values, which are just ranks, do not necessarily correspond to the real distances between the corresponding categories. Given their wide existence, it is imperative to develop machine learning algorithms that specifically address the need to model and infer with such data. In this paper, we present a novel metric learning algorithm that takes into consideration the nature of ordinal data. Our approach treats ordinal values as latent variables in intervals. Our algorithm then learns what those intervals are as well as distance metrics to measure distances between latent variables in those intervals. We derive the corresponding optimization algorithm and demonstrate how that can be solved effectively. Experimental results show that the proposed approach significantly improves baselines that do not explicitly model ordinal features.


Interaction Point Processes via Infinite Branching Model

AAAI Conferences

Many natural and social phenomena can be modeled by interaction point processes (IPPs) (Diggle et al. 1994), stochastic point processes considering the interaction between points. In this paper, we propose the infinite branching model (IBM), a Bayesian statistical model that can generalize and extend some popular IPPs, e.g., Hawkes process (Hawkes 1971; Hawkes and Oakes 1974). It treats IPP as a mixture of basis point processes with the aid of a distance dependent prior over branching structure that describes the relationship between points. The IBM can estimate point event intensity, interaction mechanism and branching structure simultaneously. A generic Metropolis-within-Gibbs sampling method is also developed for model parameter inference. The experiments on synthetic and real-world data demonstrate the superiority of the IBM.


Re-Active Learning: Active Learning with Relabeling

AAAI Conferences

Active learning seeks to train the best classifier at the lowest annotation cost by intelligently picking the best examples to label. Traditional algorithms assume there is a single annotator and disregard the possibility of requesting additional independent annotations for a previously labeled example. However, relabeling examples is important, because all annotators make mistakes โ€” especially crowdsourced workers, who have become a common source of training data. This paper seeks to understand the difference in marginal value between decreasing the noise of the training set via relabeling and increasing the size and diversity of the (noisier) training set by labeling new examples. We use the term re-active learning to denote this generalization of active learning. We show how traditional active learning methods perform poorly at re-active learning, present new algorithms designed for this important problem, formally characterize their behavior, and empirically show that our methods effectively make this tradeoff.


High-Order Stochastic Gradient Thermostats for Bayesian Learning of Deep Models

AAAI Conferences

Learning in deep models using Bayesian methods has generated significant attention recently. This is largely because of the feasibility of modern Bayesian methods to yield scalable learning and inference, while maintaining a measure of uncertainty in the model parameters. Stochastic gradient MCMC algorithms (SG-MCMC) are a family of diffusion-based sampling methods for large-scale Bayesian learning. In SG-MCMC, multivariate stochastic gradient thermostats (mSGNHT) augment each parameter of interest, with a momentum and a thermostat variable to maintain stationary distributions as target posterior distributions. As the number of variables in a continuous-time diffusion increases, its numerical approximation error becomes a practical bottleneck, so better use of a numerical integrator is desirable. To this end, we propose use of an efficient symmetric splitting integrator in mSGNHT, instead of the traditional Euler integrator. We demonstrate that the proposed scheme is more accurate, robust, and converges faster. These properties are demonstrated to be desirable in Bayesian deep learning. Extensive experiments on two canonical models and their deep extensions demonstrate that the proposed scheme improves general Bayesian posterior sampling, particularly for deep models.


Preconditioned Stochastic Gradient Langevin Dynamics for Deep Neural Networks

AAAI Conferences

Effective training of deep neural networks suffers from two main issues. The first is that the parameter space of these models exhibit pathological curvature. Recent methods address this problem by using adaptive preconditioning for Stochastic Gradient Descent (SGD). These methods improve convergence by adapting to the local geometry of parameter space. A second issue is overfitting, which is typically addressed by early stopping. However, recent work has demonstrated that Bayesian model averaging mitigates this problem. The posterior can be sampled by using Stochastic Gradient Langevin Dynamics (SGLD). However, the rapidly changing curvature renders default SGLD methods inefficient. Here, we propose combining adaptive preconditioners with SGLD. In support of this idea, we give theoretical properties on asymptotic convergence and predictive risk. We also provide empirical results for Logistic Regression, Feedforward Neural Nets, and Convolutional Neural Nets, demonstrating that our preconditioned SGLD method gives state-of-the-art performance on these models.


Bounded Optimal Exploration in MDP

AAAI Conferences

Within the framework of probably approximately correct Markov decision processes (PAC-MDP), much theoretical work has focused on methods to attain near optimality after a relatively long period of learning and exploration. However, practical concerns require the attainment of satisfactory behavior within a short period of time. In this paper, we relax the PAC-MDP conditions to reconcile theoretically driven exploration methods and practical needs. We propose simple algorithms for discrete and continuous state spaces, and illustrate the benefits of our proposed relaxation via theoretical analyses and numerical examples. Our algorithms also maintain anytime error bounds and average loss bounds. Our approach accommodates both Bayesian and non-Bayesian methods.


Assumed Density Filtering Methods for Learning Bayesian Neural Networks

AAAI Conferences

Buoyed by the success of deep multilayer neural networks, there is renewed interest in scalable learning of Bayesian neural networks. Here, we study algorithms that utilize recent advances in Bayesian inference to efficiently learn distributions over network weights. In particular, we focus on recently proposed assumed density filtering based methods for learning Bayesian neural networks -- Expectation and Probabilistic backpropagation. Apart from scaling to large datasets, these techniques seamlessly deal with non-differentiable activation functions and provide parameter (learning rate, momentum) free learning. In this paper, we first rigorously compare the two algorithms and in the process develop several extensions, including a version of EBP for continuous regression problems and a PBP variant for binary classification. Next, we extend both algorithms to deal with multiclass classification and count regression problems. On a variety of diverse real world benchmarks, we find our extensions to be effective, achieving results competitive with the state-of-the-art.