Bayesian Learning
Regularization for Unsupervised Deep Neural Nets
Wang, Baiyang (Northwestern University) | Klabjan, Diego (Northwestern University)
Unsupervised neural networks, such as restricted Boltzmann machines (RBMs) and deep belief networks (DBNs), are powerful tools for feature selection and pattern recognition tasks. We demonstrate that overfitting occurs in such models just as in deep feedforward neural networks, and discuss possible regularization methods to reduce overfitting. We also propose a "partial" approach to improve the efficiency of Dropout/DropConnect in this scenario, and discuss the theoretical justification of these methods from model convergence and likelihood bounds. Finally, we compare the performance of these methods based on their likelihood and classification error rates for various pattern recognition data sets.
Variable Kernel Density Estimation in High-Dimensional Feature Spaces
Walt, Christiaan Maarten van der (Council for Scientific and Industrial Research, Modelling and Digital Science) | Barnard, Etienne (North-West University)
Estimating the joint probability density function of a dataset is a central task in many machine learning applications. In this work we address the fundamental problem of kernel bandwidth estimation for variable kernel density estimation in high-dimensional feature spaces. We derive a variable kernel bandwidth estimator by minimizing the leave-one-out entropy objective function and show that this estimator is capable of performing estimation in high-dimensional feature spaces with great success. We compare the performance of this estimator to state-of-the art maximum-likelihood estimators on a number of representative high-dimensional machine learning tasks and show that the newly introduced minimum leave-one-out entropy estimator performs optimally on a number of high-dimensional datasets considered.
Cross-Domain Ranking via Latent Space Learning
Tang, Jie (Tsinghua University) | Hall, Wendy (University of Southampton)
We study the problem of cross-domain ranking, which addresses learning to rank objects from multiple interrelated domains. In many applications, we may have multiple interrelated domains, some of them with a large amount of training data and others with very little. We often wish to utilize the training data from all these related domains to help improve ranking performance. In this paper, we present a unified model: BayCDR for cross-domain ranking. BayCDR uses a latent space to measure the correlation between different domains, and learns the ranking functions from the interrelated domains via the latent space by a Bayesian model, where each ranking function is based on a weighted average model. An efficient learning algorithm based on variational inference and a generalization bound has been developed. To scale up to handle real large data, we also present a learning algorithm under the Map-Reduce programming model. Finally, we demonstrate the effectiveness and efficiency of BayCDR on large datasets.
Non-Negative Inductive Matrix Completion for Discrete Dyadic Data
Rai, Piyush (Indian Institute of Technology Kanpur)
We present a non-negative inductive latent factor model for binary- and count-valued matrices containing dyadic data, with side information along the rows and/or the columns of the matrix. The side information is incorporated by conditioning the row and column latent factors on the available side information via a regression model. Our model can not only perform matrix factorization and completion with side-information, but also infers interpretable latent topics that explain/summarize the data. An appealing aspect of our model is in the full local conjugacy of all parts of the model, including the main latent factor model, as well as for the regression model that leverages the side information. This enables us to design scalable and simple to implement Gibbs sampling and Expectation Maximization algorithms for doing inference in the model. Inference cost in our model scales in the number of nonzeros in the data matrix, which makes it particularly attractive for massive, sparse matrices. We demonstrate the effectiveness of our model on several real-world data sets, comparing it with state-of-the-art baselines.
The Multivariate Generalised von Mises Distribution: Inference and Applications
Navarro, Alexandre K. W. (University of Cambridge) | Frellsen, Jes (University of Cambridge) | Turner, Richard E. (University of Cambridge)
Circular variables arise in a multitude of data-modelling contexts ranging from robotics to the social sciences, but they have been largely overlooked by the machine learning community. This paper partially redresses this imbalance by extending some standard probabilistic modelling tools to the circular domain. First we introduce a new multivariate distribution over circular variables, called the multivariate Generalised von Mises (mGvM) distribution. This distribution can be constructed by restricting and renormalising a general multivariate Gaussian distribution to the unit hyper-torus. Previously proposed multivariate circular distributions are shown to be special cases of this construction. Second, we introduce a new probabilistic model for circular regression inspired by Gaussian Processes, and a method for probabilistic Principal Component Analysis with circular hidden variables. These models can leverage standard modelling tools (e.g. kernel functions and automatic relevance determination). Third, we show that the posterior distribution in these models is a mGvM distribution which enables development of an efficient variational free-energy scheme for performing approximate inference and approximate maximum-likelihood learning.
Poisson Sum-Product Networks: A Deep Architecture for Tractable Multivariate Poisson Distributions
Molina, Alejandro (Technische Universitรคt Dortmund) | Natarajan, Sriraam (Indiana University) | Kersting, Kristian (Technische Universitรคt Dortmund)
Multivariate count data are pervasive in science in the form of histograms, contingency tables and others. Previous work on modeling this type of distributions do not allow for fast and tractable inference. In this paper we present a novel Poisson graphical model, the first based on sum product networks, called PSPN, allowing for positive as well as negative dependencies. We present algorithms for learning tree PSPNs from data as well as for tractable inference via symbolic evaluation. With these, information-theoretic measures such as entropy, mutual information, and distances among count variables can be computed without resorting to approximations. Additionally, we show a connection between PSPNs and LDA, linking the structure of tree PSPNs to a hierarchy of topics. The experimental results on several synthetic and real world datasets demonstrate that PSPN often outperform state-of-the-art while remaining tractable.
Continuous Conditional Dependency Network for Structured Regression
Han, Chao (Temple University) | Ghalwash, Mohamed (IBM T.J. Watson and Temple University) | Obradovic, Zoran (Temple University)
Structured regression on graphs aims to predict response variables from multiple nodes by discovering and exploiting the dependency structure among response variables. This problem is challenging since dependencies among response variables are always unknown, and the associated prior knowledge is non-symmetric. In previous studies, various promising solutions were proposed to improve structured regression by utilizing symmetric prior knowledge, learning sparse dependency structure among response variables, or learning representations of attributes of multiple nodes. However, none of them are capable of efficiently learning dependency structure while incorporating non-symmetric prior knowledge. To achieve these objectives, we proposed Continuous Conditional Dependency Network (CCDN) for structured regression. The intuitive idea behind this model is that each response variable is not only dependent on attributes from the same node, but also on response variables from all other nodes. This results in a joint modeling of local conditional probabilities. The parameter learning is formulated as a convex optimization problem and an effective sampling algorithm is proposed for inference. CCDN is flexible in absorbing non-symmetric prior knowledge. The performance of CCDN on multiple datasets provides evidence of its structure recovery ability and superior effectiveness and efficiency as compared to the state-of-the-art alternatives.
A Nearly-Black-Box Online Algorithm for Joint Parameter and State Estimation in Temporal Models
Erol, Yusuf Bugra (University of California, Berkeley) | Wu, Yi (University of California, Berkeley) | Li, Lei (Toutiao Lab) | Russell, Stuart (University of California, Berkeley)
Online joint parameter and state estimation is a core problem for temporal models.Most existing methods are either restricted to a particular class of models (e.g., the Storvik filter) or computationally expensive (e.g., particle MCMC). We propose a novel nearly-black-box algorithm, the Assumed Parameter Filter (APF), a hybrid of particle filtering for state variables and assumed density filtering for parameter variables.It has the following advantages:(a) it is online and computationally efficient;(b) it is applicable to both discrete and continuous parameter spaces with arbitrary transition dynamics.On a variety of toy and real models, APF generates more accurate results within a fixed computation budget compared to several standard algorithms from the literature.
Near-Optimal Active Learning of Halfspaces via Query Synthesis in the Noisy Setting
Chen, Lin (Yale University) | Hassani, Hamed (ETH Zurich) | Karbasi, Amin (Yale University)
In this paper, we consider the problem of actively learning a linear classifier through query synthesis where the learner can construct artificial queries in order to estimate the true decision boundaries. This problem has recently gained a lot of interest in automated science and adversarial reverse engineering for which only heuristic algorithms are known. In such applications, queries can be constructed de novo to elicit information (e.g., automated science) or to evade detection with minimal cost (e.g., adversarial reverse engineering). We develop a general framework, called dimension coupling (DC), that 1) reduces a d-dimensional learning problem to d-1 low dimensional sub-problems, 2) solves each sub-problem efficiently, 3) appropriately aggregates the results and outputs a linear classifier, and 4) provides a theoretical guarantee for all possible schemes of aggregation. The proposed method is proved resilient to noise. We show that the DC framework avoids the curse of dimensionality: its computational complexity scales linearly with the dimension. Moreover, we show that the query complexity of DC is near optimal (within a constant factor of the optimum algorithm). To further support our theoretical analysis, we compare the performance of DC with the existing work. We observe that DC consistently outperforms the prior arts in terms of query complexity while often running orders of magnitude faster.
Latent Discriminant Analysis with Representative Feature Discovery
Chen, Gang (State University of New York at Buffalo)
Linear Discriminant Analysis (LDA) is a well-known method for dimension reduction and classification with focus on discriminative feature selection. However, how to discover discriminative as well as representative features in LDA model has not been explored. In this paper, we propose a latent Fisher discriminant model with representative feature discovery in an semi-supervised manner. Specifically, our model leverages advantages of both discriminative and generative models by generalizing LDA with data-driven prior over the latent variables. Thus, our method combines multi-class, latent variables and dimension reduction in an unified Bayesian framework. We test our method on MUSK and Corel datasets and yield competitive results compared to baselines. We also demonstrate its capacity on the challenging TRECVID MED11 dataset for semantic keyframe extraction and conduct a human-factors ranking-based experimental evaluation, which clearly demonstrates our proposed method consistently extracts more semantically meaningful keyframes than challenging baselines.