Goto

Collaborating Authors

 Bayesian Learning


Online Low-Rank Subspace Learning from Incomplete Data: A Bayesian View

arXiv.org Machine Learning

Extracting the underlying low-dimensional space where high-dimensional signals often reside has long been at the center of numerous algorithms in the signal processing and machine learning literature during the past few decades. At the same time, working with incomplete (partly observed) large scale datasets has recently been commonplace for diverse reasons. This so called {\it big data era} we are currently living calls for devising online subspace learning algorithms that can suitably handle incomplete data. Their envisaged objective is to {\it recursively} estimate the unknown subspace by processing streaming data sequentially, thus reducing computational complexity, while obviating the need for storing the whole dataset in memory. In this paper, an online variational Bayes subspace learning algorithm from partial observations is presented. To account for the unawareness of the true rank of the subspace, commonly met in practice, low-rankness is explicitly imposed on the sought subspace data matrix by exploiting sparse Bayesian learning principles. Moreover, sparsity, {\it simultaneously} to low-rankness, is favored on the subspace matrix by the sophisticated hierarchical Bayesian scheme that is adopted. In doing so, the proposed algorithm becomes adept in dealing with applications whereby the underlying subspace may be also sparse, as, e.g., in sparse dictionary learning problems. As shown, the new subspace tracking scheme outperforms its state-of-the-art counterparts in terms of estimation accuracy, in a variety of experiments conducted on simulated and real data.


An End-to-End Neural Network for Polyphonic Piano Music Transcription

arXiv.org Machine Learning

We present a supervised neural network model for polyphonic piano music transcription. The architecture of the proposed model is analogous to speech recognition systems and comprises an acoustic model and a music language model. The acoustic model is a neural network used for estimating the probabilities of pitches in a frame of audio. The language model is a recurrent neural network that models the correlations between pitch combinations over time. The proposed model is general and can be used to transcribe polyphonic music without imposing any constraints on the polyphony. The acoustic and language model predictions are combined using a probabilistic graphical model. Inference over the output variables is performed using the beam search algorithm. We perform two sets of experiments. We investigate various neural network architectures for the acoustic models and also investigate the effect of combining acoustic and music language model predictions using the proposed architecture. We compare performance of the neural network based acoustic models with two popular unsupervised acoustic models. Results show that convolutional neural network acoustic models yields the best performance across all evaluation metrics. We also observe improved performance with the application of the music language models. Finally, we present an efficient variant of beam search that improves performance and reduces run-times by an order of magnitude, making the model suitable for real-time applications.


MetaSeer.STEM:Towards Automating Meta-Analyses

AAAI Conferences

Meta-analysis is a principled statistical approach for summarizing quantitative information reported across studies within a research domain of interest. Although the results of meta-analyses can be highly informative,the process of collecting and coding the data for a meta analysis is often a labor-intensive effort fraught with the potential for human error and idiosyncrasy. This is due to the fact that researchers typically spend weeks poring over published journal articles, technical reports, book chapters and other materials in order to retrieve key data elements that are then manually coded for subsequent analyses (e.g., descriptive statistics, effect sizes, reliability estimates, demographics, and study conditions).In this paper, we propose a machine learning based system developed to support automated extraction of data pertinent to STEM education meta-analyses, including educational and human resource initiatives aimed at improving achievement, literacy and interest in the fields of science, technology, engineering, and mathematics.


Bayesian nonparametric image segmentation using a generalized Swendsen-Wang algorithm

arXiv.org Machine Learning

Unsupervised image segmentation aims at clustering the set of pixels of an image into spatially homogeneous regions. We introduce here a class of Bayesian nonparametric models to address this problem. These models are based on a combination of a Potts-like spatial smoothness component and a prior on partitions which is used to control both the number and size of clusters. This class of models is flexible enough to include the standard Potts model and the more recent Potts-Dirichlet Process model \cite{Orbanz2008}. More importantly, any prior on partitions can be introduced to control the global clustering structure so that it is possible to penalize small or large clusters if necessary. Bayesian computation is carried out using an original generalized Swendsen-Wang algorithm. Experiments demonstrate that our method is competitive in terms of RAND\ index compared to popular image segmentation methods, such as mean-shift, and recent alternative Bayesian nonparametric models.


Empirical Bayes Estimation for the Stochastic Blockmodel

arXiv.org Machine Learning

The stochastic blockmodel (SBM) is a generative model for network data introduced in Holland et al. (1983). The SBM is a member of the general class of latent position random graph models introduced in Hoff et al. (2002). These models have been used in various application domains as diverse as social networks (vertices may represent people with edges indicating social interaction), citation networks (who cites whom), connectomics (brain connectivity networks; vertices may represent neurons with edges indicating axon-synapse-dendrite connections, or vertices may represent brain regions with edges indicating connectivity between regions), and many others. For comprehensive reviews of statistical models and applications, see Fienberg (2010), Goldenberg et al. (2010), Fienberg (2012). In general, statistical inference on graphs is becoming essential in many areas of science, engineering, and business. The SBM supposes that each of n vertices is assigned to one of K blocks. The probability of an 1 edge between two vertices depends only on their respective block memberships, and the presence of edges are conditionally independent given block memberships.


Machine Learning Model of the Swift/BAT Trigger Algorithm for Long GRB Population Studies

arXiv.org Machine Learning

To draw inferences about gamma-ray burst (GRB) source populations based on Swift observations, it is essential to understand the detection efficiency of the Swift burst alert telescope (BAT). This study considers the problem of modeling the Swift/BAT triggering algorithm for long GRBs, a computationally expensive procedure, and models it using machine learning algorithms. A large sample of simulated GRBs from Lien 2014 is used to train various models: random forests, boosted decision trees (with AdaBoost), support vector machines, and artificial neural networks. The best models have accuracies of $\gtrsim97\%$ ($\lesssim 3\%$ error), which is a significant improvement on a cut in GRB flux which has an accuracy of $89.6\%$ ($10.4\%$ error). These models are then used to measure the detection efficiency of Swift as a function of redshift $z$, which is used to perform Bayesian parameter estimation on the GRB rate distribution. We find a local GRB rate density of $n_0 \sim 0.48^{+0.41}_{-0.23} \ {\rm Gpc}^{-3} {\rm yr}^{-1}$ with power-law indices of $n_1 \sim 1.7^{+0.6}_{-0.5}$ and $n_2 \sim -5.9^{+5.7}_{-0.1}$ for GRBs above and below a break point of $z_1 \sim 6.8^{+2.8}_{-3.2}$. This methodology is able to improve upon earlier studies by more accurately modeling Swift detection and using this for fully Bayesian model fitting. The code used in this is analysis is publicly available online (https://github.com/PBGraff/SwiftGRB_PEanalysis).


Toward Optimal Feature Selection in Naive Bayes for Text Categorization

arXiv.org Machine Learning

Automated feature selection is important for text categorization to reduce the feature size and to speed up the learning process of classifiers. In this paper, we present a novel and efficient feature selection framework based on the Information Theory, which aims to rank the features with their discriminative capacity for classification. We first revisit two information measures: Kullback-Leibler divergence and Jeffreys divergence for binary hypothesis testing, and analyze their asymptotic properties relating to type I and type II errors of a Bayesian classifier. We then introduce a new divergence measure, called Jeffreys-Multi-Hypothesis (JMH) divergence, to measure multi-distribution divergence for multi-class classification. Based on the JMH-divergence, we develop two efficient feature selection methods, termed maximum discrimination ($MD$) and $MD-\chi^2$ methods, for text categorization. The promising results of extensive experiments demonstrate the effectiveness of the proposed approaches.


A Variational Analysis of Stochastic Gradient Algorithms

arXiv.org Machine Learning

Stochastic Gradient Descent (SGD) is an important algorithm in machine learning. With constant learning rates, it is a stochastic process that, after an initial phase of convergence, generates samples from a stationary distribution. We show that SGD with constant rates can be effectively used as an approximate posterior inference algorithm for probabilistic modeling. Specifically, we show how to adjust the tuning parameters of SGD such as to match the resulting stationary distribution to the posterior. This analysis rests on interpreting SGD as a continuous-time stochastic process and then minimizing the Kullback-Leibler divergence between its stationary distribution and the target posterior. (This is in the spirit of variational inference.) In more detail, we model SGD as a multivariate Ornstein-Uhlenbeck process and then use properties of this process to derive the optimal parameters. This theoretical framework also connects SGD to modern scalable inference algorithms; we analyze the recently proposed stochastic gradient Fisher scoring under this perspective. We demonstrate that SGD with properly chosen constant rates gives a new way to optimize hyperparameters in probabilistic models.


Interpretable Selection and Visualization of Features and Interactions Using Bayesian Forests

arXiv.org Machine Learning

It is becoming increasingly important for machine learning methods to make predictions that are interpretable as well as accurate. In many practical applications, it is of interest which features and feature interactions are relevant to the prediction task. We present a novel method, Selective Bayesian Forest Classifier, that strikes a balance between predictive power and interpretability by simultaneously performing classification, feature selection, feature interaction detection and visualization. It builds parsimonious yet flexible models using tree-structured Bayesian networks, and samples an ensemble of such models using Markov chain Monte Carlo. We build in feature selection by dividing the trees into two groups according to their relevance to the outcome of interest. Our method performs competitively on classification and feature selection benchmarks in low and high dimensions, and includes a visualization tool that provides insight into relevant features and interactions.


A Tractable Fully Bayesian Method for the Stochastic Block Model

arXiv.org Machine Learning

The stochastic block model (SBM) is a generative model revealing macroscopic structures in graphs. Bayesian methods are used for (i) cluster assignment inference and (ii) model selection for the number of clusters. In this paper, we study the behavior of Bayesian inference in the SBM in the large sample limit. Combining variational approximation and Laplace's method, a consistent criterion of the fully marginalized log-likelihood is established. Based on that, we derive a tractable algorithm that solves tasks (i) and (ii) concurrently, obviating the need for an outer loop to check all model candidates. Our empirical and theoretical results demonstrate that our method is scalable in computation, accurate in approximation, and concise in model selection.