Goto

Collaborating Authors

 Directed Networks


Markov Boundary Discovery with Ridge Regularized Linear Models

arXiv.org Machine Learning

Ridge regularized linear models (RRLMs), such as ridge regression and the SVM, are a popular group of methods that are used in conjunction with coefficient hypothesis testing to discover explanatory variables with a significant multivariate association to a response. However, many investigators are reluctant to draw causal interpretations of the selected variables due to the incomplete knowledge of the capabilities of RRLMs in causal inference. Under reasonable assumptions, we show that a modified form of RRLMs can get very close to identifying a subset of the Markov boundary by providing a worst-case bound on the space of possible solutions. The results hold for any convex loss, even when the underlying functional relationship is nonlinear, and the solution is not unique. Our approach combines ideas in Markov boundary and sufficient dimension reduction theory. Experimental results show that the modified RRLMs are competitive against state-of-the-art algorithms in discovering part of the Markov boundary from gene expression data.


Adaptive Low-Complexity Sequential Inference for Dirichlet Process Mixture Models

arXiv.org Machine Learning

We develop a sequential low-complexity inference procedure for Dirichlet process mixtures of Gaussians for online clustering and parameter estimation when the number of clusters are unknown a-priori. We present an easily computable, closed form parametric expression for the conditional likelihood, in which hyperparameters are recursively updated as a function of the streaming data assuming conjugate priors. Motivated by large-sample asymptotics, we propose a novel adaptive low-complexity design for the Dirichlet process concentration parameter and show that the number of classes grow at most at a logarithmic rate. We further prove that in the large-sample limit, the conditional likelihood and data predictive distribution become asymptotically Gaussian. We demonstrate through experiments on synthetic and real data sets that our approach is superior to other online state-of-the-art methods.


Newton-based maximum likelihood estimation in nonlinear state space models

arXiv.org Machine Learning

Maximum likelihood (ML) estimation using Newton's method in nonlinear state space models (SSMs) is a challenging problem due to the analytical intractability of the log-likelihood and its gradient and Hessian. We estimate the gradient and Hessian using Fisher's identity in combination with a smoothing algorithm. We explore two approximations of the log-likelihood and of the solution of the smoothing problem. The first is a linearization approximation which is computationally cheap, but the accuracy typically varies between models. The second is a sampling approximation which is asymptotically valid for any SSM but is more computationally costly. We demonstrate our approach for ML parameter estimation on simulated data from two different SSMs with encouraging results. Keywords: Maximum likelihood, parameter estimation, nonlinear state space models, Fisher's identity, extended Kalman filters, particle methods, Newton optimization.


A Variational Bayesian State-Space Approach to Online Passive-Aggressive Regression

arXiv.org Machine Learning

Online Passive-Aggressive (PA) learning is a class of online margin-based algorithms suitable for a wide range of real-time prediction tasks, including classification and regression. PA algorithms are formulated in terms of deterministic point-estimation problems governed by a set of user-defined hyperparameters: the approach fails to capture model/prediction uncertainty and makes their performance highly sensitive to hyperparameter configurations. In this paper, we introduce a novel PA learning framework for regression that overcomes the above limitations. We contribute a Bayesian state-space interpretation of PA regression, along with a novel online variational inference scheme, that not only produces probabilistic predictions, but also offers the benefit of automatic hyperparameter tuning. Experiments with various real-world data sets show that our approach performs significantly better than a more standard, linear Gaussian state-space model.


Supervised Collective Classification for Crowdsourcing

arXiv.org Machine Learning

Crowdsourcing utilizes the wisdom of crowds for collective classification via information (e.g., labels of an item) provided by labelers. Current crowdsourcing algorithms are mainly unsupervised methods that are unaware of the quality of crowdsourced data. In this paper, we propose a supervised collective classification algorithm that aims to identify reliable labelers from the training data (e.g., items with known labels). The reliability (i.e., weighting factor) of each labeler is determined via a saddle point algorithm. The results on several crowdsourced data show that supervised methods can achieve better classification accuracy than unsupervised methods, and our proposed method outperforms other algorithms.


Simultaneous Clustering and Model Selection for Multinomial Distribution: A Comparative Study

arXiv.org Machine Learning

In this paper, we study different discrete data clustering methods, which use the Model-Based Clustering (MBC) framework with the Multinomial distribution. Our study comprises several relevant issues, such as initialization, model estimation and model selection. Additionally, we propose a novel MBC method by efficiently combining the partitional and hierarchical clustering techniques. We conduct experiments on both synthetic and real data and evaluate the methods using accuracy, stability and computation time. Our study identifies appropriate strategies to be used for discrete data analysis with the MBC methods. Moreover, our proposed method is very competitive w.r.t.


Stochastic gradient variational Bayes for gamma approximating distributions

arXiv.org Machine Learning

While stochastic variational inference is relatively well known for scaling inference in Bayesian probabilistic models, related methods also offer ways to circumnavigate the approximation of analytically intractable expectations. The key challenge in either setting is controlling the variance of gradient estimates: recent work has shown that for continuous latent variables, particularly multivariate Gaussians, this can be achieved by using the gradient of the log posterior. In this paper we apply the same idea to gamma distributed latent variables given gamma variational distributions, enabling straightforward "black box" variational inference in models where sparsity and non-negativity are appropriate. We demonstrate the method on a recently proposed gamma process model for network data, as well as a novel sparse factor analysis. We outperform generic sampling algorithms and the approach of using Gaussian variational distributions on transformed variables.


Reliable ABC model choice via random forests

arXiv.org Machine Learning

Approximate Bayesian computation (ABC) methods provide an elaborate approach to Bayesian inference on complex models, including model choice. Both theoretical arguments and simulation experiments indicate, however, that model posterior probabilities may be poorly evaluated by standard ABC techniques. We propose a novel approach based on a machine learning tool named random forests to conduct selection among the highly complex models covered by ABC algorithms. We thus modify the way Bayesian model selection is both understood and operated, in that we rephrase the inferential goal as a classification problem, first predicting the model that best fits the data with random forests and postponing the approximation of the posterior probability of the predicted MAP for a second stage also relying on random forests. Compared with earlier implementations of ABC model choice, the ABC random forest approach offers several potential improvements: (i) it often has a larger discriminative power among the competing models, (ii) it is more robust against the number and choice of statistics summarizing the data, (iii) the computing effort is drastically reduced (with a gain in computation efficiency of at least fifty), and (iv) it includes an approximation of the posterior probability of the selected model. The call to random forests will undoubtedly extend the range of size of datasets and complexity of models that ABC can handle. We illustrate the power of this novel methodology by analyzing controlled experiments as well as genuine population genetics datasets. The proposed methodologies are implemented in the R package abcrf available on the CRAN.


Fast rates in statistical and online learning

arXiv.org Machine Learning

The speed with which a learning algorithm converges as it is presented with more data is a central problem in machine learning --- a fast rate of convergence means less data is needed for the same level of performance. The pursuit of fast rates in online and statistical learning has led to the discovery of many conditions in learning theory under which fast learning is possible. We show that most of these conditions are special cases of a single, unifying condition, that comes in two forms: the central condition for 'proper' learning algorithms that always output a hypothesis in the given model, and stochastic mixability for online algorithms that may make predictions outside of the model. We show that under surprisingly weak assumptions both conditions are, in a certain sense, equivalent. The central condition has a re-interpretation in terms of convexity of a set of pseudoprobabilities, linking it to density estimation under misspecification. For bounded losses, we show how the central condition enables a direct proof of fast rates and we prove its equivalence to the Bernstein condition, itself a generalization of the Tsybakov margin condition, both of which have played a central role in obtaining fast rates in statistical learning. Yet, while the Bernstein condition is two-sided, the central condition is one-sided, making it more suitable to deal with unbounded losses. In its stochastic mixability form, our condition generalizes both a stochastic exp-concavity condition identified by Juditsky, Rigollet and Tsybakov and Vovk's notion of mixability. Our unifying conditions thus provide a substantial step towards a characterization of fast rates in statistical learning, similar to how classical mixability characterizes constant regret in the sequential prediction with expert advice setting.


A fast numerical method for max-convolution and the application to efficient max-product inference in Bayesian networks

arXiv.org Machine Learning

In many fields it is common to have access to information about sums of random variables and to desire information about those variables themselves. In mass spectrometry, when two (or more) analytes with similar mass-to-charge 1 are measured, the intensity of the resulting peak is a function of the sum of abundances of those analytes (this problem occurs not only in the mass spectrometry of small molecules, but also in measuring isotope measurement in elemental and nuclear mass spectrometry). In transcriptomics, the abundance of a particular non-unique read (i.e., an RNA sequence that maps to multiple locations in the transcriptome or genome) provides information about the sum of the abundances of all transcripts that contain the read (each transcript weighted by how many copies of the read it carries). Proteomics has its own version of non-unique reads, shared peptides which can be found in multiple proteins (not only are shared peptides the principal source of difficulty in protein inference [14, 17, 18], they are also responsible for the difficulty evaluating putatative sets of discovered proteins [16, 19]). In population genetics, the prior knowledge about population structure can suggest an expected number of individuals with a particular genotype, which in turn yields probabilistic information about the individuals whose aggregate genotypes are expected to produce that sum (inference is particularly pronounced in polyploids, which increase the dimensionality of the problem [15]).