Goto

Collaborating Authors

 Statistical Learning


Equivalence of Kernel Machine Regression and Kernel Distance Covariance for Multidimensional Trait Association Studies

arXiv.org Machine Learning

Associating genetic markers with a multidimensional phenotype is an important yet challenging problem. In this work, we establish the equivalence between two popular methods: kernel-machine regression (KMR), and kernel distance covariance (KDC). KMR is a semiparametric regression frameworks that models the covariate effects parametrically, while the genetic markers are considered non-parametrically. KDC represents a class of methods that includes distance covariance (DC) and Hilbert-Schmidt Independence Criterion (HSIC), which are nonparametric tests of independence. We show the equivalence between the score test of KMR and the KDC statistic under certain conditions. This result leads to a novel generalization of the KDC test that incorporates the covariates. Our contributions are three-fold: (1) establishing the equivalence between KMR and KDC; (2) showing that the principles of kernel machine regression can be applied to the interpretation of KDC; (3) the development of a broader class of KDC statistics, that the members are the quantities of different kernels. We demonstrate the proposals using simulation studies. Data from the Alzheimer's Disease Neuroimaging Initiative (ADNI) is used to explore the association between the genetic variants on gene \emph{FLJ16124} and phenotypes represented in 3D structural brain MR images adjusting for age and gender. The results suggest that SNPs of \emph{FLJ16124} exhibit strong pairwise interaction effects that are correlated to the changes of brain region volumes.


Toward computational cumulative biology by combining models of biological datasets

arXiv.org Machine Learning

A main challenge of data-driven sciences is how to make maximal use of the progressively expanding databases of experimental datasets in order to keep research cumulative. We introduce the idea of a modeling-based dataset retrieval engine designed for relating a researcher's experimental dataset to earlier work in the field. The search is (i) data-driven to enable new findings, going beyond the state of the art of keyword searches in annotations, (ii) modeling-driven, to both include biological knowledge and insights learned from data, and (iii) scalable, as it is accomplished without building one unified grand model of all data. Assuming each dataset has been modeled beforehand, by the researchers or by database managers, we apply a rapidly computable and optimizable combination model to decompose a new dataset into contributions from earlier relevant models. By using the data-driven decomposition we identify a network of interrelated datasets from a large annotated human gene expression atlas. While tissue type and disease were major driving forces for determining relevant datasets, the found relationships were richer and the model-based search was more accurate than keyword search; it moreover recovered biologically meaningful relationships that are not straightforwardly visible from annotations, for instance, between cells in different developmental stages such as thymocytes and T-cells. Data-driven links and citations matched to a large extent; the data-driven links even uncovered corrections to the publication data, as two of the most linked datasets were not highly cited and turned out to have wrong publication entries in the database.


An Enhanced Features Extractor for a Portfolio of Constraint Solvers

arXiv.org Artificial Intelligence

Recent research has shown that a single arbitrarily efficient solver can be significantly outperformed by a portfolio of possibly slower on-average solvers. The solver selection is usually done by means of (un)supervised learning techniques which exploit features extracted from the problem specification. In this paper we present an useful and flexible framework that is able to extract an extensive set of features from a Constraint (Satisfaction/Optimization) Problem defined in possibly different modeling languages: MiniZinc, FlatZinc or XCSP. We also report some empirical results showing that the performances that can be obtained using these features are effective and competitive with state of the art CSP portfolio techniques.


Particle filter-based Gaussian process optimisation for parameter inference

arXiv.org Machine Learning

We propose a novel method for maximum likelihood-based parameter inference in nonlinear and/or non-Gaussian state space models. The method is an iterative procedure with three steps. At each iteration a particle filter is used to estimate the value of the log-likelihood function at the current parameter iterate. Using these log-likelihood estimates, a surrogate objective function is created by utilizing a Gaussian process model. Finally, we use a heuristic procedure to obtain a revised parameter iterate, providing an automatic trade-off between exploration and exploitation of the surrogate model. The method is profiled on two state space models with good performance both considering accuracy and computational cost.


Sparse K-Means with $\ell_{\infty}/\ell_0$ Penalty for High-Dimensional Data Clustering

arXiv.org Machine Learning

Sparse clustering, which aims to find a proper partition of an extremely high-dimensional data set with redundant noise features, has been attracted more and more interests in recent years. The existing studies commonly solve the problem in a framework of maximizing the weighted feature contributions subject to a $\ell_2/\ell_1$ penalty. Nevertheless, this framework has two serious drawbacks: One is that the solution of the framework unavoidably involves a considerable portion of redundant noise features in many situations, and the other is that the framework neither offers intuitive explanations on why this framework can select relevant features nor leads to any theoretical guarantee for feature selection consistency. In this article, we attempt to overcome those drawbacks through developing a new sparse clustering framework which uses a $\ell_{\infty}/\ell_0$ penalty. First, we introduce new concepts on optimal partitions and noise features for the high-dimensional data clustering problems, based on which the previously known framework can be intuitively explained in principle. Then, we apply the suggested $\ell_{\infty}/\ell_0$ framework to formulate a new sparse k-means model with the $\ell_{\infty}/\ell_0$ penalty ($\ell_0$-k-means for short). We propose an efficient iterative algorithm for solving the $\ell_0$-k-means. To deeply understand the behavior of $\ell_0$-k-means, we prove that the solution yielded by the $\ell_0$-k-means algorithm has feature selection consistency whenever the data matrix is generated from a high-dimensional Gaussian mixture model. Finally, we provide experiments with both synthetic data and the Allen Developing Mouse Brain Atlas data to support that the proposed $\ell_0$-k-means exhibits better noise feature detection capacity over the previously known sparse k-means with the $\ell_2/\ell_1$ penalty ($\ell_1$-k-means for short).


Venture: a higher-order probabilistic programming platform with programmable inference

arXiv.org Machine Learning

We describe Venture, an interactive virtual machine for probabilistic programming that aims to be sufficiently expressive, extensible, and efficient for general-purpose use. Like Church, probabilistic models and inference problems in Venture are specified via a Turing-complete, higher-order probabilistic language descended from Lisp. Unlike Church, Venture also provides a compositional language for custom inference strategies built out of scalable exact and approximate techniques. We also describe four key aspects of Venture's implementation that build on ideas from probabilistic graphical models. First, we describe the stochastic procedure interface (SPI) that specifies and encapsulates primitive random variables. The SPI supports custom control flow, higher-order probabilistic procedures, partially exchangeable sequences and ``likelihood-free'' stochastic simulators. It also supports external models that do inference over latent variables hidden from Venture. Second, we describe probabilistic execution traces (PETs), which represent execution histories of Venture programs. PETs capture conditional dependencies, existential dependencies and exchangeable coupling. Third, we describe partitions of execution histories called scaffolds that factor global inference problems into coherent sub-problems. Finally, we describe a family of stochastic regeneration algorithms for efficiently modifying PET fragments contained within scaffolds. Stochastic regeneration linear runtime scaling in cases where many previous approaches scaled quadratically. We show how to use stochastic regeneration and the SPI to implement general-purpose inference strategies such as Metropolis-Hastings, Gibbs sampling, and blocked proposals based on particle Markov chain Monte Carlo and mean-field variational inference techniques.


Large-Scale Optimization for Evaluation Functions with Minimax Search

Journal of Artificial Intelligence Research

This paper presents a new method, Minimax Tree Optimization (MMTO), to learn a heuristic evaluation function of a practical alpha-beta search program. The evaluation function may be a linear or non-linear combination of weighted features, and the weights are the parameters to be optimized. To control the search results so that the move decisions agree with the game records of human experts, a well-modeled objective function to be minimized is designed. Moreover, a numerical iterative method is used to nd local minima of the objective function, and more than forty million parameters are adjusted by using a small number of hyper parameters. This method was applied to shogi, a major variant of chess in which the evaluation function must handle a larger state space than in chess. Experimental results show that the large-scale optimization of the evaluation function improves the playing strength of shogi programs, and the new method performs signicantly better than other methods. Implementation of the new method in our shogi program Bonanza made substantial contributions to the program's rst-place nish in the 2013 World Computer Shogi Championship. Additionally, we present preliminary evidence of broader applicability of our method to other two-player games such as chess.


Approximate Matrix Multiplication with Application to Linear Embeddings

arXiv.org Machine Learning

In this paper, we study the problem of approximately computing the product of two real matrices. In particular, we analyze a dimensionality-reduction-based approximation algorithm due to Sarlos [1], introducing the notion of nuclear rank as the ratio of the nuclear norm over the spectral norm. The presented bound has improved dependence with respect to the approximation error (as compared to previous approaches), whereas the subspace -- on which we project the input matrices -- has dimensions proportional to the maximum of their nuclear rank and it is independent of the input dimensions. In addition, we provide an application of this result to linear low-dimensional embeddings. Namely, we show that any Euclidean point-set with bounded nuclear rank is amenable to projection onto number of dimensions that is independent of the input dimensionality, while achieving additive error guarantees.


Confidence Intervals for Random Forests: The Jackknife and the Infinitesimal Jackknife

arXiv.org Machine Learning

We study the variability of predictions made by bagged learners and random forests, and show how to estimate standard errors for these methods. Our work builds on variance estimates for bagging proposed by Efron (1992, 2012) that are based on the jackknife and the infinitesimal jackknife (IJ). In practice, bagged predictors are computed using a finite number B of bootstrap replicates, and working with a large B can be computationally expensive. Direct applications of jackknife and IJ estimators to bagging require B on the order of n^{1.5} bootstrap replicates to converge, where n is the size of the training set. We propose improved versions that only require B on the order of n replicates. Moreover, we show that the IJ estimator requires 1.7 times less bootstrap replicates than the jackknife to achieve a given accuracy. Finally, we study the sampling distributions of the jackknife and IJ variance estimates themselves. We illustrate our findings with multiple experiments and simulation studies.


Random Binary Mappings for Kernel Learning and Efficient SVM

arXiv.org Machine Learning

Support Vector Machines (SVMs) are powerful learners that have led to state-of-the-art results in various computer vision problems. SVMs suffer from various drawbacks in terms of selecting the right kernel, which depends on the image descriptors, as well as computational and memory efficiency. This paper introduces a novel kernel, which serves such issues well. The kernel is learned by exploiting a large amount of low-complex, randomized binary mappings of the input feature. This leads to an efficient SVM, while also alleviating the task of kernel selection. We demonstrate the capabilities of our kernel on 6 standard vision benchmarks, in which we combine several common image descriptors, namely histograms (Flowers17 and Daimler), attribute-like descriptors (UCI, OSR, and a-VOC08), and Sparse Quantization (ImageNet). Results show that our kernel learning adapts well to the different descriptors types, achieving the performance of the kernels specifically tuned for each image descriptor, and with similar evaluation cost as efficient SVM methods.