Goto

Collaborating Authors

 Statistical Learning


Variable and Fixed Interval Exponential Smoothing

arXiv.org Machine Learning

Exponential smoothers are a simple and memory efficient way to compute running averages of time series. Here we define and describe practical properties of exponential smoothers for signals observed at constant and variable intervals.


Gaussian Process Models for HRTF based Sound-Source Localization and Active-Learning

arXiv.org Machine Learning

From a machine learning perspective, the human ability localize sounds can be modeled as a non-parametric and non-linear regression problem between binaural spectral features of sound received at the ears (input) and their sound-source directions (output). The input features can be summarized in terms of the individual's head-related transfer functions (HRTFs) which measure the spectral response between the listener's eardrum and an external point in $3$D. Based on these viewpoints, two related problems are considered: how can one achieve an optimal sampling of measurements for training sound-source localization (SSL) models, and how can SSL models be used to infer the subject's HRTFs in listening tests. First, we develop a class of binaural SSL models based on Gaussian process regression and solve a \emph{forward selection} problem that finds a subset of input-output samples that best generalize to all SSL directions. Second, we use an \emph{active-learning} approach that updates an online SSL model for inferring the subject's SSL errors via headphones and a graphical user interface. Experiments show that only a small fraction of HRTFs are required for $5^{\circ}$ localization accuracy and that the learned HRTFs are localized closer to their intended directions than non-individualized HRTFs.


Kernel Task-Driven Dictionary Learning for Hyperspectral Image Classification

arXiv.org Machine Learning

Dictionary learning algorithms have been successfully used in both reconstructive and discriminative tasks, where the input signal is represented by a linear combination of a few dictionary atoms. While these methods are usually developed under $\ell_1$ sparsity constrain (prior) in the input domain, recent studies have demonstrated the advantages of sparse representation using structured sparsity priors in the kernel domain. In this paper, we propose a supervised dictionary learning algorithm in the kernel domain for hyperspectral image classification. In the proposed formulation, the dictionary and classifier are obtained jointly for optimal classification performance. The supervised formulation is task-driven and provides learned features from the hyperspectral data that are well suited for the classification task. Moreover, the proposed algorithm uses a joint ($\ell_{12}$) sparsity prior to enforce collaboration among the neighboring pixels. The simulation results illustrate the efficiency of the proposed dictionary learning algorithm.


Adjusting Leverage Scores by Row Weighting: A Practical Approach to Coherent Matrix Completion

arXiv.org Machine Learning

Low-rank matrix completion is an important problem with extensive real-world applications. When observations are uniformly sampled from the underlying matrix entries, existing methods all require the matrix to be incoherent. This paper provides the first working method for coherent matrix completion under the standard uniform sampling model. Our approach is based on the weighted nuclear norm minimization idea proposed in several recent work, and our key contribution is a practical method to compute the weighting matrices so that the leverage scores become more uniform after weighting. Under suitable conditions, we are able to derive theoretical results, showing the effectiveness of our approach. Experiments on synthetic data show that our approach recovers highly coherent matrices with high precision, whereas the standard unweighted method fails even on noise-free data.


Empirically Estimable Classification Bounds Based on a New Divergence Measure

arXiv.org Machine Learning

Information divergence functions play a critical role in statistics and information theory. In this paper we show that a non-parametric f-divergence measure can be used to provide improved bounds on the minimum binary classification probability of error for the case when the training and test data are drawn from the same distribution and for the case where there exists some mismatch between training and test distributions. We confirm the theoretical results by designing feature selection algorithms using the criteria from these bounds and by evaluating the algorithms on a series of pathological speech classification tasks.


Global Convergence of Stochastic Gradient Descent for Some Non-convex Matrix Problems

arXiv.org Machine Learning

Stochastic gradient descent (SGD) on a low-rank factorization is commonly employed to speed up matrix problems including matrix completion, subspace tracking, and SDP relaxation. In this paper, we exhibit a step size scheme for SGD on a low-rank least-squares problem, and we prove that, under broad sampling conditions, our method converges globally from a random starting point within $O(\epsilon^{-1} n \log n)$ steps with constant probability for constant-rank problems. Our modification of SGD relates it to stochastic power iteration. We also show experiments to illustrate the runtime and convergence of the algorithm.


An Aggregation Method for Sparse Logistic Regression

arXiv.org Machine Learning

$L_1$ regularized logistic regression has now become a workhorse of data mining and bioinformatics: it is widely used for many classification problems, particularly ones with many features. However, $L_1$ regularization typically selects too many features and that so-called false positives are unavoidable. In this paper, we demonstrate and analyze an aggregation method for sparse logistic regression in high dimensions. This approach linearly combines the estimators from a suitable set of logistic models with different underlying sparsity patterns and can balance the predictive ability and model interpretability. Numerical performance of our proposed aggregation method is then investigated using simulation studies. We also analyze a published genome-wide case-control dataset to further evaluate the usefulness of the aggregation method in multilocus association mapping.


Evaluation of modelling approaches for predicting the spatial distribution of soil organic carbon stocks at the national scale

arXiv.org Machine Learning

Soil organic carbon (SOC) plays a major role in the global carbon budget. It can act as a source or a sink of atmospheric carbon, thereby possibly influencing the course of climate change. Improving the tools that model the spatial distributions of SOC stocks at national scales is a priority, both for monitoring changes in SOC and as an input for global carbon cycles studies. In this paper, we compare and evaluate two recent and promising modelling approaches. First, we considered several increasingly complex boosted regression trees (BRT), a convenient and efficient multiple regression model from the statistical learning field. Further, we considered a robust geostatistical approach coupled to the BRT models. Testing the different approaches was performed on the dataset from the French Soil Monitoring Network, with a consistent cross-validation procedure. We showed that when a limited number of predictors were included in the BRT model, the standalone BRT predictions were significantly improved by robust geostatistical modelling of the residuals. However, when data for several SOC drivers were included, the standalone BRT model predictions were not significantly improved by geostatistical modelling. Therefore, in this latter situation, the BRT predictions might be considered adequate without the need for geostatistical modelling, provided that i) care is exercised in model fitting and validating, and ii) the dataset does not allow for modelling of local spatial autocorrelations, as is the case for many national systematic sampling schemes.


Domain-Adversarial Neural Networks

arXiv.org Machine Learning

We introduce a new representation learning algorithm suited to the context of domain adaptation, in which data at training and test time come from similar but different distributions. Our algorithm is directly inspired by theory on domain adaptation suggesting that, for effective domain transfer to be achieved, predictions must be made based on a data representation that cannot discriminate between the training (source) and test (target) domains. We propose a training objective that implements this idea in the context of a neural network, whose hidden layer is trained to be predictive of the classification task, but uninformative as to the domain of the input. Our experiments on a sentiment analysis classification benchmark, where the target domain data available at training time is unlabeled, show that our neural network for domain adaption algorithm has better performance than either a standard neural network or an SVM, even if trained on input features extracted with the state-of-the-art marginalized stacked denoising autoencoders of Chen et al. (2012).


Unbiased Bayes for Big Data: Paths of Partial Posteriors

arXiv.org Machine Learning

A key quantity of interest in Bayesian inference are expectations of functions with respect to a posterior distribution. Markov Chain Monte Carlo is a fundamental tool to consistently compute these expectations via averaging samples drawn from an approximate posterior. However, its feasibility is being challenged in the era of so called Big Data as all data needs to be processed in every iteration. Realising that such simulation is an unnecessarily hard problem if the goal is estimation, we construct a computationally scalable methodology that allows unbiased estimation of the required expectations -- without explicit simulation from the full posterior. The scheme's variance is finite by construction and straightforward to control, leading to algorithms that are provably unbiased and naturally arrive at a desired error tolerance. This is achieved at an average computational complexity that is sub-linear in the size of the dataset and its free parameters are easy to tune. We demonstrate the utility and generality of the methodology on a range of common statistical models applied to large-scale benchmark and real-world datasets.