Goto

Collaborating Authors

 Statistical Learning


Binary Classifier Calibration: Non-parametric approach

arXiv.org Machine Learning

Accurate calibration of probabilistic predictive models learned is critical for many practical prediction and decision-making tasks. There are two main categories of methods for building calibrated classifiers. One approach is to develop methods for learning probabilistic models that are well-calibrated, ab initio. The other approach is to use some post-processing methods for transforming the output of a classifier to be well calibrated, as for example histogram binning, Platt scaling, and isotonic regression. One advantage of the post-processing approach is that it can be applied to any existing probabilistic classification model that was constructed using any machine-learning method. In this paper, we first introduce two measures for evaluating how well a classifier is calibrated. We prove three theorems showing that using a simple histogram binning post-processing method, it is possible to make a classifier be well calibrated while retaining its discrimination capability. Also, by casting the histogram binning method as a density-based non-parametric binary classifier, we can extend it using two simple non-parametric density estimation methods. We demonstrate the performance of the proposed calibration methods on synthetic and real datasets. Experimental results show that the proposed methods either outperform or are comparable to existing calibration methods.


Transductive Rademacher Complexity and its Applications

arXiv.org Artificial Intelligence

We develop a technique for deriving data-dependent error bounds for transductive learning algorithms based on transductive Rademacher complexity. Our technique is based on a novel general error bound for transduction in terms of transductive Rademacher complexity, together with a novel bounding technique for Rademacher averages for particular algorithms, in terms of their "unlabeled-labeled" representation. This technique is relevant to many advanced graph-based transductive algorithms and we demonstrate its effectiveness by deriving error bounds to three well known algorithms. Finally, we present a new PAC-Bayesian bound for mixtures of transductive algorithms based on our Rademacher bounds.


Interactive Policy Learning through Confidence-Based Autonomy

arXiv.org Artificial Intelligence

The CBA algorithm consists of two components which take advantage of the complimentary abilities of humans and computer agents. The first component, Confident Execution, enables the agent to identify states in which demonstration is required, to request a demonstration from the human teacher and to learn a policy based on the acquired data. The algorithm selects demonstrations based on a measure of action selection confidence, and our results show that using Confident Execution the agent requires fewer demonstrations to learn the policy than when demonstrations are selected by a human teacher. The second algorithmic component, Corrective Demonstration, enables the teacher to correct any mistakes made by the agent through additional demonstrations in order to improve the policy and future task performance. CBA and its individual components are compared and evaluated in a complex simulated driving domain.


Analogical Dissimilarity: Definition, Algorithms and Two Experiments in Machine Learning

arXiv.org Artificial Intelligence

This paper defines the notion of analogical dissimilarity between four objects, with a special focus on objects structured as sequences. Firstly, it studies the case where the four objects have a null analogical dissimilarity, i.e. are in analogical proportion. Secondly, when one of these objects is unknown, it gives algorithms to compute it. Thirdly, it tackles the problem of defining analogical dissimilarity, which is a measure of how far four objects are from being in analogical proportion. In particular, when objects are sequences, it gives a definition and an algorithm based on an optimal alignment of the four sequences. It gives also learning algorithms, i.e. methods to find the triple of objects in a learning sample which has the least analogical dissimilarity with a given object. Two practical experiments are described: the first is a classification problem on benchmarks of binary and nominal data, the second shows how the generation of sequences by solving analogical equations enables a handwritten character recognition system to rapidly be adapted to a new writer.


Binary Classifier Calibration: Bayesian Non-Parametric Approach

arXiv.org Machine Learning

A set of probabilistic predictions is well calibrated if the events that are predicted to occur with probability p do in fact occur about p fraction of the time. Well calibrated predictions are particularly important when machine learning models are used in decision analysis. This paper presents two new non-parametric methods for calibrating outputs of binary classification models: a method based on the Bayes optimal selection and a method based on the Bayesian model averaging. The advantage of these methods is that they are independent of the algorithm used to learn a predictive model, and they can be applied in a post-processing step, after the model is learned. This makes them applicable to a wide variety of machine learning models and methods. These calibration methods, as well as other methods, are tested on a variety of datasets in terms of both discrimination and calibration performance. The results show the methods either outperform or are comparable in performance to the state-of-the-art calibration methods.


Efficient Monte Carlo and greedy heuristic for the inference of stochastic block models

arXiv.org Machine Learning

We present an efficient algorithm for the inference of stochastic block models in large networks. The algorithm can be used as an optimized Markov chain Monte Carlo (MCMC) method, with a fast mixing time and a much reduced susceptibility to getting trapped in metastable states, or as a greedy agglomerative heuristic, with an almost linear $O(N\ln^2N)$ complexity, where $N$ is the number of nodes in the network, independent on the number of blocks being inferred. We show that the heuristic is capable of delivering results which are indistinguishable from the more exact and numerically expensive MCMC method in many artificial and empirical networks, despite being much faster. The method is entirely unbiased towards any specific mixing pattern, and in particular it does not favor assortative community structures.


lp-Recovery of the Most Significant Subspace among Multiple Subspaces with Outliers

arXiv.org Machine Learning

We assume data sampled from a mixture of d-dimensional linear subspaces with spherically symmetric distributions within each subspace and an additional outlier component with spherically symmetric distribution within the ambient space (for simplicity we may assume that all distributions are uniform on their corresponding unit spheres). We also assume mixture weights for the different components. We say that one of the underlying subspaces of the model is most significant if its mixture weight is higher than the sum of the mixture weights of all other subspaces. We study the recovery of the most significant subspace by minimizing the lp-averaged distances of data points from d-dimensional subspaces, where p>0. Unlike other lp minimization problems, this minimization is non-convex for all p>0 and thus requires different methods for its analysis. We show that if 01 and there is more than one underlying subspace, then with overwhelming probability the most significant subspace cannot be recovered or nearly recovered. This last result does not require spherically symmetric outliers.


Multi-Step-Ahead Time Series Prediction using Multiple-Output Support Vector Regression

arXiv.org Machine Learning

Accurate time series prediction over long future horizons is challenging and of great interest to both practitioners and academics. As a well-known intelligent algorithm, the standard formulation of Support Vector Regression (SVR) could be taken for multi-step-ahead time series prediction, only relying either on iterated strategy or direct strategy. This study proposes a novel multiple-step-ahead time series prediction approach which employs multiple-output support vector regression (M-SVR) with multiple-input multiple-output (MIMO) prediction strategy. In addition, the rank of three leading prediction strategies with SVR is comparatively examined, providing practical implications on the selection of the prediction strategy for multi-step-ahead forecasting while taking SVR as modeling technique. The proposed approach is validated with the simulated and real datasets. The quantitative and comprehensive assessments are performed on the basis of the prediction accuracy and computational cost. The results indicate that: 1) the M-SVR using MIMO strategy achieves the best accurate forecasts with accredited computational load, 2) the standard SVR using direct strategy achieves the second best accurate forecasts, but with the most expensive computational cost, and 3) the standard SVR using iterated strategy is the worst in terms of prediction accuracy, but with the least computational cost.


Multiscale Shrinkage and L\'evy Processes

arXiv.org Machine Learning

A new shrinkage-based construction is developed for a compressible vector $\boldsymbol{x}\in\mathbb{R}^n$, for cases in which the components of $\xv$ are naturally associated with a tree structure. Important examples are when $\xv$ corresponds to the coefficients of a wavelet or block-DCT representation of data. The method we consider in detail, and for which numerical results are presented, is based on increments of a gamma process. However, we demonstrate that the general framework is appropriate for many other types of shrinkage priors, all within the L\'{e}vy process family, with the gamma process a special case. Bayesian inference is carried out by approximating the posterior with samples from an MCMC algorithm, as well as by constructing a heuristic variational approximation to the posterior. We also consider expectation-maximization (EM) for a MAP (point) solution. State-of-the-art results are manifested for compressive sensing and denoising applications, the latter with spiky (non-Gaussian) noise.


Online Matrix Completion Through Nuclear Norm Regularisation

arXiv.org Machine Learning

It is the main goal of this paper to propose a novel method to perform matrix completion on-line. Motivated by a wide variety of applications, ranging from the design of recommender systems to sensor network localization through seismic data reconstruction, we consider the matrix completion problem when entries of the matrix of interest are observed gradually. Precisely, we place ourselves in the situation where the predictive rule should be refined incrementally, rather than recomputed from scratch each time the sample of observed entries increases. The extension of existing matrix completion methods to the sequential prediction context is indeed a major issue in the Big Data era, and yet little addressed in the literature. The algorithm promoted in this article builds upon the Soft Impute approach introduced in Mazumder et al. (2010). The major novelty essentially arises from the use of a randomised technique for both computing and updating the Singular Value Decomposition (SVD) involved in the algorithm. Though of disarming simplicity, the method proposed turns out to be very efficient, while requiring reduced computations. Several numerical experiments based on real datasets illustrating its performance are displayed, together with preliminary results giving it a theoretical basis.