Goto

Collaborating Authors

 Accuracy


Robust Logistic Regression using Shift Parameters (Long Version)

arXiv.org Artificial Intelligence

Almost any large dataset has annotation errors, especially those complex, nuanced datasets commonly used in natural language processing. Low-quality annotations have become even more common in recent years with the rise of Amazon Mechanical Turk, as well as methods like distant supervision and co-training that involve automatically generating training data. Although small amounts of noise may not be detrimental, in some applications the level can be high: upon manually inspecting a relation extraction corpus commonly used in distant supervision, Riedel et al. (2010) report a 31% false positive rate. In cases like these, annotation errors have frequently been observed to hurt performance. Dingare et al. (2005), for example, conduct error analysis on a system to extract relations from biomedical text, and observe that over half of the system's errors could be attributed to inconsistencies in how the data was annotated. Similarly, in a case study on co-training for natural language tasks, Pierce and Cardie (2001) find that the degradation in data quality from automatic labelling prevents these systems from performing comparably to their fully-supervised counterparts.


Classifying pairs with trees for supervised biological network inference

arXiv.org Machine Learning

Networks are ubiquitous in biology and computational approaches have been largely investigated for their inference. In particular, supervised machine learning methods can be used to complete a partially known network by integrating various measurements. Two main supervised frameworks have been proposed: the local approach, which trains a separate model for each network node, and the global approach, which trains a single model over pairs of nodes. Here, we systematically investigate, theoretically and empirically, the exploitation of tree-based ensemble methods in the context of these two approaches for biological network inference. We first formalize the problem of network inference as classification of pairs, unifying in the process homogeneous and bipartite graphs and discussing two main sampling schemes. We then present the global and the local approaches, extending the later for the prediction of interactions between two unseen network nodes, and discuss their specializations to tree-based ensemble methods, highlighting their interpretability and drawing links with clustering techniques. Extensive computational experiments are carried out with these methods on various biological networks that clearly highlight that these methods are competitive with existing methods.


A Study on Stroke Rehabilitation through Task-Oriented Control of a Haptic Device via Near-Infrared Spectroscopy-Based BCI

arXiv.org Machine Learning

This paper presents a study in task-oriented approach to stroke rehabilitation by controlling a haptic device via near-infrared spectroscopy-based brain-computer interface (BCI). The task is to command the haptic device to move in opposing directions of leftward and rightward movement. Our study consists of data acquisition, signal preprocessing, and classification. In data acquisition, we conduct experiments based on two different mental tasks: one on pure motor imagery, and another on combined motor imagery and action observation. The experiments were conducted in both offline and online modes. In the signal preprocessing, we use localization method to eliminate channels that are irrelevant to the mental task, as well as perform feature extraction for subsequent classification. We propose multiple support vector machine classifiers with a majority-voting scheme for improved classification results. And lastly, we present test results to demonstrate the efficacy of our proposed approach to possible stroke rehabilitation practice.


Supersparse Linear Integer Models for Interpretable Classification

arXiv.org Machine Learning

Scoring systems are classification models that only require users to add, subtract and multiply a few meaningful numbers to make a prediction. These models are often used because they are practical and interpretable. In this paper, we introduce an off-the-shelf tool to create scoring systems that both accurate and interpretable, known as a Supersparse Linear Integer Model (SLIM). SLIM is a discrete optimization problem that minimizes the 0-1 loss to encourage a high level of accuracy, regularizes the L0-norm to encourage a high level of sparsity, and constrains coefficients to a set of interpretable values. We illustrate the practical and interpretable nature of SLIM scoring systems through applications in medicine and criminology, and show that they are are accurate and sparse in comparison to state-of-the-art classification models using numerical experiments.


Efficiency of conformalized ridge regression

arXiv.org Machine Learning

Conformal prediction is a method of producing prediction sets that can be applied on top of a wide range of prediction algorithms. The method has a guaranteed coverage probability under the standard IID assumption regardless of whether the assumptions (often considerably more restrictive) of the underlying algorithm are satisfied. However, for the method to be really useful it is desirable that in the case where the assumptions of the underlying algorithm are satisfied, the conformal predictor loses little in efficiency as compared with the underlying algorithm (whereas being a conformal predictor, it has the stronger guarantee of validity). In this paper we explore the degree to which this additional requirement of efficiency is satisfied in the case of Bayesian ridge regression; we find that asymptotically conformal prediction sets differ little from ridge regression prediction intervals when the standard Bayesian assumptions are satisfied.


A Permutation Approach for Selecting the Penalty Parameter in Penalized Model Selection

arXiv.org Machine Learning

The analysis of high dimensional data, in which the number of measured predictors is large and can exceed the number of samples, is an important and common problem in statistical applications. When samples are accompanied by a real or categorical response, data analysis typically includes model fitting with the aim of doing prediction or variable selection, or both. The goal of prediction is to derive a rule capable of accurately predicting the response of a new, unlabeled sample. The goal of variable selection is to select a (small) subset of the measured predictors whose individual or coordinated activity is significantly related to the response. In both cases, it is common to assume that the observed data arise from an underlying model that is sparse, in the sense that only a small subset of the predictors are related to the response. Whether sparsity is assumed, or viewed as a desirable feature of a model, analysis of high dimensional data is often carried out by penalized methods that produce models in which a relatively small subset of the available predictors are included. Popular penalized methods include the LASSO (Tibshirani, 1996), its numerous variations, and SCAD (Fan and Li, 2001). In what follows, we focus our attention on the LASSO. The LASSO and its variants require specification of a penalty/tuning parameter that controls the tradeoff between model fit and model size.


GRADE: Machine Learning Support for Graduate Admissions

AI Magazine

This article describes GRADE, a statistical machine learning system developed to support the work of the graduate admissions committee at the University of Texas at Austin Department of Computer Science (UTCS). In recent years, the number of applications to the UTCS PhD program has become too large to manage with a traditional review process. GRADE uses historical admissions data to predict how likely the committee is to admit each new applicant. It reports each prediction as a score similar to those used by human reviewers, and accompanies each by an explanation of what applicant features most influenced its prediction. GRADE makes the review process more efficient by enabling reviewers to spend most of their time on applicants near the decision boundary and by focusing their attention on parts of each applicantโ€™s file that matter the most. An evaluation over two seasons of PhD admissions indicates that the system leads to dramatic time savings, reducing the total time spent on reviews by at least 74 percent.


Information-Theoretic Multi-view Domain Adaptation: A Theoretical and Empirical Study

Journal of Artificial Intelligence Research

Multi-view learning aims to improve classification performance by leveraging the consistency among different views of data. The incorporation of multiple views was paid little attention in the studies of domain adaptation, where the view consistency based on source data is largely violated in the target domain due to the distribution gap between different domain data. In this paper, we leverage multiple views for cross-domain document classification. The central idea is to strengthen the views' consistency on target data by identifying the associations of domain-specific features from different domains. We present an Information-theoretic Multi-view Adaptation Model (IMAM) using a multi-way clustering scheme, where word and link clusters can draw together seemingly unrelated features across domains, which boosts the consistency between document clusterings that are based on the respective word and link views. Moreover, we demonstrate that IMAM can always find the document clustering with the minimal disagreement rate to the overlap of view-based clusterings. We provide both theoretical and empirical justifications of the proposed method. Our experiments show that IMAM significantly outperforms traditional multi-view algorithm co-training, the co-training-based adaptation algorithm CODA, the single-view transfer model CoCC and the large-margin-based multi-view transfer model MVTL-LM.


Disease Prediction based on Functional Connectomes using a Scalable and Spatially-Informed Support Vector Machine

arXiv.org Machine Learning

Substantial evidence indicates that major psychiatric disorders are associated with distributed neural dysconnectivity, leading to strong interest in using neuroimaging methods to accurately predict disorder status. In this work, we are specifically interested in a multivariate approach that uses features derived from whole-brain resting state functional connectomes. However, functional connectomes reside in a high dimensional space, which complicates model interpretation and introduces numerous statistical and computational challenges. Traditional feature selection techniques are used to reduce data dimensionality, but are blind to the spatial structure of the connectomes. We propose a regularization framework where the 6-D structure of the functional connectome is explicitly taken into account via the fused Lasso or the GraphNet regularizer. Our method only restricts the loss function to be convex and margin-based, allowing non-differentiable loss functions such as the hinge-loss to be used. Using the fused Lasso or GraphNet regularizer with the hinge-loss leads to a structured sparse support vector machine (SVM) with embedded feature selection. We introduce a novel efficient optimization algorithm based on the augmented Lagrangian and the classical alternating direction method, which can solve both fused Lasso and GraphNet regularized SVM with very little modification. We also demonstrate that the inner subproblems of the algorithm can be solved efficiently in analytic form by coupling the variable splitting strategy with a data augmentation scheme. Experiments on simulated data and resting state scans from a large schizophrenia dataset show that our proposed approach can identify predictive regions that are spatially contiguous in the 6-D "connectome space," offering an additional layer of interpretability that could provide new insights about various disease processes.


Random design analysis of ridge regression

arXiv.org Artificial Intelligence

This work gives a simultaneous analysis of both the ordinary least squares estimator and the ridge regression estimator in the random design setting under mild assumptions on the covariate/response distributions. In particular, the analysis provides sharp results on the ``out-of-sample'' prediction error, as opposed to the ``in-sample'' (fixed design) error. The analysis also reveals the effect of errors in the estimated covariance structure, as well as the effect of modeling errors, neither of which effects are present in the fixed design setting. The proofs of the main results are based on a simple decomposition lemma combined with concentration inequalities for random vectors and matrices.