Goto

Collaborating Authors

 Statistical Learning


Supersparse Linear Integer Models for Interpretable Classification

arXiv.org Machine Learning

Scoring systems are classification models that only require users to add, subtract and multiply a few meaningful numbers to make a prediction. These models are often used because they are practical and interpretable. In this paper, we introduce an off-the-shelf tool to create scoring systems that both accurate and interpretable, known as a Supersparse Linear Integer Model (SLIM). SLIM is a discrete optimization problem that minimizes the 0-1 loss to encourage a high level of accuracy, regularizes the L0-norm to encourage a high level of sparsity, and constrains coefficients to a set of interpretable values. We illustrate the practical and interpretable nature of SLIM scoring systems through applications in medicine and criminology, and show that they are are accurate and sparse in comparison to state-of-the-art classification models using numerical experiments.


Power System Parameters Forecasting Using Hilbert-Huang Transform and Machine Learning

arXiv.org Machine Learning

A novel hybrid data-driven approach is developed for forecasting power system parameters with the goal of increasing the efficiency of short-term forecasting studies for non-stationary time-series. The proposed approach is based on mode decomposition and a feature analysis of initial retrospective data using the Hilbert-Huang transform and machine learning algorithms. The random forests and gradient boosting trees learning techniques were examined. The decision tree techniques were used to rank the importance of variables employed in the forecasting models. The Mean Decrease Gini index is employed as an impurity function. The resulting hybrid forecasting models employ the radial basis function neural network and support vector regression. Apart from introduction and references the paper is organized as follows. The section 2 presents the background and the review of several approaches for short-term forecasting of power system parameters. In the third section a hybrid machine learning-based algorithm using Hilbert-Huang transform is developed for short-term forecasting of power system parameters. Fourth section describes the decision tree learning algorithms used for the issue of variables importance. Finally in section six the experimental results in the following electric power problems are presented: active power flow forecasting, electricity price forecasting and for the wind speed and direction forecasting.


A Permutation Approach for Selecting the Penalty Parameter in Penalized Model Selection

arXiv.org Machine Learning

The analysis of high dimensional data, in which the number of measured predictors is large and can exceed the number of samples, is an important and common problem in statistical applications. When samples are accompanied by a real or categorical response, data analysis typically includes model fitting with the aim of doing prediction or variable selection, or both. The goal of prediction is to derive a rule capable of accurately predicting the response of a new, unlabeled sample. The goal of variable selection is to select a (small) subset of the measured predictors whose individual or coordinated activity is significantly related to the response. In both cases, it is common to assume that the observed data arise from an underlying model that is sparse, in the sense that only a small subset of the predictors are related to the response. Whether sparsity is assumed, or viewed as a desirable feature of a model, analysis of high dimensional data is often carried out by penalized methods that produce models in which a relatively small subset of the available predictors are included. Popular penalized methods include the LASSO (Tibshirani, 1996), its numerous variations, and SCAD (Fan and Li, 2001). In what follows, we focus our attention on the LASSO. The LASSO and its variants require specification of a penalty/tuning parameter that controls the tradeoff between model fit and model size.


Minimum $n$-Rank Approximation via Iterative Hard Thresholding

arXiv.org Machine Learning

The problem of recovering a low $n$-rank tensor is an extension of sparse recovery problem from the low dimensional space (matrix space) to the high dimensional space (tensor space) and has many applications in computer vision and graphics such as image inpainting and video inpainting. In this paper, we consider a new tensor recovery model, named as minimum $n$-rank approximation (MnRA), and propose an appropriate iterative hard thresholding algorithm with giving the upper bound of the $n$-rank in advance. The convergence analysis of the proposed algorithm is also presented. Particularly, we show that for the noiseless case, the linear convergence with rate $\frac{1}{2}$ can be obtained for the proposed algorithm under proper conditions. Additionally, combining an effective heuristic for determining $n$-rank, we can also apply the proposed algorithm to solve MnRA when $n$-rank is unknown in advance. Some preliminary numerical results on randomly generated and real low $n$-rank tensor completion problems are reported, which show the efficiency of the proposed algorithms.


Probabilistic Archetypal Analysis

arXiv.org Machine Learning

Archetypal analysis (AA) represents observations as composition of pure patterns, i.e., archetypes, or equivalently convex combinations of extreme values (Cutler and Breiman, 1994). Although AA bears resemblance with many well established prototypical analysis tools, such as principal component analysis (PCA, Mohamed et al, 2009), nonnegative matrix factorization (NMF, F evotte and Idier, 2011), probabilistic latent semantic analysis (Hofmann, 2013), andk -means (Steinley, 2006); AA is arguably unique, both conceptually and computationally . Conceptually, AA imitates the human tendency of representing a group of objects by its extreme elements (Davis and Love, 2010): this makes AA an interesting exploratory tool for applied scientists (e.g., Eugster, 2012; Seiler and Wohlrabe, 2013). Computationally, AA is data-driven, and requires the factors to be probability vectors: these make AA a computationally demanding tool, yet brings better interpretability . The concept of AA was originally formulated by Cutler and Breiman (1994).


Pseudo-Marginal Bayesian Inference for Gaussian Processes

arXiv.org Machine Learning

The main challenges that arise when adopting Gaussian Process priors in probabilistic modeling are how to carry out exact Bayesian inference and how to account for uncertainty on model parameters when making model-based predictions on out-of-sample data. Using probit regression as an illustrative working example, this paper presents a general and effective methodology based on the pseudo-marginal approach to Markov chain Monte Carlo that efficiently addresses both of these issues. The results presented in this paper show improvements over existing sampling methods to simulate from the posterior distribution over the parameters defining the covariance function of the Gaussian Process prior. This is particularly important as it offers a powerful tool to carry out full Bayesian inference of Gaussian Process based hierarchic statistical models in general. The results also demonstrate that Monte Carlo based integration of all model parameters is actually feasible in this class of models providing a superior quantification of uncertainty in predictions. Extensive comparisons with respect to state-of-the-art probabilistic classifiers confirm this assertion.


Ensemble Committees for Stock Return Classification and Prediction

arXiv.org Machine Learning

This paper considers a portfolio trading strategy formulated by algorithms in the field of machine learning. The profitability of the strategy is measured by the algorithm's capability to consistently and accurately identify stock indices with positive or negative returns, and to generate a preferred portfolio allocation on the basis of a learned model. Stocks are characterized by time series data sets consisting of technical variables that reflect market conditions in a previous time interval, which are utilized produce binary classification decisions in subsequent intervals. The learned model is constructed as a committee of random forest classifiers, a nonlinear support vector machine classifier, a relevance vector machine classifier, and a constituent ensemble of k-nearest neighbors classifiers. This selection of algorithms is appealing for two reasons: first, there is strikingly little research in economic time-series forecasting that employs learners beyond neural networks and clustering algorithms, and this construction offers a viable alternative; second, this selection incorporates an array of techniques that have both theoretically optimal classification properties and high empirical success rates in areas outside of finance, in addition to offering a mixture of parametric and nonparametric models. The ensemble committee is augmented by a boosting meta-algorithm and feature selection is performed by a supervised Relief algorithm. The Global Industry Classification Standard (GICS) is used to explore the ensemble model's efficacy within the context of various fields of investment including Energy, Materials, Financials, and Information Technology. Data from 2006 to 2012, inclusive, are considered, which are chosen for providing a range of market circumstances for evaluating the model. The model is observed to achieve an accuracy of approximately 70% when predicting stock price returns three months in advance.


GRADE: Machine Learning Support for Graduate Admissions

AI Magazine

This article describes GRADE, a statistical machine learning system developed to support the work of the graduate admissions committee at the University of Texas at Austin Department of Computer Science (UTCS). In recent years, the number of applications to the UTCS PhD program has become too large to manage with a traditional review process. GRADE uses historical admissions data to predict how likely the committee is to admit each new applicant. It reports each prediction as a score similar to those used by human reviewers, and accompanies each by an explanation of what applicant features most influenced its prediction. GRADE makes the review process more efficient by enabling reviewers to spend most of their time on applicants near the decision boundary and by focusing their attention on parts of each applicant’s file that matter the most. An evaluation over two seasons of PhD admissions indicates that the system leads to dramatic time savings, reducing the total time spent on reviews by at least 74 percent.


Using Analogy to Cluster Hand-Drawn Sketches for Sketch-Based Educational Software

AI Magazine

One of the major challenges to building intelligent educational software is determining what kinds of feedback to give learners. Useful feedback makes use of models of domain-specific knowledge, especially models that are commonly held by potential students. To empirically determine what these models are, student data can be clustered to reveal common misconceptions or common problem-solving strategies. This article describes how analogical retrieval and generalization can be used to cluster automatically analyzed hand-drawn sketches incorporating both spatial and conceptual information. We use this approach to cluster a corpus of hand-drawn student sketches to discover common answers. Common answer clusters can be used for the design of targeted feedback and for assessment.


A Tutorial on Principal Component Analysis

arXiv.org Machine Learning

Principal component analysis (PCA) is a standard tool in modern data analysis - in diverse fields from neuroscience to computer graphics - because it is a simple, nonparametric method for extracting relevant information from confusing data sets. With minimal effort PCA provides a roadmap for how to reduce a complex data set to a lower dimension to reveal the sometimes hidden, simplified structures that often underlie it. The goal of this tutorial is to provide both an intuitive feel for PCA, and a thorough discussion of this topic. We will begin with a simple example and provide an intuitive explanation of the goal of PCA. We will continue by adding mathematical rigor to place it within the framework of linear algebra to provide an explicit solution.