AITopics | Statistical Learning


Statistical Learning


Supersparse Linear Integer Models for Interpretable Classification

Ustun, Berk, Tracà, Stefano, Rudin, Cynthia

arXiv.org Machine Learning, Apr-10-2014

Scoring systems are classification models that only require users to add, subtract and multiply a few meaningful numbers to make a prediction. These models are often used because they are practical and interpretable. In this paper, we introduce an off-the-shelf tool to create scoring systems that are both accurate and interpretable, known as a Supersparse Linear Integer Model (SLIM). SLIM is a discrete optimization problem that minimizes the 0-1 loss to encourage a high level of accuracy, regularizes the L0-norm to encourage a high level of sparsity, and constrains coefficients to a set of interpretable values. We illustrate the practical and interpretable nature of SLIM scoring systems through applications in medicine and criminology, and use numerical experiments to show that they are accurate and sparse in comparison to state-of-the-art classification models.
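To make the three ingredients of the objective concrete (0-1 loss, L0 penalty, integer coefficients), here is a minimal brute-force sketch in Python. It assumes labels in {-1, +1}, a tiny coefficient range of -3 to 3, and an intercept that is not counted in the L0 penalty; the names `slim_objective`, `brute_force_slim`, and the trade-off constant `c0` are hypothetical. Enumeration only works for a handful of features and is not how SLIM is solved; SLIM poses the same kind of objective as a discrete optimization problem.

```python
from itertools import product


def slim_objective(coefs, intercept, X, y, c0):
    """0-1 loss plus c0 times the number of nonzero coefficients (intercept not penalized)."""
    errors = 0
    for xi, yi in zip(X, y):
        score = intercept + sum(c * x for c, x in zip(coefs, xi))
        # A score of exactly 0 is treated as a misclassification.
        if yi * score <= 0:
            errors += 1
    l0 = sum(1 for c in coefs if c != 0)
    return errors / len(y) + c0 * l0


def brute_force_slim(X, y, coef_range=range(-3, 4), c0=0.01):
    """Enumerate every integer coefficient vector in coef_range; keep the first best one."""
    p = len(X[0])
    best = None
    for intercept in coef_range:
        for coefs in product(coef_range, repeat=p):
            obj = slim_objective(coefs, intercept, X, y, c0)
            if best is None or obj < best[0]:
                best = (obj, intercept, coefs)
    return best  # (objective, intercept, coefficient tuple)
```

On a four-point toy problem where only the first feature matters, a single nonzero integer coefficient classifies every point, so the L0 term keeps the second coefficient at zero.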

artificial intelligence, coefficient, machine learning, (19 more...)

arXiv.org Machine Learning

1306.6677

Country: North America > United States (1.00)

Genre: Research Report (1.00)

Industry:

Law (1.00)
Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (1.00)
Health & Medicine > Health Care Providers & Services (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.93)


Power System Parameters Forecasting Using Hilbert-Huang Transform and Machine Learning

Kurbatsky, Victor, Tomin, Nikita, Spiryaev, Vadim, Leahy, Paul, Sidorov, Denis, Zhukov, Alexei

arXiv.org Machine Learning, Apr-8-2014

A novel hybrid data-driven approach is developed for forecasting power system parameters, with the goal of increasing the efficiency of short-term forecasting studies for non-stationary time series. The proposed approach is based on mode decomposition and a feature analysis of initial retrospective data using the Hilbert-Huang transform and machine learning algorithms. Random forest and gradient boosting tree learning techniques were examined. These decision tree techniques were used to rank the importance of the variables employed in the forecasting models, with the Mean Decrease Gini index employed as the impurity-based measure of variable importance. The resulting hybrid forecasting models employ the radial basis function neural network and support vector regression. Apart from the introduction and references, the paper is organized as follows. Section 2 presents the background and a review of several approaches for short-term forecasting of power system parameters. In Section 3, a hybrid machine learning-based algorithm using the Hilbert-Huang transform is developed for short-term forecasting of power system parameters. Section 4 describes the decision tree learning algorithms used to assess variable importance. Finally, Section 6 presents the experimental results for the following electric power problems: active power flow forecasting, electricity price forecasting, and wind speed and direction forecasting.
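The Mean Decrease Gini ranking rests on how much each split reduces Gini impurity. The Python sketch below computes that reduction for a single split; the helper names are hypothetical. In a forest, these decreases (weighted by the fraction of samples reaching each node) are typically summed over every split made on a variable and averaged over trees, a step omitted here for brevity.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_decrease(parent, left, right):
    """Impurity of the parent node minus the size-weighted impurity of its children."""
    n = len(parent)
    weighted_children = len(left) / n * gini(left) + len(right) / n * gini(right)
    return gini(parent) - weighted_children
```

A split that separates the classes perfectly removes all of the parent's impurity, while a split that leaves each child as mixed as the parent removes none; variables whose splits look like the former accumulate the higher importance.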

artificial intelligence, forecasting, machine learning, (14 more...)

arXiv.org Machine Learning

1404.2353

Country:

Europe (1.00)
North America (0.70)
Asia > Russia (0.29)

Genre: Research Report (0.50)

Industry: Energy > Power Industry (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.87)


A Permutation Approach for Selecting the Penalty Parameter in Penalized Model Selection

Sabourin, Jeremy, Valdar, William, Nobel, Andrew

arXiv.org Machine Learning, Apr-8-2014

The analysis of high dimensional data, in which the number of measured predictors is large and can exceed the number of samples, is an important and common problem in statistical applications. When samples are accompanied by a real or categorical response, data analysis typically includes model fitting with the aim of doing prediction or variable selection, or both. The goal of prediction is to derive a rule capable of accurately predicting the response of a new, unlabeled sample. The goal of variable selection is to select a (small) subset of the measured predictors whose individual or coordinated activity is significantly related to the response. In both cases, it is common to assume that the observed data arise from an underlying model that is sparse, in the sense that only a small subset of the predictors are related to the response. Whether sparsity is assumed, or viewed as a desirable feature of a model, analysis of high dimensional data is often carried out by penalized methods that produce models in which a relatively small subset of the available predictors are included. Popular penalized methods include the LASSO (Tibshirani, 1996), its numerous variations, and SCAD (Fan and Li, 2001). In what follows, we focus our attention on the LASSO. The LASSO and its variants require specification of a penalty/tuning parameter that controls the tradeoff between model fit and model size.
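The excerpt stops before describing the permutation procedure, so the following Python sketch is only a hedged illustration of the general idea, not the authors' method. It assumes the LASSO objective (1/(2n))||y - Xb||^2 + lambda * ||b||_1 with centered predictor columns and a centered response, for which the smallest penalty that sets every coefficient to zero is lambda_max = max_j |x_j^T y| / n. The hypothetical helper `permutation_lambda` recomputes lambda_max after shuffling y, which destroys any real predictor-response association, and returns the median over shuffles as a data-calibrated penalty.

```python
import random
import statistics


def lambda_max(X, y):
    """Smallest lambda at which the LASSO solution is all zeros.

    Assumes the objective (1/(2n))||y - Xb||^2 + lam * ||b||_1 with
    centered columns of X and centered y (no intercept needed).
    """
    n, p = len(y), len(X[0])
    return max(abs(sum(X[i][j] * y[i] for i in range(n))) / n for j in range(p))


def permutation_lambda(X, y, n_perm=100, seed=0):
    """Hypothetical heuristic: median lambda_max over randomly permuted responses."""
    rng = random.Random(seed)
    values = []
    for _ in range(n_perm):
        y_perm = y[:]
        rng.shuffle(y_perm)  # break the association between predictors and response
        values.append(lambda_max(X, y_perm))
    return statistics.median(values)
```

The intuition is that a penalty large enough to exclude every predictor under pure noise should admit only predictors whose association with the real response exceeds what chance produces.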

artificial intelligence, machine learning, selection, (18 more...)

arXiv.org Machine Learning

1404.2007

Country: North America > United States > North Carolina > Orange County > Chapel Hill (0.14)

Genre: Research Report > Experimental Study (0.48)

Industry: Health & Medicine > Therapeutic Area > Oncology (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.48)


Minimum $n$-Rank Approximation via Iterative Hard Thresholding

Zhang, Min, Yang, Lei, Huang, Zheng-Hai

arXiv.org Machine Learning, Apr-8-2014

The problem of recovering a low $n$-rank tensor is an extension of the sparse recovery problem from the low-dimensional space (matrix space) to the high-dimensional space (tensor space), and it has many applications in computer vision and graphics, such as image inpainting and video inpainting. In this paper, we consider a new tensor recovery model, named minimum $n$-rank approximation (MnRA), and propose an iterative hard thresholding algorithm that requires an upper bound on the $n$-rank to be given in advance. A convergence analysis of the proposed algorithm is also presented. In particular, we show that in the noiseless case the proposed algorithm achieves linear convergence with rate $\frac{1}{2}$ under proper conditions. Additionally, by combining it with an effective heuristic for determining the $n$-rank, we can also apply the proposed algorithm to solve MnRA when the $n$-rank is unknown in advance. Preliminary numerical results on randomly generated and real low $n$-rank tensor completion problems are reported, and they show the efficiency of the proposed algorithms.
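The tensor algorithm above works on $n$-rank, but the iterative hard thresholding template is easiest to see in the sparse-vector setting the abstract names as its starting point. The Python sketch below is that vector analogue, not the MnRA algorithm: each iteration takes a gradient step on (1/2)||b - Ax||^2 and then keeps only the s largest-magnitude entries. Roughly speaking, a low $n$-rank variant would replace entrywise thresholding with a rank truncation of each mode unfolding. The function names and the step-size choice are assumptions for illustration.

```python
def hard_threshold(v, s):
    """Keep the s entries of v with largest magnitude and zero out the rest."""
    keep = set(sorted(range(len(v)), key=lambda i: abs(v[i]), reverse=True)[:s])
    return [v[i] if i in keep else 0.0 for i in range(len(v))]


def iht(A, b, s, step, n_iter=200):
    """Iterative hard thresholding for s-sparse x with A x close to b.

    step should be below 1 / ||A||_2^2 (squared spectral norm) for a stable gradient step.
    """
    m, n = len(A), len(A[0])
    x = [0.0] * n
    for _ in range(n_iter):
        residual = [b[i] - sum(A[i][j] * x[j] for j in range(n)) for i in range(m)]
        # A^T residual is the negative gradient of (1/2)||b - Ax||^2.
        neg_grad = [sum(A[i][j] * residual[i] for i in range(m)) for j in range(n)]
        x = hard_threshold([x[j] + step * neg_grad[j] for j in range(n)], s)
    return x
```

For example, with a 4-by-3 matrix whose squared spectral norm is 4, a step of 0.2 recovers the 1-sparse vector [2, 0, 0] from four measurements, with the error shrinking geometrically at each iteration.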

algorithm, artificial intelligence, machine learning, (12 more...)

arXiv.org Machine Learning

1311.4291

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)


Probabilistic Archetypal Analysis

Seth, Sohan, Eugster, Manuel J. A.

arXiv.org Machine Learning, Apr-7-2014

Archetypal analysis (AA) represents observations as compositions of pure patterns, i.e., archetypes, or equivalently as convex combinations of extreme values (Cutler and Breiman, 1994). Although AA bears resemblance to many well-established prototypical analysis tools, such as principal component analysis (PCA, Mohamed et al., 2009), nonnegative matrix factorization (NMF, Févotte and Idier, 2011), probabilistic latent semantic analysis (Hofmann, 2013), and k-means (Steinley, 2006), AA is arguably unique, both conceptually and computationally. Conceptually, AA imitates the human tendency to represent a group of objects by its extreme elements (Davis and Love, 2010): this makes AA an interesting exploratory tool for applied scientists (e.g., Eugster, 2012; Seiler and Wohlrabe, 2013). Computationally, AA is data-driven and requires the factors to be probability vectors: this makes AA computationally demanding, but it also yields better interpretability. The concept of AA was originally formulated by Cutler and Breiman (1994).
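The two convexity constraints mentioned above can be shown directly. In classical (non-probabilistic) AA, each archetype is a convex combination of data points (Z = BX) and each observation is approximated by a convex combination of archetypes (X ≈ AZ), where every row of A and B is a probability vector. The Python sketch below only builds and applies these combinations for given weights; it does not fit A and B, and the helper names are hypothetical.

```python
def is_probability_vector(w, tol=1e-9):
    """Nonnegative entries that sum to one."""
    return all(x >= -tol for x in w) and abs(sum(w) - 1.0) <= tol


def combine(weights, points):
    """Convex combination of points, rejecting weights that are not a probability vector."""
    if not is_probability_vector(weights):
        raise ValueError("weights must be nonnegative and sum to 1")
    dim = len(points[0])
    return [sum(w * p[d] for w, p in zip(weights, points)) for d in range(dim)]


def archetypes(B, X):
    """Z = B X: each archetype is a convex combination of observations."""
    return [combine(row, X) for row in B]


def reconstruct(A, Z):
    """X_hat = A Z: each observation is a convex combination of archetypes."""
    return [combine(row, Z) for row in A]
```

With three corner points selected as archetypes, an interior point such as [0.5, 0.5] is reproduced exactly by the weights [0.5, 0.25, 0.25], and those weights read directly as "how much of each extreme pattern" the observation contains.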

archetypal analysis, archetypal profile, archetype, (13 more...)

arXiv.org Machine Learning

1312.7604

Country:

North America > United States (0.14)
Asia > Middle East > Jordan (0.04)
Asia > China (0.04)

Genre: Research Report (0.64)

Industry: Leisure & Entertainment > Sports (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.88)


Pseudo-Marginal Bayesian Inference for Gaussian Processes

Filippone, Maurizio, Girolami, Mark

arXiv.org Machine Learning, Apr-7-2014

The main challenges that arise when adopting Gaussian Process priors in probabilistic modeling are how to carry out exact Bayesian inference and how to account for uncertainty on model parameters when making model-based predictions on out-of-sample data. Using probit regression as an illustrative working example, this paper presents a general and effective methodology based on the pseudo-marginal approach to Markov chain Monte Carlo that efficiently addresses both of these issues. The results presented in this paper show improvements over existing sampling methods for simulating from the posterior distribution over the parameters defining the covariance function of the Gaussian Process prior. This is particularly important because it offers a powerful tool for carrying out full Bayesian inference in Gaussian Process based hierarchical statistical models in general. The results also demonstrate that Monte Carlo based integration of all model parameters is actually feasible in this class of models, providing a superior quantification of uncertainty in predictions. Extensive comparisons with state-of-the-art probabilistic classifiers confirm this assertion.
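The core of the pseudo-marginal approach is that Metropolis-Hastings stays exact when the intractable marginal likelihood is replaced by a nonnegative unbiased estimate, provided the estimate attached to the current state is reused rather than recomputed. The paper applies this to GP probit classification; the Python sketch below instead uses an assumed one-dimensional toy model (latent f ~ N(0, exp(theta)), observation y | f ~ N(f, 1), prior theta ~ N(0, 1)) so that the mechanics fit in a few lines. Sampling f from its prior as the importance distribution is a simplifying choice, and all function names are hypothetical.

```python
import math
import random


def normal_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def estimate_marginal(y, theta, n_samples, rng):
    """Unbiased importance-sampling estimate of p(y | theta), drawing f from its prior."""
    sd = math.sqrt(math.exp(theta))
    total = 0.0
    for _ in range(n_samples):
        f = rng.gauss(0.0, sd)
        total += normal_pdf(y, f, 1.0)
    return total / n_samples


def pseudo_marginal_mh(y, n_iter=2000, n_samples=50, step=0.5, seed=0):
    """Random-walk Metropolis-Hastings on theta using estimated likelihoods."""
    rng = random.Random(seed)
    theta = 0.0
    p_hat = estimate_marginal(y, theta, n_samples, rng)
    chain = []
    for _ in range(n_iter):
        theta_prop = theta + rng.gauss(0.0, step)
        p_hat_prop = estimate_marginal(y, theta_prop, n_samples, rng)
        # Estimated likelihood times the N(0, 1) prior; the symmetric proposal cancels.
        num = p_hat_prop * normal_pdf(theta_prop, 0.0, 1.0)
        den = p_hat * normal_pdf(theta, 0.0, 1.0)
        if rng.random() * den < num:
            theta, p_hat = theta_prop, p_hat_prop  # the estimate travels with the state
        chain.append(theta)
    return chain
```

Because this toy model has the closed form p(y | theta) = N(y; 0, exp(theta) + 1), the estimator can be checked against it directly; in the GP setting no such closed form exists, which is exactly why the estimate is needed.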

approximation, artificial intelligence, machine learning, (17 more...)

arXiv.org Machine Learning

1310.074

Country: North America > United States (1.00)

Genre: Research Report (1.00)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.93)


Ensemble Committees for Stock Return Classification and Prediction

Brofos, James

arXiv.org Machine Learning, Apr-5-2014

This paper considers a portfolio trading strategy formulated by algorithms in the field of machine learning. The profitability of the strategy is measured by the algorithm's capability to consistently and accurately identify stock indices with positive or negative returns, and to generate a preferred portfolio allocation on the basis of a learned model. Stocks are characterized by time series data sets consisting of technical variables that reflect market conditions in a previous time interval, which are used to produce binary classification decisions in subsequent intervals. The learned model is constructed as a committee of random forest classifiers, a nonlinear support vector machine classifier, a relevance vector machine classifier, and a constituent ensemble of k-nearest neighbors classifiers. This selection of algorithms is appealing for two reasons: first, there is strikingly little research in economic time-series forecasting that employs learners beyond neural networks and clustering algorithms, and this construction offers a viable alternative; second, this selection incorporates an array of techniques that have both theoretically optimal classification properties and high empirical success rates in areas outside of finance, in addition to offering a mixture of parametric and nonparametric models. The ensemble committee is augmented by a boosting meta-algorithm, and feature selection is performed by a supervised Relief algorithm. The Global Industry Classification Standard (GICS) is used to explore the ensemble model's efficacy within the context of various fields of investment, including Energy, Materials, Financials, and Information Technology. Data from 2006 to 2012, inclusive, are considered; this period was chosen because it provides a range of market circumstances for evaluating the model. The model is observed to achieve an accuracy of approximately 70% when predicting stock price returns three months in advance.
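A committee turns several classifiers' binary decisions into one. The abstract does not say how the members' votes are combined (and the actual model adds boosting on top), so the Python sketch below assumes the simplest rule, an unweighted plurality vote over labels +1 (positive return) and -1 (negative return), plus the accuracy measure the abstract reports. The function names are hypothetical.

```python
from collections import Counter


def committee_vote(predictions):
    """Plurality vote per sample.

    predictions: one list of labels per classifier, all of the same length.
    Ties go to the label encountered first (Counter.most_common ordering).
    """
    n_samples = len(predictions[0])
    return [Counter(p[i] for p in predictions).most_common(1)[0][0] for i in range(n_samples)]


def accuracy(predicted, actual):
    """Fraction of samples whose predicted label matches the actual label."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)
```

Using an odd number of committee members avoids ties for binary labels; a weighted vote (for example, by each member's validation accuracy) would be a natural extension.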

algorithm, artificial intelligence, machine learning, (19 more...)

arXiv.org Machine Learning

1404.1492

Genre: Research Report (0.50)

Industry: Banking & Finance > Trading (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.70)


GRADE: Machine Learning Support for Graduate Admissions

Waters, Austin (University of Texas at Austin) | Miikkulainen, Risto (University of Texas at Austin)

AI Magazine, Apr-4-2014

This article describes GRADE, a statistical machine learning system developed to support the work of the graduate admissions committee at the University of Texas at Austin Department of Computer Science (UTCS). In recent years, the number of applications to the UTCS PhD program has become too large to manage with a traditional review process. GRADE uses historical admissions data to predict how likely the committee is to admit each new applicant. It reports each prediction as a score similar to those used by human reviewers, and accompanies each by an explanation of what applicant features most influenced its prediction. GRADE makes the review process more efficient by enabling reviewers to spend most of their time on applicants near the decision boundary and by focusing their attention on parts of each applicant’s file that matter the most. An evaluation over two seasons of PhD admissions indicates that the system leads to dramatic time savings, reducing the total time spent on reviews by at least 74 percent.
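GRADE is described as reporting a score together with the applicant features that most influenced it, but the abstract does not state the model form. As a hedged sketch, assume a linear logistic model: each feature's contribution is its weight times its value, the predicted admission probability is the logistic function of the summed contributions, and the explanation lists the features with the largest-magnitude contributions. The function name and the feature names used in the example (`gpa`, `gre`, `letters`) are hypothetical.

```python
import math


def score_with_explanation(weights, bias, features, top_k=2):
    """Return (probability, names of the top_k most influential features).

    Assumes a linear logistic model: contribution_j = weight_j * value_j.
    """
    contributions = {name: weights[name] * value for name, value in features.items()}
    logit = bias + sum(contributions.values())
    prob = 1.0 / (1.0 + math.exp(-logit))
    top = sorted(contributions, key=lambda name: abs(contributions[name]), reverse=True)[:top_k]
    return prob, top
```

A linear model makes the explanation cheap and faithful: the listed contributions are exactly the terms that produced the score, which suits a tool meant to direct reviewers' attention within an applicant's file.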

applicant, artificial intelligence, machine learning, (18 more...)

AI Magazine

Country: North America > United States > Texas > Travis County > Austin (0.34)

Industry: Education > Educational Setting > Higher Education (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)


Using Analogy to Cluster Hand-Drawn Sketches for Sketch-Based Educational Software

Chang, Maria D. (Northwestern University) | Forbus, Kenneth D. (Northwestern University)

AI Magazine, Apr-4-2014

One of the major challenges to building intelligent educational software is determining what kinds of feedback to give learners. Useful feedback makes use of models of domain-specific knowledge, especially models that are commonly held by potential students. To empirically determine what these models are, student data can be clustered to reveal common misconceptions or common problem-solving strategies. This article describes how analogical retrieval and generalization can be used to cluster automatically analyzed hand-drawn sketches incorporating both spatial and conceptual information. We use this approach to cluster a corpus of hand-drawn student sketches to discover common answers. Common answer clusters can be used for the design of targeted feedback and for assessment.
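The abstract does not spell out the clustering procedure, so this Python sketch is a loose, hypothetical stand-in for analogical generalization. Each sketch is encoded as a set of relational facts (tuples); Jaccard overlap replaces analogical (structure-mapping) similarity, which is an assumption and far cruder than analogical matching; and a sketch joins the most similar existing cluster only when that similarity clears a threshold, otherwise it starts a new cluster. Each cluster keeps only the facts shared by all of its members, a rough analogue of a generalization.

```python
def jaccard(a, b):
    """Overlap of two fact sets (stand-in for analogical similarity; an assumption)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def threshold_cluster(sketches, threshold=0.5):
    """Assign each sketch to its most similar cluster if similar enough, else start a new one."""
    clusters = []  # each: {"members": [...], "facts": facts shared by all members}
    for sketch in sketches:
        best, best_sim = None, -1.0
        for cluster in clusters:
            sim = jaccard(sketch, cluster["facts"])
            if sim > best_sim:
                best, best_sim = cluster, sim
        if best is not None and best_sim >= threshold:
            best["members"].append(sketch)
            best["facts"] &= sketch  # keep only facts common to every member
        else:
            clusters.append({"members": [sketch], "facts": set(sketch)})
    return clusters
```

In a setting like the one described, the shared facts of a large cluster would point to a common answer, and a cluster built around a wrong relation would point to a common misconception worth targeted feedback.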

machine learning, natural language, sketch, (20 more...)

AI Magazine

Country: North America > United States > California (0.28)

Genre: Research Report (0.47)

Industry:

Education > Educational Setting (1.00)
Education > Curriculum > Subject-Specific Education (0.94)
Education > Educational Technology > Educational Software > Computer Based Training (0.47)

Technology:

Information Technology > Artificial Intelligence > Vision > Sketch Understanding (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (0.93)


A Tutorial on Principal Component Analysis

Shlens, Jonathon

arXiv.org Machine Learning, Apr-3-2014

Principal component analysis (PCA) is a standard tool in modern data analysis, used in diverse fields from neuroscience to computer graphics, because it is a simple, nonparametric method for extracting relevant information from confusing data sets. With minimal effort, PCA provides a roadmap for how to reduce a complex data set to a lower dimension to reveal the sometimes hidden, simplified structures that often underlie it. The goal of this tutorial is to provide both an intuitive feel for PCA and a thorough discussion of this topic. We will begin with a simple example and provide an intuitive explanation of the goal of PCA. We will continue by adding mathematical rigor to place it within the framework of linear algebra to provide an explicit solution.
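In the linear algebra framing, the principal components are eigenvectors of the data's covariance matrix. The Python sketch below finds only the first component by power iteration on the sample covariance, using plain lists to stay dependency-free; the function names are illustrative, and in practice a library eigendecomposition or SVD would be used instead.

```python
import math


def covariance_matrix(data):
    """Sample covariance (divisor n - 1) of rows of data."""
    n, d = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in data]
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)] for i in range(d)]


def first_principal_component(data, n_iter=100):
    """Power iteration: repeatedly multiply by the covariance and renormalize.

    Assumes the starting vector of all ones is not orthogonal to the top eigenvector.
    """
    cov = covariance_matrix(data)
    d = len(cov)
    v = [1.0] * d
    for _ in range(n_iter):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v
```

For points lying on the line y = 2x, all variance lies along that line, so the first component comes out as (1, 2) / sqrt(5), and projecting onto it reduces the data from two dimensions to one without losing information.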

artificial intelligence, machine learning, matrix, (18 more...)

arXiv.org Machine Learning

1404.11

Genre: Instructional Material > Course Syllabus & Notes (1.00)

Industry: Education (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Principal Component Analysis (0.61)
