AITopics | Regression

Regression

Recovery of simultaneous low rank and two-way sparse coefficient matrices, a nonconvex approach

Yu, Ming, Wang, Zhaoran, Gupta, Varun, Kolar, Mladen

arXiv.org Machine Learning, Feb-19-2018

We study the problem of recovering matrices that are simultaneously low rank and row and/or column sparse. Such matrices appear in recent applications in cognitive neuroscience, imaging, computer vision, macroeconomics, and genetics. We propose a GDT (Gradient Descent with hard Thresholding) algorithm to efficiently recover matrices with such structure, by minimizing a bi-convex function over a nonconvex set of constraints. We show linear convergence of the iterates obtained by GDT to a region within statistical error of an optimal solution. As an application of our method, we consider multi-task learning problems and show that the statistical error rate obtained by GDT is near optimal compared to the minimax rate. Experiments demonstrate competitive performance and much faster running speed compared to existing methods, on both simulations and real data sets.
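
As a rough illustration of the hard-thresholding idea named in the abstract above, the sketch below keeps the `s` rows of a matrix with the largest Euclidean norm and sets the remaining rows to zero, which enforces row sparsity. The function name `hard_threshold_rows`, the parameter `s`, and the toy matrix are assumptions made for illustration; the gradient steps and the rest of the GDT algorithm are not reproduced here.

```python
import math


def hard_threshold_rows(matrix, s):
    """Keep the s rows with the largest Euclidean norm; zero out the others."""
    norms = [math.sqrt(sum(x * x for x in row)) for row in matrix]
    # Indices of the s largest-norm rows (ties broken by lower index).
    order = sorted(range(len(matrix)), key=lambda i: (-norms[i], i))
    keep = set(order[:s])
    return [list(row) if i in keep else [0.0] * len(row)
            for i, row in enumerate(matrix)]


# Hypothetical 4 x 2 factor with two dominant rows.
U = [[3.0, 4.0], [0.1, 0.0], [1.0, 1.0], [0.0, 0.2]]
U_sparse = hard_threshold_rows(U, 2)
```

In an iterative scheme of this kind, a projection like this would typically be applied after each gradient update so that the iterate stays row sparse.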

algorithm, artificial intelligence, machine learning, (15 more...)

arXiv.org Machine Learning

1802.06967

Country:

Europe (0.93)
North America > United States > New York (0.28)

Genre: Research Report (0.50)

Industry: Health & Medicine > Therapeutic Area > Neurology (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.45)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.34)

Estimator of Prediction Error Based on Approximate Message Passing for Penalized Linear Regression

Sakata, Ayaka

arXiv.org Machine Learning, Feb-19-2018

We propose an estimator of prediction error using an approximate message passing (AMP) algorithm that can be applied to a broad range of sparse penalties. Following Stein's lemma, the estimator of the generalized degrees of freedom, which is a key quantity for the construction of the estimator of the prediction error, is calculated at the AMP fixed point. The resulting form of the AMP-based estimator does not depend on the penalty function, and its value can be further improved by considering the correlation between predictors. The proposed estimator is asymptotically unbiased when the components of the predictors and response variables are independently generated according to a Gaussian distribution. We examine the behaviour of the estimator for real data under nonconvex sparse penalties, where Akaike's information criterion does not correspond to an unbiased estimator of the prediction error. The model selected by the proposed estimator is close to that which minimizes the true prediction error. In recent decades, variable selection using sparse penalties, referred to here as sparse estimation, has become an attractive estimation scheme [1, 2, 3]. Sparse estimation is mathematically formulated as the minimization of the estimating function associated with the sparse penalties. In this paper, we concentrate on the linear regression problem with an arbitrary sparse regularization.

artificial intelligence, machine learning, modeling & simulation, (15 more...)

arXiv.org Machine Learning

1802.06939

Country:

North America > United States (0.46)
Asia > Japan (0.28)

Genre: Research Report (0.50)

Technology:

Information Technology > Modeling & Simulation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.71)

Classifier Risk Estimation under Limited Labeling Resources

Kumar, Anurag, Raj, Bhiksha

arXiv.org Machine Learning, Feb-19-2018

In this paper we propose strategies for estimating the performance of a classifier when labels cannot be obtained for the whole test set. The number of test instances which can be labeled is very small compared to the whole test data size. The goal then is to obtain a precise estimate of classifier performance using as little labeling resource as possible. Specifically, we ask how to select a subset of the large test set for labeling such that the performance of a classifier estimated on this subset is as close as possible to the one on the whole test set. We propose strategies based on stratified sampling for selecting this subset. We show that these strategies can reduce the variance in estimation of classifier accuracy by a significant amount compared to simple random sampling (over 65% in several cases). Hence, our proposed methods are much more precise than random sampling for accuracy estimation under restricted labeling resources. The reduction in the number of samples required (compared to random sampling) to estimate the classifier accuracy with only 1% error is as high as 60% in some cases.
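
To make the variance argument in the abstract above concrete, the following minimal sketch compares the textbook variance of an accuracy estimate under simple random sampling with the variance under stratified sampling with proportional allocation. The strata weights and per-stratum accuracies are invented for illustration (one could imagine grouping test items by classifier confidence), the function names are hypothetical, and finite-population corrections are ignored; this is not the paper's exact procedure.

```python
def srs_variance(strata, n):
    """Variance of the accuracy estimate from n labels drawn by simple random sampling.

    strata: list of (weight, accuracy) pairs whose weights sum to 1.
    """
    p = sum(w * a for w, a in strata)  # overall accuracy
    return p * (1.0 - p) / n


def stratified_variance(strata, n):
    """Variance under proportional allocation, where stratum h gets n * weight_h labels.

    Each stratum contributes weight_h^2 * a_h * (1 - a_h) / (n * weight_h),
    which simplifies to weight_h * a_h * (1 - a_h) / n.
    """
    return sum(w * a * (1.0 - a) for w, a in strata) / n


# Hypothetical strata: (share of the test set, classifier accuracy within the stratum).
strata = [(0.5, 0.95), (0.3, 0.70), (0.2, 0.30)]
n_labels = 100

var_srs = srs_variance(strata, n_labels)
var_strat = stratified_variance(strata, n_labels)
reduction = 1.0 - var_strat / var_srs  # fraction of variance removed by stratifying
```

With proportional allocation the stratified variance can never exceed the simple-random-sampling variance in this idealized setting, and the gain grows as accuracy differs more between strata.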

artificial intelligence, machine learning, variance, (19 more...)

arXiv.org Machine Learning

1607.02665

Country: North America > United States (0.46)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.46)

Approximate message passing for nonconvex sparse regularization with stability and asymptotic analysis

Sakata, Ayaka, Xu, Yingying

arXiv.org Machine Learning, Feb-18-2018

We analyse a linear regression problem with nonconvex regularization called smoothly clipped absolute deviation (SCAD) under an overcomplete Gaussian basis for Gaussian random data. We propose an approximate message passing (AMP) algorithm considering nonconvex regularization, namely SCAD-AMP, and analytically show that the stability condition corresponds to the de Almeida-Thouless condition in the spin glass literature. Through asymptotic analysis, we show the correspondence between the density evolution of SCAD-AMP and the replica symmetric solution. Numerical experiments confirm that for a sufficiently large system size, SCAD-AMP achieves the optimal performance predicted by the replica method. Through replica analysis, a phase transition between replica symmetric (RS) and replica symmetry breaking (RSB) regions is found in the parameter space of SCAD. The appearance of the RS region for a nonconvex penalty is a significant advantage that indicates the region of smooth landscape of the optimization problem. Furthermore, we analytically show that the statistical representation performance of the SCAD penalty is better than that of L1-based methods, and the minimum representation error under the RS assumption is obtained at the edge of the RS/RSB phase. The correspondence between the convergence of the existing coordinate descent algorithm and the RS/RSB transition is also indicated.
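
For readers unfamiliar with SCAD, the sketch below evaluates the standard piecewise SCAD penalty (as defined by Fan and Li) for a single coefficient. The function name `scad_penalty` and the example values are assumptions for illustration; `a = 3.7` is a commonly used default, and the SCAD-AMP algorithm itself is not shown.

```python
def scad_penalty(theta, lam, a=3.7):
    """SCAD penalty for a scalar coefficient theta, with lam > 0 and a > 2."""
    t = abs(theta)
    if t <= lam:
        # Small coefficients: identical to the L1 penalty.
        return lam * t
    if t <= a * lam:
        # Intermediate range: quadratic interpolation that flattens out.
        return (2.0 * a * lam * t - t * t - lam * lam) / (2.0 * (a - 1.0))
    # Large coefficients: constant penalty, so no further shrinkage pressure.
    return lam * lam * (a + 1.0) / 2.0


lam = 1.0
small = scad_penalty(0.5, lam)    # in the L1-like region
large = scad_penalty(10.0, lam)   # in the constant region
l1_large = lam * abs(10.0)        # L1 penalty on the same coefficient
```

The penalty matches L1 near zero but stops growing for large coefficients, which is the source of both its reduced bias and its nonconvexity.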

artificial intelligence, machine learning, regularization, (19 more...)

arXiv.org Machine Learning

1711.02795

Country:

Europe (0.14)
Asia > Japan (0.14)

Genre: Research Report (0.81)

Industry: Energy > Oil & Gas (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.34)

Robust Estimation via Robust Gradient Estimation

Prasad, Adarsh, Suggala, Arun Sai, Balakrishnan, Sivaraman, Ravikumar, Pradeep

arXiv.org Machine Learning, Feb-18-2018

We provide a new computationally-efficient class of estimators for risk minimization. We show that these estimators are robust for general statistical models: in the classical Huber epsilon-contamination model and in heavy-tailed settings. Our workhorse is a novel robust variant of gradient descent, and we provide conditions under which our gradient descent variant provides accurate estimators in a general convex risk minimization problem. We provide specific consequences of our theory for linear regression, logistic regression and for estimation of the canonical parameters in an exponential family. These results provide some of the first computationally tractable and provably robust estimators for these canonical statistical models. Finally, we study the empirical performance of our proposed methods on synthetic and real datasets, and find that our methods convincingly outperform a variety of baselines.

artificial intelligence, estimator, machine learning, (14 more...)

arXiv.org Machine Learning

1802.06485

Country:

North America > United States (0.45)
Europe (0.28)

Genre: Research Report > New Finding (0.88)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.88)

Local Geometry of One-Hidden-Layer Neural Networks for Logistic Regression

Fu, Haoyu, Chi, Yuejie, Liang, Yingbin

arXiv.org Machine Learning, Feb-18-2018

We study the local geometry of a one-hidden-layer fully-connected neural network where the training samples are generated from a multi-neuron logistic regression model. We prove that under Gaussian input, the empirical risk function employing quadratic loss exhibits strong convexity and smoothness uniformly in a local neighborhood of the ground truth, for a class of smooth activation functions satisfying certain properties, including sigmoid and tanh, as soon as the sample complexity is sufficiently large. This implies that if initialized in this neighborhood, gradient descent converges linearly to a critical point that is provably close to the ground truth without requiring a fresh set of samples at each iteration. This significantly improves upon prior results on learning shallow neural networks with multiple neurons. To the best of our knowledge, this is the first global convergence guarantee for one-hidden-layer neural networks using gradient descent over the empirical risk function without resampling at the near-optimal sampling and computational complexity.

activation function, artificial intelligence, machine learning, (15 more...)

arXiv.org Machine Learning

1802.06463

Country: North America > United States (0.67)

Genre:

Research Report > New Finding (0.70)
Research Report > Experimental Study (0.60)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.90)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.57)

Restricted Eigenvalue from Stable Rank with Applications to Sparse Linear Regression

Kasiviswanathan, Shiva Prasad, Rudelson, Mark

arXiv.org Machine Learning, Feb-17-2018

High-dimensional settings, where the data dimension ($d$) far exceeds the number of observations ($n$), are common in many statistical and machine learning applications. Methods based on $\ell_1$-relaxation, such as Lasso, are very popular for sparse recovery in these settings. The Restricted Eigenvalue (RE) condition is among the weakest, and hence the most general, conditions in the literature imposed on the Gram matrix that guarantee nice statistical properties for the Lasso estimator. It is natural to ask: what families of matrices satisfy the RE condition? Following a line of work in this area, we construct a new broad ensemble of dependent random design matrices that have an explicit RE bound. Our construction starts with a fixed (deterministic) matrix $X \in \mathbb{R}^{n \times d}$ satisfying a simple stable rank condition, and we show that a matrix drawn from the distribution $X \Phi^\top \Phi$, where $\Phi \in \mathbb{R}^{m \times d}$ is a subgaussian random matrix, with high probability, satisfies the RE condition. This construction allows incorporating a fixed matrix that has an easily verifiable condition into the design process, and allows for generation of compressed design matrices that have a lower storage requirement than a standard design matrix. We give two applications of this construction to sparse linear regression problems, including one to a compressed sparse regression setting where the regression algorithm only has access to a compressed representation of a fixed design matrix $X$.

artificial intelligence, machine learning, matrix, (15 more...)

arXiv.org Machine Learning

1707.08092

Country: North America > United States > Michigan (0.28)

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)

Logistic Regression: A Concise Technical Overview

@machinelearnbot, Feb-16-2018, 21:21:56 GMT

A popular statistical technique to predict binomial outcomes (y = 0 or 1) is Logistic Regression. Logistic regression predicts categorical outcomes (binomial / multinomial values of y), whereas linear regression is good for predicting continuous-valued outcomes (such as the weight of a person in kg or the amount of rainfall in cm). The predictions of Logistic Regression (henceforth, LogR in this article) are in the form of probabilities of an event occurring, i.e. the probability of y = 1, given certain values of input variables x. As shown in Figure 1, the logit function on the right, with a range of −∞ to +∞, is the inverse of the logistic function shown on the left, with a range of 0 to 1. Estimating the values of B0, B1, ..., Bk involves the concepts of probability, odds and log odds. The example dataset here is sourced from the UCLA website. The task is to predict which students graduated with honours or not (y = 1 or 0), for 200 students with fields female, read, write, math, hon, femalexmath.
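
The relationship between the logistic and logit functions described above can be sketched in a few lines. The function names and the coefficient values below are assumptions chosen for illustration; they are not fitted to the UCLA dataset mentioned in the excerpt.

```python
import math


def logistic(z):
    """Map a real-valued score z to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def logit(p):
    """Map a probability p in (0, 1) to its log odds; the inverse of logistic()."""
    return math.log(p / (1.0 - p))


def predict_proba(x, coefs, intercept):
    """Probability that y = 1 given inputs x, using B0 = intercept and B1..Bk = coefs."""
    z = intercept + sum(b * xi for b, xi in zip(coefs, x))
    return logistic(z)


# Hypothetical coefficients for two input variables.
prob = predict_proba([1.0, 2.0], coefs=[0.5, -0.25], intercept=0.0)
round_trip = logit(logistic(0.3))  # recovers the original score up to rounding
```

Because the log odds are linear in the inputs, each coefficient can be read as the change in log odds of y = 1 per unit change in the corresponding input.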

artificial intelligence, machine learning, probability, (16 more...)

@machinelearnbot

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)

Generalized Linear Models – Towards Data Science

#artificialintelligence, Feb-15-2018, 09:29:55 GMT

It has been a long time since I wrote the first machine learning for everyone article. From now on, I will try to publish articles more frequently. Quick Note: Unfortunately, Medium does not support mathematical typesetting (LaTeX etc.), so I put mathematical formulas into articles as images, and I have no idea whether the equations look elegant on different devices. Today's topic is Generalized Linear Models, a family of general machine learning models for supervised learning problems (both regression and classification). Let's start with linear regression models. I think everyone has encountered linear regression models during his/her university years, in one way or another.

artificial intelligence, loss function, machine learning, (17 more...)

#artificialintelligence

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)

Tree Ensembles with Rule Structured Horseshoe Regularization

Nalenz, Malte, Villani, Mattias

arXiv.org Machine Learning, Feb-15-2018

We propose a new Bayesian model for flexible nonlinear regression and classification using tree ensembles. The model is based on the RuleFit approach in Friedman and Popescu (2008), where rules from decision trees and linear terms are used in an L1-regularized regression. We modify RuleFit by replacing the L1-regularization with a horseshoe prior, which is well known to give aggressive shrinkage of noise predictors while leaving the important signal essentially untouched. This is especially important when a large number of rules are used as predictors, as many of them only contribute noise. Our horseshoe prior has an additional hierarchical layer that applies more shrinkage a priori to rules with a large number of splits, and to rules that are only satisfied by a few observations. The aggressive noise shrinkage of our prior also makes it possible to complement the rules from boosting in Friedman and Popescu (2008) with an additional set of trees from random forest, which brings a desirable diversity to the ensemble. We sample from the posterior distribution using a very efficient and easily implemented Gibbs sampler. The new model is shown to outperform state-of-the-art methods like RuleFit, BART and random forest on 16 datasets. The model and its interpretation are demonstrated on the well-known Boston housing data, and on gene expression data for cancer classification. The posterior sampling, prediction and graphical tools for interpreting the model results are implemented in a publicly available R package.

artificial intelligence, horserule, machine learning, (19 more...)

arXiv.org Machine Learning

1702.05008

Country: Europe (0.28)

Genre: Research Report (1.00)

Industry: Health & Medicine > Therapeutic Area > Oncology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.89)
(2 more...)
