AITopics | Regression

Sparse-Input Neural Networks for High-dimensional Nonparametric Regression and Classification

Neural networks are usually not the tool of choice for nonparametric high-dimensional problems where the number of input features is much larger than the number of observations. Though neural networks can approximate complex multivariate functions, they generally require a large number of training observations to obtain reasonable fits, unless one can learn the appropriate network structure. In this manuscript, we show that neural networks can be applied successfully to high-dimensional settings if the true function falls in a low dimensional subspace, and proper regularization is used. We propose fitting a neural network with a sparse group lasso penalty on the first-layer input weights, which results in a neural net that only uses a small subset of the original features. In addition, we characterize the statistical convergence of the penalized empirical risk minimizer to the optimal neural network: we show that the excess risk of this penalized estimator only grows with the logarithm of the number of input features; and we show that the weights of irrelevant features converge to zero. Via simulation studies and data analyses, we show that these sparse-input neural networks outperform existing nonparametric high-dimensional estimation methods when the data has complex higher-order interactions.
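
As a rough illustration of the penalty described in the abstract above, the sketch below computes a sparse group lasso penalty over first-layer weights, with one group per input feature. The function name, the mixing parameter `alpha`, the strength `lam`, and the toy weights are assumptions made for illustration; the paper's exact objective may differ.

```python
import math


def sparse_group_lasso_penalty(first_layer_weights, lam=1.0, alpha=0.5):
    """Sparse group lasso penalty on first-layer weights (illustrative sketch).

    first_layer_weights[j] holds the weights leaving input feature j, so each
    input feature forms one group. `lam` and `alpha` are hypothetical names.
    """
    # Group lasso term: sum of Euclidean norms, one per input feature.
    group_term = sum(
        math.sqrt(sum(w * w for w in group)) for group in first_layer_weights
    )
    # Lasso term: sum of absolute values over all first-layer weights.
    l1_term = sum(abs(w) for group in first_layer_weights for w in group)
    return lam * ((1.0 - alpha) * group_term + alpha * l1_term)


# Toy weights: feature 1 has been zeroed out entirely, so the network ignores it.
weights = [[3.0, 4.0], [0.0, 0.0], [1.0, 0.0]]
penalty = sparse_group_lasso_penalty(weights, lam=1.0, alpha=0.5)
active = [j for j, group in enumerate(weights) if any(w != 0.0 for w in group)]
```

The group term encourages whole input features to drop out at once, while the lasso term encourages sparsity among the individual weights of the features that remain.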

artificial intelligence, machine learning, neural network, (18 more...)

arXiv.org Machine Learning

1711.07592

Country: North America (0.28)

Genre: Research Report > New Finding (0.93)

Industry:

Health & Medicine > Therapeutic Area > Oncology > Leukemia (0.46)
Health & Medicine > Therapeutic Area > Hematology (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Finding Differentially Covarying Needles in a Temporally Evolving Haystack: A Scan Statistics Perspective

Mehta, Ronak, Kim, Hyunwoo J., Wang, Shulei, Johnson, Sterling C., Yuan, Ming, Singh, Vikas

arXiv.org Machine Learning | Nov-20-2017

Recent results in coupled or temporal graphical models offer schemes for estimating the relationship structure between features when the data come from related (but distinct) longitudinal sources. A novel application of these ideas is for analyzing group-level differences, i.e., in identifying if trends of estimated objects (e.g., covariance or precision matrices) are different across disparate conditions (e.g., gender or disease). Often, poor effect sizes make detecting the differential signal over the full set of features difficult: for example, dependencies between only a subset of features may manifest differently across groups. In this work, we first give a parametric model for estimating trends in the space of SPD matrices as a function of one or more covariates. We then generalize scan statistics to graph structures, to search over distinct subsets of features (graph partitions) whose temporal dependency structure may show statistically significant group-wise differences. We theoretically analyze the Family Wise Error Rate (FWER) and bounds on Type 1 and Type 2 error. On a cohort of individuals with risk factors for Alzheimer's disease (but otherwise cognitively healthy), we find scientifically interesting group differences where the default analysis, i.e., models estimated on the full graph, does not survive reasonable significance thresholds.

artificial intelligence, machine learning, manifold, (17 more...)

arXiv.org Machine Learning

1711.07575

Country: North America > United States > Wisconsin (0.28)

Genre: Research Report > New Finding (1.00)

Industry:

Health & Medicine > Therapeutic Area > Neurology > Alzheimer's Disease (1.00)
Health & Medicine > Consumer Health (1.00)
Government (1.00)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.66)

Statistical inference using SGD

Li, Tianyang, Liu, Liu, Kyrillidis, Anastasios, Caramanis, Constantine

arXiv.org Machine Learning | Nov-19-2017

We present a novel method for frequentist statistical inference in $M$-estimation problems, based on stochastic gradient descent (SGD) with a fixed step size: we demonstrate that the average of such SGD sequences can be used for statistical inference, after proper scaling. An intuitive analysis using the Ornstein-Uhlenbeck process suggests that such averages are asymptotically normal. From a practical perspective, our SGD-based inference procedure is a first order method, and is well-suited for large scale problems. To show its merits, we apply it to both synthetic and real datasets, and demonstrate that its accuracy is comparable to classical statistical methods, while requiring potentially far less computation.
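
To make the idea of averaging fixed-step SGD iterates concrete, here is a minimal sketch on a toy problem: estimating a mean under squared loss. The function name, step size, burn-in length, segment length, and synthetic data are all assumptions for illustration. The spread of the segment averages is only a crude uncertainty indicator and is not the scaling derived in the paper.

```python
import random
import statistics


def sgd_segment_averages(data, step=0.1, burn_in=1000, segment_len=1000, theta0=0.0):
    """Run fixed-step SGD on the loss 0.5 * (theta - x)^2 and average the iterates.

    Returns one average per consecutive segment of iterates after burn-in.
    All parameter names and defaults are hypothetical choices for this sketch.
    """
    theta = theta0
    averages = []
    running, count = 0.0, 0
    for i, x in enumerate(data):
        theta -= step * (theta - x)  # stochastic gradient step with a fixed step size
        if i < burn_in:
            continue  # discard early iterates that still depend on theta0
        running += theta
        count += 1
        if count == segment_len:
            averages.append(running / count)
            running, count = 0.0, 0
    return averages


rng = random.Random(0)  # fixed seed so the sketch is reproducible
data = [rng.gauss(2.0, 1.0) for _ in range(21000)]  # synthetic data, true mean 2.0

segment_means = sgd_segment_averages(data)
estimate = statistics.mean(segment_means)  # point estimate from averaged iterates
spread = statistics.stdev(segment_means)   # variability across segment averages
```

Each update uses a single observation and a single gradient evaluation, which is why a first-order procedure of this kind is attractive for large problems.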

artificial intelligence, bayesian inference, machine learning, (14 more...)

arXiv.org Machine Learning

1705.07477

Genre: Research Report > New Finding (0.95)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.71)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.55)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.46)

Machine Learning Algorithms: Which One to Choose for Your Problem

@machinelearnbot | Nov-18-2017, 19:00:13 GMT

When I was starting out in data science, I often faced the problem of choosing the most appropriate algorithm for my specific problem. If you're like me, when you open an article about machine learning algorithms, you see dozens of detailed descriptions. The paradox is that they don't make the choice any easier. In this article for Statsbot, I will try to explain basic concepts and give some intuition for using different kinds of machine learning algorithms in different tasks. At the end of the article, you'll find a structured overview of the main features of the described algorithms.

Supervised learning

Supervised learning is the task of inferring a function from labeled training data.

algorithm, artificial intelligence, machine learning, (16 more...)

@machinelearnbot

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.33)

The 10 Statistical Techniques Data Scientists Need to Master

@machinelearnbot | Nov-17-2017, 01:26:18 GMT

Regardless of where you stand on the matter of Data Science sexiness, it's simply impossible to ignore the continuing importance of data, and our ability to analyze, organize, and contextualize it. Drawing on their vast stores of employment data and employee feedback, Glassdoor ranked Data Scientist #1 in their 25 Best Jobs in America list. So the role is here to stay, but unquestionably, the specifics of what a Data Scientist does will evolve. With technologies like Machine Learning becoming ever more commonplace, and emerging fields like Deep Learning gaining significant traction amongst researchers and engineers (and the companies that hire them), Data Scientists continue to ride the crest of an incredible wave of innovation and technological progress. While strong coding ability is important, data science isn't all about software engineering (in fact, have a good familiarity with Python and you're good to go).

artificial intelligence, machine learning, scientist, (9 more...)

@machinelearnbot

Genre: Research Report (0.42)

Industry: Education (0.49)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.38)

Household poverty classification in data-scarce environments: a machine learning approach

Kshirsagar, Varun, Wieczorek, Jerzy, Ramanathan, Sharada, Wells, Rachel

arXiv.org Machine Learning | Nov-17-2017

We describe a method to identify poor households in data-scarce countries by leveraging information contained in nationally representative household surveys. It employs standard statistical learning techniques (cross-validation and parameter regularization), which together reduce the extent to which the model is over-fitted to match the idiosyncrasies of observed survey data. The automated framework satisfies three important constraints of this development setting: i) the prediction model uses at most ten questions, which limits the costs of data collection; ii) no computation beyond simple arithmetic is needed to calculate the probability that a given household is poor, immediately after data on the ten indicators is collected; and iii) one specification of the model (i.e., one scorecard) is used to predict poverty throughout a country that may be characterized by significant sub-national differences. Using survey data from Zambia, the model's out-of-sample predictions distinguish poor households from non-poor households using information contained in ten questions.

artificial intelligence, household, machine learning, (15 more...)

arXiv.org Machine Learning

1711.06813

Country:

North America > United States (0.47)
Africa > Zambia (0.35)

Genre: Research Report (0.89)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.72)

Regularization in Machine Learning – Towards Data Science

#artificialintelligence | Nov-16-2017, 03:35:54 GMT

One of the major aspects of training your machine learning model is avoiding overfitting. An overfit model will have low accuracy on data it has not seen before. This happens because the model is trying too hard to capture the noise in your training dataset. By noise we mean the data points that don't really represent the true properties of your data, but rather random chance. Learning such data points makes your model more flexible, at the risk of overfitting. The concept of balancing bias and variance is helpful in understanding the phenomenon of overfitting.

artificial intelligence, coefficient, machine learning, (18 more...)

#artificialintelligence

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.34)

The 10 Statistical Techniques Data Scientists Need to Master

@machinelearnbot | Nov-15-2017, 15:56:50 GMT

Regardless of where you stand on the matter of Data Science sexiness, it's simply impossible to ignore the continuing importance of data, and our ability to analyze, organize, and contextualize it. Drawing on their vast stores of employment data and employee feedback, Glassdoor ranked Data Scientist #1 in their 25 Best Jobs in America list. So the role is here to stay, but unquestionably, the specifics of what a Data Scientist does will evolve. With technologies like Machine Learning becoming ever more commonplace, and emerging fields like Deep Learning gaining significant traction amongst researchers and engineers (and the companies that hire them), Data Scientists continue to ride the crest of an incredible wave of innovation and technological progress. While strong coding ability is important, data science isn't all about software engineering (in fact, have a good familiarity with Python and you're good to go).

artificial intelligence, independent variable, machine learning, (11 more...)

@machinelearnbot

Genre: Research Report (0.39)

Industry: Education (0.48)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.76)

Accelerating Cross-Validation in Multinomial Logistic Regression with $\ell_1$-Regularization

Obuchi, Tomoyuki, Kabashima, Yoshiyuki

arXiv.org Machine Learning | Nov-15-2017

We develop an approximate formula for evaluating a cross-validation estimator of predictive likelihood for multinomial logistic regression regularized by an $\ell_1$-norm. This allows us to avoid repeated optimizations required for literally conducting cross-validation; hence, the computational time can be significantly reduced. The formula is derived through a perturbative approach employing the largeness of the data size and the model dimensionality. Its usefulness is demonstrated on simulated data and the ISOLET dataset from the UCI machine learning repository.

approximation, artificial intelligence, machine learning, (15 more...)

arXiv.org Machine Learning

1711.0542

Country: Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)

Genre:

Research Report > New Finding (0.72)
Research Report > Experimental Study (0.62)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Cross Validation (0.81)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.72)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.47)

Linear Regression with Sparsely Permuted Data

Slawski, Martin, Ben-David, Emanuel

arXiv.org Machine Learning | Nov-15-2017

In regression analysis of multivariate data, it is tacitly assumed that response and predictor variables in each observed response-predictor pair correspond to the same entity or unit. In this paper, we consider the situation of "permuted data" in which this basic correspondence has been lost. Several recent papers have considered this situation without further assumptions on the underlying permutation. In applications, the latter is often known to have additional structure that can be leveraged. Specifically, we herein consider the common scenario of "sparsely permuted data" in which only a small fraction of the data is affected by a mismatch between response and predictors. However, an adverse effect already observed for sparsely permuted data is that the least squares estimator, as well as other estimators not accounting for such partial mismatch, are inconsistent. One approach studied in detail herein is to treat permuted data as outliers, which motivates the use of robust regression formulations to estimate the regression parameter. The resulting estimate can subsequently be used to recover the permutation. A notable benefit of the proposed approach is its computational simplicity given the general lack of procedures for the above problem that are both statistically sound and computationally appealing.
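
The idea of treating mismatched pairs as outliers can be sketched on a toy one-dimensional problem. The example below compares ordinary least squares with a Huber-type estimate computed by iteratively reweighted least squares. Huber loss, the threshold `delta`, the function names, and the synthetic data are assumptions chosen for illustration, not necessarily the formulation studied in the paper.

```python
def least_squares_slope(xs, ys):
    """Least squares slope for a regression line through the origin."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


def huber_slope(xs, ys, delta=1.0, iterations=50):
    """Huber-type slope estimate via iteratively reweighted least squares."""
    beta = least_squares_slope(xs, ys)  # start from the non-robust fit
    for _ in range(iterations):
        weights = []
        for x, y in zip(xs, ys):
            r = abs(y - beta * x)
            # Pairs with large residuals (likely mismatches) get down-weighted.
            weights.append(1.0 if r <= delta else delta / r)
        num = sum(w * x * y for w, x, y in zip(weights, xs, ys))
        den = sum(w * x * x for w, x in zip(weights, xs))
        beta = num / den
    return beta


# Synthetic data with true slope 2.0; only four of twenty responses are mismatched.
xs = [float(i) for i in range(1, 21)]
ys = [2.0 * x for x in xs]
ys[0], ys[19] = ys[19], ys[0]
ys[1], ys[18] = ys[18], ys[1]

beta_ls = least_squares_slope(xs, ys)
beta_robust = huber_slope(xs, ys)

# Pairs with large residuals under the robust fit are candidate mismatches.
suspects = [
    i for i, (x, y) in enumerate(zip(xs, ys)) if abs(y - beta_robust * x) > 1.0
]
```

In this toy setting the least squares slope is pulled well away from 2.0 by the four swapped responses, while the robust fit stays close to it, and the large residuals point to the swapped pairs as a first step toward recovering the permutation.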

artificial intelligence, estimator, machine learning, (17 more...)

arXiv.org Machine Learning

1710.0603

Country: North America > United States (0.93)

Genre: Research Report > New Finding (0.88)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)


Sparse-Input Neural Networks for High-dimensional Nonparametric Regression and Classification
