AITopics | Regression

Collaborating Authors

Regression

News Overviews Instructional Materials AI-Alerts Classics

Benign Overfitting in Linear Regression

Bartlett, Peter L., Long, Philip M., Lugosi, Gábor, Tsigler, Alexander

arXiv.org Machine LearningJun-26-2019

The phenomenon of benign overfitting is one of the key mysteries uncovered by deep learning methodology: deep neural networks seem to predict well, even with a perfect fit to noisy training data. Motivated by this phenomenon, we consider when a perfect fit to training data in linear regression is compatible with accurate prediction. We give a characterization of gaussian linear regression problems for which the minimum norm interpolating prediction rule has near-optimal prediction accuracy. The characterization is in terms of two notions of the effective rank of the data covariance. It shows that overparameterization is essential for benign overfitting in this setting: the number of directions in parameter space that are unimportant for prediction must significantly exceed the sample size.

artificial intelligence, deep learning, machine learning, (19 more...)

arXiv.org Machine Learning

1906.113

Country: Europe (0.28)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.81)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

A comparison of apartment rent price prediction using a large dataset: Kriging versus DNN

Seya, Hajime, Shiroi, Daiki

arXiv.org Machine LearningJun-25-2019

The hedonic approach based on a regression model has been widely adopted for the prediction of real estate property price and rent. In particular, a spatial regression technique called Kriging, a method of interpolation that was advanced in the field of spatial statistics, are known to enable high accuracy prediction in light of the spatial dependence of real estate property data. Meanwhile, there has been a rapid increase in machine learning-based prediction using a large (big) dataset and its effectiveness has been demonstrated in previous studies. However, no studies have ever shown the extent to which predictive accuracy differs for Kriging and machine learning techniques using big data. Thus, this study compares the predictive accuracy of apartment rent price in Japan between the nearest neighbor Gaussian processes (NNGP) model, which enables application of Kriging to big data, and the deep neural network (DNN), a representative machine learning technique, with a particular focus on the data sample size (n = 10^4, 10^5, 10^6) and differences in predictive performance. Our analysis showed that, with an increase in sample size, the out-of-sample predictive accuracy of DNN approached that of NNGP and they were nearly equal on the order of n = 10^6. Furthermore, it is suggested that, for both higher and lower end properties whose rent price deviates from the median, DNN may have a higher predictive accuracy than that of NNGP.

artificial intelligence, machine learning, sample size, (16 more...)

arXiv.org Machine Learning

1906.11099

Country:

Asia > Japan (0.49)
North America > United States (0.46)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.95)

Industry: Banking & Finance > Real Estate (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.97)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.70)

Add feedback

AMF: Aggregated Mondrian Forests for Online Learning

Mourtada, Jaouad, Gaïffas, Stéphane, Scornet, Erwan

arXiv.org Machine LearningJun-25-2019

Introduced by Breiman (2001), Random Forests (RF) is one of the algorithms of choice in many supervised learning applications. The appeal of these methods comes from their remarkable accuracy in a variety of tasks, the small number (or even the absence) of parameters to tune, their reasonable computational cost at training and prediction time, and their suitability in highdimensional settings. Most commonly used RF algorithms, such as the original random forest procedure (Breiman, 2001), extra-trees (Geurts et al., 2006), or conditional inference forest (Hothorn et al., 2010) are batch algorithms, that require the whole dataset to be available at once. Several online random forests variants have been proposed to overcome this issue and handle data that come sequentially. Utgoff (1989) was the first to extend Quinlan's ID3 batch decision tree algorithm (see Quinlan, 1986) to an online setting. Later on, Domingos and Hulten (2000) introduce Hoeffding Trees that can be easily updated: since observations are available sequentially, a cell is split when (i) enough observations have fallen into this cell, (ii) the best split in the cell is statistically relevant (a generic Hoeffding inequality being used to assess the quality of the best split). Since random forests are known to exhibit better empirical performances than individual decision trees, online random forests have been proposed (see, e.g., Saffari et al., 2009; Denil et al., 2013).

algorithm, artificial intelligence, machine learning, (20 more...)

arXiv.org Machine Learning

1906.10529

Country:

Europe (0.28)
North America > United States (0.28)

Genre: Research Report > New Finding (0.46)

Industry: Education > Educational Setting > Online (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.46)

Add feedback

A Review of Statistical Learning Machines from ATR to DNA Microarrays: design, assessment, and advice for practitioners

Yousef, Waleed A.

arXiv.org Machine LearningJun-25-2019

Statistical Learning is the process of estimating an unknown probabilistic input-output relationship of a system using a limited number of observations; and a statistical learning machine (SLM) is the machine that learned such a process. While their roots grow deeply in Probability Theory, SLMs are ubiquitous in the modern world. Automatic Target Recognition (ATR) in military applications, Computer Aided Diagnosis (CAD) in medical imaging, DNA microarrays in Genomics, Optical Character Recognition (OCR), Speech Recognition (SR), spam email filtering, stock market prediction, etc., are few examples and applications for SLM; diverse fields but one theory. The field of Statistical Learning can be decomposed to two basic subfields, Design and Assessment. Three main groups of specializations-namely statisticians, engineers, and computer scientists (ordered ascendingly by programming capabilities and descendingly by mathematical rigor)-exist on the venue of this field and each takes its elephant bite. Exaggerated rigorous analysis of statisticians sometimes deprives them from considering new ML techniques and methods that, yet, have no "complete" mathematical theory. On the other hand, immoderate add-hoc simulations of computer scientists sometimes derive them towards unjustified and immature results. A prudent approach is needed that has the enough flexibility to utilize simulations and trials and errors without sacrificing any rigor. If this prudent attitude is necessary for this field it is necessary, as well, in other fields of Engineering.

artificial intelligence, classification, machine learning, (19 more...)

arXiv.org Machine Learning

1906.10019

Country:

North America > United States (0.47)
Europe > United Kingdom > England (0.28)

Genre: Research Report (0.83)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Diagnostic Medicine > Imaging (0.48)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.69)

Add feedback

Policy Targeting under Network Interference

Viviano, Davide

arXiv.org Machine LearningJun-24-2019

The empirical analysis of experiments and quasi-experiments often seeks to determine the optimal allocation of treatments that maximizes social welfare. In the presence of interference, spillover effects lead to a new formulation of the statistical treatment choice problem. This paper develops a novel method to construct individual-specific optimal treatment allocation rules under network interference. Several features make the proposed methodology particularly appealing for applications: we construct targeting rules that depend on an arbitrary set of individual, neighbors' and network characteristics, and we allow for general constraints on the policy function; we consider heterogeneous direct and spillover effects, arbitrary, possibly non-linear, regression models, and we propose estimators that are robust to model misspecification; the method flexibly accommodates for cases where researchers only observe local information of the network. From a theoretical perspective, we establish the first set of guarantees on the utilitarian regret under interference, and we show that it achieves the min-max optimal rate in scenarios of practical and theoretical interest. We discuss the empirical performance in simulations and we illustrate our method by investigating the role of social networks in micro-finance decisions.

artificial intelligence, machine learning, neighbor, (16 more...)

arXiv.org Machine Learning

1906.10258

Country:

North America > United States (0.45)
Asia > India (0.28)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)
Research Report > Strength High (0.68)

Industry:

Health & Medicine (1.00)
Government (0.92)

Technology:

Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)

Add feedback

The NN-Stacking: Feature weighted linear stacking through neural networks

Coscrato, Victor, Inácio, Marco Henrique de Almeida, Izbicki, Rafael

arXiv.org Machine LearningJun-24-2019

Stacking methods improve the prediction performance of regression models. A simple way to stack base regressions estimators is by combining them linearly, as done by \citet{breiman1996stacked}. Even though this approach is useful from an interpretative perspective, it often does not lead to high predictive power. We propose the NN-Stacking method (NNS), which generalizes Breiman's method by allowing the linear parameters to vary with input features. This improvement enables NNS to take advantage of the fact that distinct base models often perform better at different regions of the feature space. Our method uses neural networks to estimate the stacking coefficients. We show that while our approach keeps the interpretative features of Breiman's method at a local level, it leads to better predictive power, especially in datasets with large sample sizes.

artificial intelligence, estimator, machine learning, (15 more...)

arXiv.org Machine Learning

1906.09735

Country: Europe > Hungary (0.14)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.67)

Add feedback

The Value of Collaboration in Convex Machine Learning with Differential Privacy

Wu, Nan, Farokhi, Farhad, Smith, David, Kaafar, Mohamed Ali

arXiv.org Machine LearningJun-23-2019

In this paper, we apply machine learning to distributed private data owned by multiple data owners, entities with access to non-overlapping training datasets. We use noisy, differentially-private gradients to minimize the fitness cost of the machine learning model using stochastic gradient descent. We quantify the quality of the trained model, using the fitness cost, as a function of privacy budget and size of the distributed datasets to capture the trade-off between privacy and utility in machine learning. This way, we can predict the outcome of collaboration among privacy-aware data owners prior to executing potentially computationally-expensive machine learning algorithms. Particularly, we show that the difference between the fitness of the trained machine learning model using differentially-private gradient queries and the fitness of the trained machine model in the absence of any privacy concerns is inversely proportional to the size of the training datasets squared and the privacy budget squared. We successfully validate the performance prediction with the actual performance of the proposed privacy-aware learning algorithms, applied to: financial datasets for determining interest rates of loans using regression; and detecting credit card frauds using support vector machines.

artificial intelligence, dataset, machine learning, (15 more...)

arXiv.org Machine Learning

1906.09679

Country:

Oceania > Australia (0.14)
North America > United States > California (0.04)

Genre: Research Report (0.64)

Industry:

Information Technology > Security & Privacy (1.00)
Banking & Finance (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.68)

Add feedback

Asymmetric Random Projections

Ryder, Nick, Karnin, Zohar, Liberty, Edo

arXiv.org Machine LearningJun-22-2019

Random projections (RP) are a popular tool for reducing dimensionality while preserving local geometry. In many applications the data set to be projected is given to us in advance, yet the current RP techniques do not make use of information about the data. In this paper, we provide a computationally light way to extract statistics from the data that allows designing a data dependent RP with superior performance compared to data-oblivious RP. We tackle scenarios such as matrix multiplication and linear regression/classification in which we wish to estimate inner products between pairs of vectors from two possibly different sources. Our technique takes advantage of the difference between the sources and is provably superior to oblivious RPs. Additionally, we provide extensive experiments comparing RPs with our approach showing significant performance lifts in fast matrix multiplication, regression and classification problems.

artificial intelligence, machine learning, projection, (16 more...)

arXiv.org Machine Learning

1906.09489

Country: North America > United States > California (0.93)

Genre: Research Report > New Finding (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.36)

Add feedback

Distilling BERT -- How to achieve BERT performance using logistic regression

#artificialintelligenceJun-21-2019, 18:19:15 GMT

BERT is awesome, and it's everywhere. It looks like any NLP task can benefit from utilizing BERT. The authors showed that this is indeed the case, and from my experience, it works like magic. It's easy to use, works on a small amount of data and supports many different languages. It seems like there's no single reason not to use it everywhere.

artificial intelligence, machine learning, prediction, (7 more...)

#artificialintelligence

Genre:

Research Report > New Finding (0.52)
Research Report > Experimental Study (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.40)

Add feedback

Max-Affine Regression: Provable, Tractable, and Near-Optimal Statistical Estimation

Ghosh, Avishek, Pananjady, Ashwin, Guntuboyina, Adityanand, Ramchandran, Kannan

arXiv.org Machine LearningJun-21-2019

Max-affine regression refers to a model where the unknown regression function is modeled as a maximum of $k$ unknown affine functions for a fixed $k \geq 1$. This generalizes linear regression and (real) phase retrieval, and is closely related to convex regression. Working within a non-asymptotic framework, we study this problem in the high-dimensional setting assuming that $k$ is a fixed constant, and focus on estimation of the unknown coefficients of the affine functions underlying the model. We analyze a natural alternating minimization (AM) algorithm for the non-convex least squares objective when the design is random. We show that the AM algorithm, when initialized suitably, converges with high probability and at a geometric rate to a small ball around the optimal coefficients. In order to initialize the algorithm, we propose and analyze a combination of a spectral method and a random search scheme in a low-dimensional space, which may be of independent interest. The final rate that we obtain is near-parametric and minimax optimal (up to a poly-logarithmic factor) as a function of the dimension, sample size, and noise variance. In that sense, our approach should be viewed as a direct and implementable method of enforcing regularization to alleviate the curse of dimensionality in problems of the convex regression type. As a by-product of our analysis, we also obtain guarantees on a classical algorithm for the phase retrieval problem under considerably weaker assumptions on the design distribution than was previously known. Numerical experiments illustrate the sharpness of our bounds in the various problem parameters.

algorithm, artificial intelligence, machine learning, (18 more...)

arXiv.org Machine Learning

1906.09255

Country:

North America > United States (0.27)
North America > Canada (0.27)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.68)

Add feedback