AITopics | Regression

Regression

A Farewell to the Bias-Variance Tradeoff? An Overview of the Theory of Overparameterized Machine Learning

Dar, Yehuda, Muthukumar, Vidya, Baraniuk, Richard G.

arXiv.org Machine Learning, Sep-6-2021

The rapid recent progress in machine learning (ML) has raised a number of scientific questions that challenge the longstanding dogma of the field. One of the most important riddles is the good empirical generalization of overparameterized models. Overparameterized models are excessively complex with respect to the size of the training dataset, which results in them perfectly fitting (i.e., interpolating) the training data, which is usually noisy. Such interpolation of noisy data is traditionally associated with detrimental overfitting, and yet a wide range of interpolating models -- from simple linear models to deep neural networks -- have recently been observed to generalize extremely well on fresh test data. Indeed, the recently discovered double descent phenomenon has revealed that highly overparameterized models often improve over the best underparameterized model in test performance. Understanding learning in this overparameterized regime requires new theory and foundational empirical studies, even for the simplest case of the linear model. The underpinnings of this understanding have been laid in very recent analyses of overparameterized linear regression and related statistical learning tasks, which resulted in precise analytic characterizations of double descent. This paper provides a succinct overview of this emerging theory of overparameterized ML (henceforth abbreviated as TOPML) that explains these recent findings through a statistical signal processing perspective. We emphasize the unique aspects that define the TOPML research area as a subfield of modern ML theory and outline interesting open questions that remain.
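As a rough illustration of interpolation in the overparameterized regime, the sketch below fits a linear model with three parameters to only two training samples. It uses the minimum-L2-norm interpolator, w = X^T (X X^T)^{-1} y, which is a common object of study for overparameterized linear regression; choosing it here is an assumption for illustration, not something stated in the abstract. The function names and the toy data are hypothetical.

```python
def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))

def min_norm_interpolator(X, y):
    """Minimum-L2-norm w satisfying X w = y for a 2 x d matrix X of rank 2.

    Computes w = X^T (X X^T)^{-1} y, inverting the 2 x 2 Gram matrix by hand.
    """
    r0, r1 = X
    g00, g01, g11 = dot(r0, r0), dot(r0, r1), dot(r1, r1)
    det = g00 * g11 - g01 * g01
    # a = (X X^T)^{-1} y, using the closed-form inverse of a symmetric 2 x 2 matrix
    a0 = (g11 * y[0] - g01 * y[1]) / det
    a1 = (g00 * y[1] - g01 * y[0]) / det
    # w = X^T a is a combination of the training rows
    return [a0 * u + a1 * v for u, v in zip(r0, r1)]

# Toy data (hypothetical): 2 samples, 3 features, so the model is overparameterized.
X = [[1.0, 0.0, 1.0],
     [0.0, 1.0, 1.0]]
y = [2.0, 3.0]

w = min_norm_interpolator(X, y)
predictions = [dot(row, w) for row in X]  # reproduces y up to rounding
```

Infinitely many weight vectors fit these two samples exactly. The minimum-norm choice picks the one that lies in the span of the training rows.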

interpolation, overparameterization, regression, (13 more...)

arXiv.org Machine Learning

2109.02355

Country:

Asia > Middle East > Jordan (0.04)
Asia > Middle East > Israel (0.04)

Genre: Research Report > New Finding (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.66)

Top 9 types of machine learning algorithms, with cheat sheet

#artificialintelligence, Sep-5-2021, 08:55:28 GMT

Supervised learning models require data scientists to provide the algorithm with data sets for input and parameters for output, as well as feedback on accuracy during the training process. They are task-based and are tested on labeled data sets. The most popular type of machine learning algorithm is arguably linear regression. Linear regression algorithms map simple correlations between two variables in a set of data. A set of inputs and their corresponding outputs is examined and quantified to show a relationship, including how a change in one variable affects the other.
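To make the two-variable case concrete, here is a minimal sketch of ordinary least squares for a single input variable, written with the Python standard library only. The function name `fit_line` and the toy data are hypothetical.

```python
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    # Sum of squared deviations of x, and the x-y cross deviations
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept

# Toy data (hypothetical) lying exactly on y = 2x + 1
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_line(xs, ys)
```

The slope is the quantity the excerpt describes: how much the output changes for a one-unit change in the input.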

algorithm, data scientist, linear regression, (10 more...)

#artificialintelligence

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.06)
North America > United States > Arizona > Maricopa County > Chandler (0.06)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.87)

Optimal transport weights for causal inference

Dunipace, Eric

arXiv.org Machine Learning, Sep-5-2021

Weighting methods are a common tool to de-bias estimates of causal effects. And though there are an increasing number of seemingly disparate methods, many of them can be folded into one unifying regime: causal optimal transport. This new method directly targets distributional balance by minimizing optimal transport distances between treatment and control groups or, more generally, between a source and target population. Our approach is model-free but can also incorporate moments or any other important functions of covariates that the researcher desires to balance. We find that the causal optimal transport outperforms competitor methods when both the propensity score and outcome models are misspecified, indicating it is a robust alternative to common weighting methods. Finally, we demonstrate the utility of our method in an external control study examining the effect of misoprostol versus oxytocin for treatment of post-partum hemorrhage.

basis function, constraint, estimator, (15 more...)

arXiv.org Machine Learning

2109.01991

Country:

Europe > Austria > Vienna (0.14)
Africa > Middle East > Egypt (0.04)
North America > United States > New York > New York County > New York City (0.04)
(8 more...)

Genre: Research Report > Experimental Study (1.00)

Industry:

Government > Regional Government > North America Government > United States Government (0.93)
Health & Medicine > Pharmaceuticals & Biotechnology (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.46)

Top 10 Machine Learning Algorithms You Should Know in 2021

#artificialintelligence, Sep-3-2021, 09:20:16 GMT

Nowadays, businesses are focusing on automation. They are trying to automate manual tasks that consume a lot of human effort and time. Machine learning algorithms have taken over processes that were considered mundane or dangerous. Technology is continuously reshaping businesses, making them more efficient, smarter, and more capable. As technology has become accessible, new innovations in business processes have emerged. The technology revolution was triggered by the democratization of computing tools and techniques, which are now easily available.

Relating the Partial Dependence Plot and Permutation Feature Importance to the Data Generating Process

Molnar, Christoph, Freiesleben, Timo, König, Gunnar, Casalicchio, Giuseppe, Wright, Marvin N., Bischl, Bernd

arXiv.org Machine Learning, Sep-3-2021

Scientists and practitioners increasingly rely on machine learning to model data and draw conclusions. Compared to statistical modeling approaches, machine learning makes fewer explicit assumptions about data structures, such as linearity. However, their model parameters usually cannot be easily related to the data generating process. To learn about the modeled relationships, partial dependence (PD) plots and permutation feature importance (PFI) are often used as interpretation methods. However, PD and PFI lack a theory that relates them to the data generating process. We formalize PD and PFI as statistical estimators of ground truth estimands rooted in the data generating process. We show that PD and PFI estimates deviate from this ground truth due to statistical biases, model variance and Monte Carlo approximation errors. To account for model variance in PD and PFI estimation, we propose the learner-PD and the learner-PFI based on model refits, and propose corrected variance and confidence interval estimators.
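For readers unfamiliar with PFI, the sketch below shows the standard estimate for a fixed model: the average increase in loss after shuffling one feature column. It does not implement the learner-PFI or the corrected variance estimators proposed in the paper. The model, data, and function names are hypothetical, and mean squared error is an assumed loss.

```python
import random

def mse(model, rows, targets):
    """Mean squared error of a model over a data set."""
    return sum((model(r) - t) ** 2 for r, t in zip(rows, targets)) / len(rows)

def permutation_importance(model, rows, targets, feature, repeats=10, seed=0):
    """Average increase in MSE when one feature column is shuffled."""
    rng = random.Random(seed)
    baseline = mse(model, rows, targets)
    increases = []
    for _ in range(repeats):
        # Shuffle a copy of the chosen column, leaving other columns intact
        column = [r[feature] for r in rows]
        rng.shuffle(column)
        permuted = [r[:feature] + [v] + r[feature + 1:]
                    for r, v in zip(rows, column)]
        increases.append(mse(model, permuted, targets) - baseline)
    return sum(increases) / repeats

def model(row):
    """Hypothetical fitted model that uses feature 0 and ignores feature 1."""
    return 3.0 * row[0]

# Toy data (hypothetical): the target depends on feature 0 only
rows = [[float(i), float((7 * i) % 20)] for i in range(20)]
targets = [3.0 * r[0] for r in rows]

pfi_x0 = permutation_importance(model, rows, targets, feature=0)
pfi_x1 = permutation_importance(model, rows, targets, feature=1)
```

Averaging over several shuffles reduces the Monte Carlo error of the estimate, one of the error sources the abstract names. Because the model ignores feature 1, shuffling that column leaves the loss unchanged.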

confidence interval, pfi, variance, (13 more...)

arXiv.org Machine Learning

2109.01433

Country:

Europe > Germany > Bremen > Bremen (0.14)
Europe > Austria > Vienna (0.14)
Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
(2 more...)

Genre: Research Report (1.00)

Industry: Health & Medicine (0.67)

Technology:

Information Technology > Data Science (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.68)

10 Top Types of Data Analysis Methods and Techniques

#artificialintelligence, Aug-31-2021, 16:11:56 GMT

Here is a list of the best-known classic and modern data analysis methods and models. Mathematical and Statistical Methods for Data Analysis: mathematical and statistical sciences have much to offer data mining management and analysis. In fact, most data mining techniques are statistical data analysis tools. Some methods and techniques are well known and very effective. This statistical technique does exactly what the name suggests: "describe".

data analysis method and technique, data mining, regression, (8 more...)

#artificialintelligence

Genre: Research Report (0.39)

Technology:

Information Technology > Data Science > Data Mining (0.83)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.78)

Feature engineering A-Z

#artificialintelligence, Aug-31-2021, 14:18:36 GMT

Let's say we have consumption data of some kind with a time stamp on it. In this example, the "Date" column could easily be used to extract additional features and generate powerful insights, such as variations in consumption on weekdays or weekends or at a particular time of year. Feature synthesis is the opposite of feature extraction. In this case, one or more features are combined to create new features that are more informative than they are individually. Let's say that in a house price dataset you have two columns: floor_space (sqft) and total_house_price (US$). You could use them individually in your analysis, but you could also create a new calculated feature called price_per_sqft (US$/sqft). Feature scaling/transformation refers to a variety of methods applied in data preprocessing to rescale or normalize data into a different range.
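The three ideas in this excerpt can be sketched with the Python standard library alone. The column names floor_space, total_house_price, and price_per_sqft come from the excerpt. The function names, the sample values, and the choice of min-max scaling as the rescaling method are assumptions for illustration.

```python
from datetime import date

def date_features(d):
    """Feature extraction: derive calendar features from a date."""
    return {
        "weekday": d.weekday(),         # Monday is 0, Sunday is 6
        "is_weekend": d.weekday() >= 5,
        "month": d.month,
    }

def price_per_sqft(total_house_price, floor_space):
    """Feature synthesis: combine two columns into one ratio feature."""
    return total_house_price / floor_space

def min_max_scale(values):
    """Feature scaling: rescale values linearly into the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no spread; map every value to 0.0
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical sample values
saturday = date_features(date(2021, 8, 28))
tuesday = date_features(date(2021, 8, 31))
ratio = price_per_sqft(300000.0, 1500.0)
scaled = min_max_scale([10.0, 20.0, 30.0])
```

Min-max scaling is only one option. Standardization to zero mean and unit variance is a common alternative when the data contains outliers.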

dataset, normalization, transformation, (4 more...)

#artificialintelligence

Technology:

Information Technology > Data Science (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.32)

Scalable Spatiotemporally Varying Coefficient Modeling with Bayesian Kernelized Tensor Regression

Lei, Mengying, Labbe, Aurelie, Sun, Lijun

arXiv.org Machine Learning, Aug-31-2021

As a regression technique in spatial statistics, the spatiotemporally varying coefficient (STVC) model is an important tool for discovering nonstationary and interpretable response-covariate associations over both space and time. However, it is difficult to apply STVC to large-scale spatiotemporal analysis due to its high computational cost. To address this challenge, we summarize the spatiotemporally varying coefficients using a third-order tensor structure and propose to reformulate the spatiotemporally varying coefficient model as a special low-rank tensor regression problem. The low-rank decomposition can effectively model the global patterns of the large data with a substantially reduced number of parameters. To further incorporate the local spatiotemporal dependencies among the samples, we place Gaussian process (GP) priors on the spatial and temporal factor matrices to better encode local spatial and temporal processes on each factor component. We refer to the overall framework as Bayesian Kernelized Tensor Regression (BKTR). For model inference, we develop an efficient Markov chain Monte Carlo (MCMC) algorithm, which uses Gibbs sampling to update factor matrices and slice sampling to update kernel hyperparameters. We conduct extensive experiments on both synthetic and real-world data sets, and our results confirm the superior performance and efficiency of BKTR for model estimation and parameter inference.

coefficient, matrix, regression, (10 more...)

arXiv.org Machine Learning

2109.00046

Country:

North America > Canada > Quebec > Montreal (0.14)
Africa > Senegal > Kolda Region > Kolda (0.05)

Genre: Research Report > New Finding (0.86)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.67)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.48)

Uniform Consistency in Nonparametric Mixture Models

Aragam, Bryon, Yang, Ruiyi

arXiv.org Machine Learning, Aug-31-2021

We study uniform consistency in nonparametric mixture models as well as closely related mixture of regression (also known as mixed regression) models, where the regression functions are allowed to be nonparametric and the error distributions are assumed to be convolutions of a Gaussian density. We construct uniformly consistent estimators under general conditions while simultaneously highlighting several pain points in extending existing pointwise consistency results to uniform results. The resulting analysis turns out to be nontrivial, and several novel technical tools are developed along the way. In the case of mixed regression, we prove $L^1$ convergence of the regression functions while allowing for the component regression functions to intersect arbitrarily often, which presents additional technical challenges. We also consider generalizations to general (i.e. non-convolutional) nonparametric mixtures.

assumption, estimator, statistics, (17 more...)

arXiv.org Machine Learning

2108.14003

Country:

Asia > Middle East > Jordan (0.04)
North America > United States > Illinois > Cook County > Chicago (0.04)

Genre: Research Report > New Finding (0.45)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.96)

Learning Optimal Prescriptive Trees from Observational Data

Jo, Nathanael, Aghaei, Sina, Gómez, Andrés, Vayanos, Phebe

arXiv.org Machine Learning, Aug-31-2021

We consider the problem of learning an optimal prescriptive tree (i.e., a personalized treatment assignment policy in the form of a binary tree) of moderate depth, from observational data. This problem arises in numerous socially important domains such as public health and personalized medicine, where interpretable and data-driven interventions are sought based on data gathered in deployment, through passive collection of data, rather than from randomized trials. We propose a method for learning optimal prescriptive trees using mixed-integer optimization (MIO) technology. We show that under mild conditions our method is asymptotically exact in the sense that it converges to an optimal out-of-sample treatment assignment policy as the number of historical data samples tends to infinity. This sets us apart from existing literature on the topic which either requires data to be randomized or imposes stringent assumptions on the trees. Based on extensive computational experiments on both synthetic and real data, we demonstrate that our asymptotic guarantees translate to significant out-of-sample performance improvements even in finite samples.

formulation, learning optimal prescriptive tree, (12 more...)

arXiv.org Machine Learning

2108.13628

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.28)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Japan > Kyūshū & Okinawa > Kyūshū > Fukuoka Prefecture > Fukuoka (0.04)

Genre: Research Report > Experimental Study (1.00)

Industry: Health & Medicine > Therapeutic Area > Endocrinology (0.46)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.46)
