AITopics | Regression


Regression


Parameters or Privacy: A Provable Tradeoff Between Overparameterization and Membership Inference

Tan, Jasper, Mason, Blake, Javadi, Hamid, Baraniuk, Richard G.

arXiv.org Machine Learning, Feb-2-2022

A surprising phenomenon in modern machine learning is the ability of a highly overparameterized model to generalize well (small error on the test data) even when it is trained to memorize the training data (zero error on the training data). This has led to an arms race towards increasingly overparameterized models (e.g., deep learning). In this paper, we study an underexplored hidden cost of overparameterization: the fact that overparameterized models are more vulnerable to privacy attacks, in particular the membership inference attack that predicts the (potentially sensitive) examples used to train a model. We significantly extend the relatively few empirical results on this problem by theoretically proving for an overparameterized linear regression model with Gaussian data that the membership inference vulnerability increases with the number of parameters. Moreover, a range of empirical studies indicates that more complex, nonlinear models exhibit the same behavior. Finally, we study different methods for mitigating such attacks in the overparameterized regime, such as noise addition and regularization, and conclude that simply reducing the parameters of an overparameterized model is an effective strategy to protect it from membership inference without greatly decreasing its generalization error.
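The link between interpolation and membership leakage can be illustrated with a small numerical sketch, assuming NumPy and Gaussian data. It fits the minimum-norm least-squares solution (which drives training error to zero once p > n) and runs a simple loss-threshold membership inference attack, reporting the membership advantage as true-positive rate minus false-positive rate. The threshold attack is a standard baseline chosen for illustration, not the attack analyzed in the paper, and the function names are hypothetical; the sketch only contrasts an under- and an overparameterized model and does not reproduce the paper's result that vulnerability grows with the number of parameters.

```python
import numpy as np

def min_norm_fit(X, y):
    # Minimum-norm least-squares solution; it interpolates (zero training error)
    # when p > n and X has full row rank.
    return np.linalg.pinv(X) @ y

def loss_threshold_attack(beta, X, y, tau):
    # Hypothetical baseline attack: flag a point as a training member
    # when its squared residual falls below tau.
    return (X @ beta - y) ** 2 < tau

def membership_advantage(n, p, noise=0.5, tau=1e-6, trials=20, seed=0):
    # Membership advantage = TPR on training points - FPR on fresh points.
    rng = np.random.default_rng(seed)
    advantages = []
    for _ in range(trials):
        beta_true = rng.normal(size=p) / np.sqrt(p)
        X_train = rng.normal(size=(n, p))
        y_train = X_train @ beta_true + noise * rng.normal(size=n)
        X_fresh = rng.normal(size=(n, p))
        y_fresh = X_fresh @ beta_true + noise * rng.normal(size=n)
        beta_hat = min_norm_fit(X_train, y_train)
        tpr = loss_threshold_attack(beta_hat, X_train, y_train, tau).mean()
        fpr = loss_threshold_attack(beta_hat, X_fresh, y_fresh, tau).mean()
        advantages.append(tpr - fpr)
    return float(np.mean(advantages))

print(membership_advantage(n=50, p=10))   # underparameterized: expected near 0
print(membership_advantage(n=50, p=200))  # overparameterized, interpolating: expected near 1
```

With p > n the training residuals are zero up to rounding, so a tiny threshold separates members from non-members almost perfectly; with p < n the training residuals are comparable to fresh ones and the attack gains essentially nothing.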

membership advantage, membership inference, regression, (14 more...)

arXiv.org Machine Learning

2202.01243

Country: North America > United States > California > San Diego County > San Diego (0.04)

Genre: Research Report (0.50)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.55)


VC-PCR: A Prediction Method based on Supervised Variable Selection and Clustering

Marion, Rebecca, Lederer, Johannes, Govaerts, Bernadette, von Sachs, Rainer

arXiv.org Machine Learning, Feb-2-2022

Sparse linear prediction methods suffer from decreased prediction accuracy when the predictor variables have cluster structure (e.g. there are highly correlated groups of variables). To improve prediction accuracy, various methods have been proposed to identify variable clusters from the data and integrate cluster information into a sparse modeling process. But none of these methods achieve satisfactory performance for prediction, variable selection and variable clustering simultaneously. This paper presents Variable Cluster Principal Component Regression (VC-PCR), a prediction method that supervises variable selection and variable clustering in order to solve this problem. Experiments with real and simulated data demonstrate that, compared to competitor methods, VC-PCR achieves better prediction, variable selection and clustering performance when cluster structure is present.

coefficient, hyperparameter, vc-pcr, (17 more...)

arXiv.org Machine Learning

2202.00975

Country:

Asia > Middle East > Jordan (0.04)
Europe > Belgium (0.04)
Europe > Germany (0.04)

Genre: Research Report (0.64)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Modeling & Simulation (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
(2 more...)


Data Science 2021: Data Science & Machine Learning in Python

#artificialintelligence, Feb-1-2022, 05:00:47 GMT

Welcome to the most complete course on learning Data Science and Machine Learning on the internet! After teaching over 2 million students I've worked…

data science, machine learning, neural network, (11 more...)

#artificialintelligence

Country: North America > United States (0.16)

Genre: Instructional Material (0.33)

Industry:

Banking & Finance (0.33)
Government > Regional Government (0.31)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.89)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.56)


A selective review of sufficient dimension reduction for multivariate response regression

Dong, Yuexiao, Soale, Abdul-Nasah, Power, Michael D.

arXiv.org Machine Learning, Feb-1-2022

We review sufficient dimension reduction (SDR) estimators with multivariate response in this paper. A wide range of SDR methods are characterized as inverse regression SDR estimators or forward regression SDR estimators. The inverse regression family includes pooled marginal estimators, projective resampling estimators, and distance-based estimators. Ordinary least squares, partial least squares, and semiparametric SDR estimators, on the other hand, are discussed as estimators from the forward regression family.
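To make the inverse-regression idea concrete, here is a minimal sketch of one pooled marginal estimator built on sliced inverse regression (SIR), assuming NumPy: compute a SIR kernel for each response separately, sum the kernels, and take the leading eigenvectors as the estimated dimension-reduction directions. Using SIR as the marginal estimator, pooling by an unweighted sum, and the function names are all assumptions for illustration; the reviewed estimators may weight or combine the marginal kernels differently.

```python
import numpy as np

def sir_kernel(Z, y, n_slices=5):
    # SIR kernel for one response: weighted outer products of slice means of Z,
    # where slices partition the observations by the order of y.
    n, p = Z.shape
    M = np.zeros((p, p))
    for idx in np.array_split(np.argsort(y), n_slices):
        m = Z[idx].mean(axis=0)
        M += (len(idx) / n) * np.outer(m, m)
    return M

def pooled_marginal_sir(X, Y, d=1, n_slices=5):
    # Standardize predictors: Z = (X - mean) @ Sigma^{-1/2}.
    Xc = X - X.mean(axis=0)
    evals, evecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
    sigma_inv_sqrt = evecs @ np.diag(evals ** -0.5) @ evecs.T
    Z = Xc @ sigma_inv_sqrt
    # Pool the marginal (one-response-at-a-time) SIR kernels by summing them.
    M = sum(sir_kernel(Z, Y[:, j], n_slices) for j in range(Y.shape[1]))
    w, V = np.linalg.eigh(M)
    top = V[:, np.argsort(w)[::-1][:d]]
    # Map the leading eigenvectors back to the original X scale and normalize.
    B = sigma_inv_sqrt @ top
    return B / np.linalg.norm(B, axis=0)

# Toy check: two responses that depend on X only through one direction b.
rng = np.random.default_rng(1)
X = rng.normal(size=(2000, 6))
b = np.array([1.0, 1.0, 0.0, 0.0, 0.0, 0.0]) / np.sqrt(2.0)
u = X @ b
Y = np.column_stack([u + 0.2 * rng.normal(size=2000),
                     u**3 + 0.2 * rng.normal(size=2000)])
B_hat = pooled_marginal_sir(X, Y, d=1)
print(abs(B_hat[:, 0] @ b))  # |cosine| between estimated and true direction
```

The toy responses use monotone links on purpose: SIR relies on slice means, so it can miss a direction when the response depends on it symmetrically (for example through u**2).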

dimension reduction, estimator, regression, (13 more...)

arXiv.org Machine Learning

2202.00876

Country:

Europe > Spain > Aragón (0.04)
North America > United States > New York (0.04)
North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.04)
(3 more...)

Genre:

Overview (0.67)
Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Learning in High Dimensional Spaces (0.75)


Hierarchical Shrinkage: improving the accuracy and interpretability of tree-based methods

Agarwal, Abhineet, Tan, Yan Shuo, Ronen, Omer, Singh, Chandan, Yu, Bin

arXiv.org Machine Learning, Feb-1-2022

Tree-based models such as decision trees and random forests (RF) are a cornerstone of modern machine-learning practice. To mitigate overfitting, trees are typically regularized by a variety of techniques that modify their structure (e.g. pruning). We introduce Hierarchical Shrinkage (HS), a post-hoc algorithm that does not modify the tree structure, and instead regularizes the tree by shrinking the prediction over each node towards the sample means of its ancestors. The amount of shrinkage is controlled by a single regularization parameter and the number of data points in each ancestor. Since HS is a post-hoc method, it is extremely fast, compatible with any tree growing algorithm, and can be used synergistically with other regularization techniques. Extensive experiments over a wide variety of real-world datasets show that HS substantially increases the predictive performance of decision trees, even when used in conjunction with other regularization techniques. Moreover, we find that applying HS to each tree in an RF often improves accuracy, as well as its interpretability by simplifying and stabilizing its decision boundaries and SHAP values. We further explain the success of HS in improving prediction performance by showing its equivalence to ridge regression on a (supervised) basis constructed of decision stumps associated with the internal nodes of a tree. All code and models are released in a full-fledged package available on GitHub (github.com/csinva/imodels).
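The shrinkage rule described above can be written along a single root-to-leaf path t_0, ..., t_L as f(x) = E_{t_0}[y] + sum over l of (E_{t_l}[y] - E_{t_{l-1}}[y]) / (1 + lambda / N(t_{l-1})), where E_t[y] is the sample mean of y in node t and N(t) is its number of training points. This is our reading of the abstract's description (one regularization parameter plus ancestor sample counts) and should be checked against the paper; the sketch applies it to a hand-specified path rather than a fitted tree, and the function name is hypothetical. The authors' released code is the imodels package linked above.

```python
from typing import Sequence

def hs_prediction(path_means: Sequence[float], path_counts: Sequence[int], lam: float) -> float:
    """Hierarchical-shrinkage prediction along one root-to-leaf path (sketch).

    path_means[l]  : sample mean of y in node t_l (t_0 = root, t_L = leaf)
    path_counts[l] : number of training points in node t_l
    lam            : regularization strength (lam = 0 recovers the leaf mean)
    """
    if not path_means or len(path_means) != len(path_counts):
        raise ValueError("path_means and path_counts must be non-empty and equal length")
    pred = path_means[0]
    for l in range(1, len(path_means)):
        # Each step from parent to child is damped by 1 + lam / N(parent),
        # so splits made on few data points are shrunk the most.
        pred += (path_means[l] - path_means[l - 1]) / (1.0 + lam / path_counts[l - 1])
    return pred

path_means = [10.0, 14.0, 20.0]  # root, internal node, leaf
path_counts = [100, 40, 5]
print(hs_prediction(path_means, path_counts, lam=0.0))   # 20.0: the plain leaf mean
print(hs_prediction(path_means, path_counts, lam=10.0))  # pulled toward the ancestor means
```

As lam grows, every increment vanishes and the prediction tends to the root mean; the tree's splits themselves are never changed, which is what makes the method post-hoc.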

dataset, leaves number, regression, (15 more...)

arXiv.org Machine Learning

2202.00858

Country:

North America > United States > California > Alameda County > Berkeley (0.04)
South America > Paraguay > Asunción > Asunción (0.04)
North America > United States > New York (0.04)
(2 more...)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Therapeutic Area (0.71)
Health & Medicine > Pharmaceuticals & Biotechnology (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.67)


Safe Screening for Logistic Regression with $\ell_0$-$\ell_2$ Regularization

Deza, Anna, Atamturk, Alper

arXiv.org Machine Learning, Feb-1-2022

In logistic regression, it is often desirable to utilize regularization to promote sparse solutions, particularly for problems with a large number of features compared to available labels. In this paper, we present screening rules that safely remove features from logistic regression with $\ell_0$-$\ell_2$ regularization before solving the problem. The proposed safe screening rules are based on lower bounds from the Fenchel dual of strong conic relaxations of the logistic regression problem. Numerical experiments with real and synthetic data suggest that a high percentage of the features can be effectively and safely removed a priori, leading to substantial speed-up in the computations.
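For reference, one common way to write logistic regression with $\ell_0$-$\ell_2$ regularization is shown below, with labels $y_i \in \{-1, +1\}$, features $x_i \in \mathbb{R}^p$, and penalty weights $\lambda_0, \lambda_2 > 0$. The notation is ours and the paper's exact formulation may differ (for instance, sparsity could be imposed as a cardinality constraint rather than a penalty). In these terms, a safe screening rule certifies, before the problem is solved, that feature $j$ can be fixed at $\beta_j = 0$ without changing the optimal objective value, so its column can be dropped.

```latex
\min_{\beta \in \mathbb{R}^p} \; \sum_{i=1}^{n} \log\!\bigl(1 + \exp(-y_i\, x_i^{\top}\beta)\bigr)
\;+\; \lambda_0 \,\|\beta\|_0 \;+\; \lambda_2 \,\|\beta\|_2^2,
\qquad \|\beta\|_0 = \#\{\, j : \beta_j \neq 0 \,\}
```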

optimal solution, regression, screening rule, (12 more...)

arXiv.org Machine Learning

2202.00467

Country: North America > United States > California > Alameda County > Berkeley (0.14)

Genre: Research Report > New Finding (0.79)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.47)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)


#005 PyTorch - Logistic Regression in PyTorch - Master Data Science

#artificialintelligence, Jan-31-2022, 20:33:41 GMT

The first step is to create a class called LogisticRegression(). We will pass torch. Then we will define a linear layer that is the same as in linear regression, by calling the torch.nn.Linear() function. This function takes two input parameters: the first is the size of each input sample, which in this case is 2, and the second is the size of each output sample, which is 1. Next, we will create the forward() method, which takes self and x as inputs.
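The steps above can be sketched as follows. This minimal version assumes the class subclasses torch.nn.Module (the standard PyTorch pattern) and that forward() applies a sigmoid to the linear output; the tutorial's own code may differ, and the toy data and training loop are illustrative additions.

```python
import torch

class LogisticRegression(torch.nn.Module):
    def __init__(self, in_features: int = 2, out_features: int = 1):
        super().__init__()
        # Same affine layer as in linear regression: 2 input features -> 1 output.
        self.linear = torch.nn.Linear(in_features, out_features)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Squash the linear output into (0, 1) so it can be read as a probability.
        return torch.sigmoid(self.linear(x))

# Illustrative training on toy, linearly separable data.
torch.manual_seed(0)
X = torch.randn(200, 2)
y = (X[:, 0] + X[:, 1] > 0).float().unsqueeze(1)  # labels of shape (200, 1)

model = LogisticRegression()
loss_fn = torch.nn.BCELoss()  # binary cross-entropy on probabilities
optimizer = torch.optim.SGD(model.parameters(), lr=0.5)

for _ in range(200):
    optimizer.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()
    optimizer.step()

with torch.no_grad():
    accuracy = ((model(X) > 0.5).float() == y).float().mean().item()
print(accuracy)
```

An equivalent design returns the raw linear output from forward() and trains with torch.nn.BCEWithLogitsLoss, which is more numerically stable; the sigmoid is then applied only at prediction time.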

logistic regression, master data science, pytorch

#artificialintelligence

Genre:

Research Report > New Finding (0.40)
Research Report > Experimental Study (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.76)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.75)


Complete Roadmap To Learn Machine Learning In Just 3 Months

#artificialintelligence, Jan-31-2022, 06:30:33 GMT

Ultimately, though, it is up to you which cloud platform you want to learn. But don't forget to gain a proper understanding of the cloud platform…

algorithm, exploratory data analysis, machine learning, (11 more...)

#artificialintelligence

Genre: Instructional Material (0.34)

Industry: Information Technology > Services (0.70)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.32)


Provably Improving Expert Predictions with Conformal Prediction

Straitouri, Eleni, Wang, Lequn, Okati, Nastaran, Rodriguez, Manuel Gomez

arXiv.org Machine Learning, Jan-31-2022

Automated decision support systems promise to help human experts solve tasks more efficiently and accurately. However, existing systems typically require experts to understand when to cede agency to the system or when to exercise their own agency. Moreover, if the experts develop a misplaced trust in the system, their performance may worsen. In this work, we lift the above requirement and develop automated decision support systems that, by design, do not require experts to understand when to trust them to provably improve their performance. To this end, we focus on multiclass classification tasks and consider automated decision support systems that, for each data sample, use a classifier to recommend a subset of labels to a human expert. We first show that, by looking at the design of such systems from the perspective of conformal prediction, we can ensure that the probability that the recommended subset of labels contains the true label matches almost exactly a target probability value. Then, we identify the set of target probability values under which the human expert is provably better off predicting a label among those in the recommended subset and develop an efficient practical method to find a near-optimal target probability value. Experiments on synthetic and real data demonstrate that our system can help the experts make more accurate predictions and is robust to the accuracy of the classifier it relies on.
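The conformal step can be illustrated with standard split conformal prediction for classification: score each held-out calibration example by one minus the classifier's probability for its true label, take the ceil((n + 1)(1 - alpha))-th smallest score as a threshold, and recommend every label whose score does not exceed it. With exchangeable data and no ties, the recommended subset then contains the true label with probability between 1 - alpha and 1 - alpha + 1/(n + 1), which is the kind of guarantee behind the abstract's "almost exactly" claim. This sketch assumes that particular score function and uses hypothetical names; it does not implement the paper's procedure for choosing the target probability.

```python
import math
from typing import List, Sequence

def conformal_threshold(cal_probs: Sequence[Sequence[float]],
                        cal_labels: Sequence[int], alpha: float) -> float:
    # Nonconformity score of a calibration example: 1 - P(true label).
    scores = sorted(1.0 - probs[y] for probs, y in zip(cal_probs, cal_labels))
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))  # rank of the conformal quantile
    if k > n:
        # Too few calibration points for this alpha: recommend every label.
        return math.inf
    return scores[k - 1]

def recommended_subset(probs: Sequence[float], q_hat: float) -> List[int]:
    # Recommend each label whose nonconformity score is at most the threshold.
    return [label for label, p in enumerate(probs) if 1.0 - p <= q_hat]

# Toy usage with exactly representable probabilities (two classes, true label 0).
cal_true_probs = [0.5, 0.75, 0.875, 0.25, 0.625, 0.375, 0.125]
q_hat = conformal_threshold([[p, 1 - p] for p in cal_true_probs], [0] * 7, alpha=0.25)
print(q_hat)                                  # 0.75
print(recommended_subset([0.1, 0.9], q_hat))  # [1]
```

Smaller alpha raises the threshold and yields larger subsets; the expert then picks a label from within the recommended subset.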

classifier, prediction task, probability, (13 more...)

arXiv.org Machine Learning

2201.12006

Country: Asia > Middle East > Jordan (0.04)

Genre: Research Report > New Finding (0.68)

Industry: Health & Medicine (0.46)

Technology: