Regression
Generalization Error Bounds for Multiclass Sparse Linear Classifiers
Levy, Tomer, Abramovich, Felix
We consider high-dimensional multiclass classification by sparse multinomial logistic regression. Unlike binary classification, in the multiclass setup one can think about an entire spectrum of possible notions of sparsity associated with different structural assumptions on the regression coefficients matrix. We propose a computationally feasible feature selection procedure based on penalized maximum likelihood with convex penalties capturing a specific type of sparsity at hand. In particular, we consider global sparsity, double row-wise sparsity, and low-rank sparsity, and show that with the properly chosen tuning parameters the derived plug-in classifiers attain the minimax generalization error bounds (in terms of misclassification excess risk) within the corresponding classes of multiclass sparse linear classifiers. The developed approach is general and can be adapted to other types of sparsity as well.
Scalable Estimation for Structured Additive Distributional Regression
Umlauf, Nikolaus, Seiler, Johannes, Wetscher, Mattias, Simon, Thorsten, Lang, Stefan, Klein, Nadja
Recently, fitting probabilistic models has gained importance in many areas, but estimating such distributional models with very large data sets is a difficult task. In particular, the use of rather complex models can easily lead to memory-related efficiency problems that can make estimation infeasible even on high-performance computers. We therefore propose a novel backfitting algorithm, which is based on the ideas of stochastic gradient descent and can deal with virtually any amount of data on a conventional laptop. The algorithm performs automatic selection of variables and smoothing parameters, and its performance is in most cases superior or at least equivalent to other implementations for structured additive distributional regression, e.g., gradient boosting, while maintaining low computation time. Performance is evaluated using an extensive simulation study and an exceptionally challenging and unique example of lightning count prediction over Austria. A very large dataset with over 9 million observations and 80 covariates is used, so that a prediction model cannot be estimated with standard distributional regression methods but can be with our new approach.
L2 Regularization: What It Is and How to Implement It in Python
L2 regularization is a method used to prevent overfitting in machine learning models. It adds a penalty term to the loss function that is proportional to the sum of the squares of the weights. This penalizes large weights and shrinks them toward zero, although, unlike L1 regularization, it does not typically set them exactly to zero. L2 regularization is also known as weight decay because it causes the model's weights to decay toward zero. The penalty term is added to the loss function during training, and its strength is typically chosen by cross-validation.
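As a rough illustration of the penalty described above, the sketch below adds an L2 term to a mean-squared-error loss and minimizes it with plain gradient descent. The function names, the toy data, the learning rate, and the penalty strength `lam` are all assumptions made for this example; they do not come from the original post.

```python
def l2_penalized_loss(weights, X, y, lam):
    """Mean squared error plus lam times the sum of squared weights."""
    n = len(y)
    preds = [sum(w * xi for w, xi in zip(weights, row)) for row in X]
    mse = sum((p - t) ** 2 for p, t in zip(preds, y)) / n
    penalty = lam * sum(w ** 2 for w in weights)
    return mse + penalty


def gradient_step(weights, X, y, lam, lr):
    """One gradient-descent step on the penalized loss."""
    n = len(y)
    preds = [sum(w * xi for w, xi in zip(weights, row)) for row in X]
    new_weights = []
    for j, w in enumerate(weights):
        data_grad = 2.0 / n * sum((p - t) * row[j] for p, t, row in zip(preds, y, X))
        penalty_grad = 2.0 * lam * w  # this term pulls each weight toward zero
        new_weights.append(w - lr * (data_grad + penalty_grad))
    return new_weights


def fit(X, y, lam, lr=0.05, steps=2000):
    """Run gradient descent from all-zero weights (no intercept, for brevity)."""
    weights = [0.0] * len(X[0])
    for _ in range(steps):
        weights = gradient_step(weights, X, y, lam, lr)
    return weights


# Toy data generated exactly by y = 2*x1 + 1*x2 (an assumption for the demo).
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
y = [2.0, 1.0, 3.0, 5.0]

w_plain = fit(X, y, lam=0.0)  # no penalty: recovers weights close to [2, 1]
w_ridge = fit(X, y, lam=1.0)  # with penalty: weights are pulled toward zero
```

Comparing `w_plain` and `w_ridge` shows the shrinkage effect: the penalized weights have a smaller squared norm, but neither becomes exactly zero. In practice `lam` would be chosen by cross-validation rather than fixed by hand.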
Linear regression in detail. Linear regression is a statistical…
Linear regression is a statistical method for modeling the relationship between a dependent variable and one or more independent variables. It is a widely used technique for predicting the outcome of a continuous variable, and it is especially useful when you have a large amount of data. In this blog post, we will discuss the theory behind linear regression, how to perform it in practice, and some of its applications. The basic idea behind linear regression is to find a line that best fits a set of data points. The line is represented by the equation y = mx + b, where y is the dependent variable, x is the independent variable, m is the slope of the line, and b is the y-intercept.
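To make the idea of a best-fitting line concrete, here is a minimal sketch that computes the slope m and intercept b by ordinary least squares for a single independent variable. The helper name `fit_line` and the sample points are assumptions chosen for illustration, not taken from the post.

```python
def fit_line(xs, ys):
    """Return (m, b) minimizing the sum of squared residuals of y = m*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: covariance of x and y divided by the variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    # Intercept: the fitted line passes through the point of means.
    b = mean_y - m * mean_x
    return m, b


# Sample points that lie exactly on y = 2x + 1 (assumed data for the demo).
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
m, b = fit_line(xs, ys)
prediction = m * 4.0 + b  # predicted y for a new x = 4
```

With noisy data the points would not fall exactly on a line, and the same formulas would return the slope and intercept with the smallest total squared error.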
Benign Overfitting in Time Series Linear Model with Over-Parameterization
Nakakita, Shogo, Imaizumi, Masaaki
The success of large-scale models in recent years has increased the importance of statistical models with numerous parameters. Several studies have analyzed over-parameterized linear models with high-dimensional data that may not be sparse; however, existing results depend on the independent setting of samples. In this study, we analyze a linear regression model with dependent time series data under over-parameterization settings. We consider an estimator via interpolation and develop a theory for the excess risk of the estimator. Then, we derive bounds on the risk of the estimator for the cases where the temporal correlation of each coordinate of dependent data is homogeneous and heterogeneous, respectively. The derived bounds reveal that a temporal covariance of the data plays a key role; its strength affects the bias of the risk, and its nondegeneracy affects the variance of the risk. Moreover, for the heterogeneous correlation case, we show that the convergence rate of risks with short-memory processes is identical to that of cases with independent data, and the risk can converge to zero even with long-memory processes. Our theory can be extended to infinite-dimensional data in a unified manner. We also present several examples of specific dependent processes that can be applied to our setting.
A Stochastic Optimization Framework for Fair Risk Minimization
Lowy, Andrew, Baharlouei, Sina, Pavan, Rakesh, Razaviyayn, Meisam, Beirami, Ahmad
Despite the success of large-scale empirical risk minimization (ERM) at achieving high accuracy across a variety of machine learning tasks, fair ERM is hindered by the incompatibility of fairness constraints with stochastic optimization. We consider the problem of fair classification with discrete sensitive attributes and potentially large models and data sets, requiring stochastic solvers. Existing in-processing fairness algorithms are either impractical in the large-scale setting because they require large batches of data at each iteration or they are not guaranteed to converge. In this paper, we develop the first stochastic in-processing fairness algorithm with guaranteed convergence. For demographic parity, equalized odds, and equal opportunity notions of fairness, we provide slight variations of our algorithm--called FERMI--and prove that each of these variations converges in stochastic optimization with any batch size. Empirically, we show that FERMI is amenable to stochastic solvers with multiple (non-binary) sensitive attributes and non-binary targets, performing well even with minibatch size as small as one. Extensive experiments show that FERMI achieves the most favorable tradeoffs between fairness violation and test accuracy across all tested setups compared with state-of-the-art baselines for demographic parity, equalized odds, and equal opportunity. These benefits are especially significant with small batch sizes and for non-binary classification with a large number of sensitive attributes, making FERMI a practical, scalable fairness algorithm. The code for all of the experiments in this paper is available at: https://github.com/optimization-for-data-driven-science/FERMI.
Optirank: classification for RNA-Seq data with optimal ranking reference genes
Malsot, Paola, Martins, Filipe, Trono, Didier, Obozinski, Guillaume
Classification algorithms using RNA-Sequencing (RNA-Seq) data as input are used in a variety of biological applications. By nature, RNA-Seq data is subject to uncontrolled fluctuations both within and especially across datasets, which presents a major difficulty for a trained classifier to generalize to an external dataset. Replacing raw gene counts with the rank of gene counts inside an observation has proven effective in mitigating this problem. However, the rank of a feature is by definition relative to all other features, including highly variable features that introduce noise in the ranking. To address this problem and obtain more robust ranks, we propose a logistic regression model, optirank, which simultaneously learns the parameters of the model and the genes to use as a reference set in the ranking. We show the effectiveness of this method on simulated data. We also consider real classification tasks, which present different kinds of distribution shifts between train and test data. Those tasks concern a variety of applications, such as cancer of unknown primary classification, identification of specific gene signatures, and determination of cell type in single-cell RNA-Seq datasets. On those real tasks, optirank performs at least as well as the vanilla logistic regression on classical ranks, while producing sparser solutions. In addition, to increase the robustness against dataset shifts, we propose a multi-source learning scheme and demonstrate its effectiveness when used in combination with rank-based classifiers.
Linear Regression Deep Understanding
In data science, machine learning algorithms are used to automate a system. In practice, there are mainly two types of problems: i. supervised, and ii. unsupervised. In a supervised problem, the training dataset is labelled. That means the algorithm has a target value. The supervised learning algorithm tries to predict values close to the target values and optimizes its parameters accordingly.
Logistic Regression Math Deduction – Towards AI
Originally published on Towards AI. Logistic regression is a supervised machine learning algorithm conventionally used to build models for binary classification problems.
Introduction to Machine Learning: Logistic Regression - PythonAlgos
Is this another spam email? How does your email spam filter tell? Perhaps it uses a simple machine learning technique. In this post, we're going to learn about what it is and how we can create a Python logistic regression program. Unlike linear regression, logistic regression is used for classification rather than prediction along a continuous range.
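The contrast with linear regression can be sketched in a few lines: logistic regression passes a linear score through the sigmoid function to obtain a probability, then thresholds that probability to pick a class. The weights, bias, feature vectors, and the 0.5 threshold below are hand-picked assumptions for illustration only; a real spam filter would learn its parameters from labelled emails.

```python
import math


def sigmoid(z):
    """Map any real-valued score to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(weights, bias, features):
    """Probability of the positive class (here: spam) for one example."""
    score = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(score)


def predict_label(weights, bias, features, threshold=0.5):
    """Return 1 (spam) if the probability reaches the threshold, else 0."""
    return 1 if predict_proba(weights, bias, features) >= threshold else 0


# Hypothetical parameters: the first feature counts suspicious words,
# the second counts words typical of ordinary mail.
weights = [1.5, -2.0]
bias = -0.5

spam_like = [3.0, 0.0]
ham_like = [0.0, 2.0]

spam_label = predict_label(weights, bias, spam_like)
ham_label = predict_label(weights, bias, ham_like)
```

The output is a discrete class label rather than a value along a continuous range, which is the difference from linear regression that the post points out.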