Goto

Collaborating Authors

 Regression


Sub-sampled Newton Methods with Non-uniform Sampling

arXiv.org Machine Learning

We consider the problem of finding the minimizer of a convex function $F: \mathbb R^d \rightarrow \mathbb R$ of the form $F(w) := \sum_{i=1}^n f_i(w) + R(w)$ where a low-rank factorization of $\nabla^2 f_i(w)$ is readily available. We consider the regime where $n \gg d$. As second-order methods prove to be effective in finding the minimizer to a high-precision, in this work, we propose randomized Newton-type algorithms that exploit \textit{non-uniform} sub-sampling of $\{\nabla^2 f_i(w)\}_{i=1}^{n}$, as well as inexact updates, as means to reduce the computational complexity. Two non-uniform sampling distributions based on {\it block norm squares} and {\it block partial leverage scores} are considered in order to capture important terms among $\{\nabla^2 f_i(w)\}_{i=1}^{n}$. We show that at each iteration non-uniformly sampling at most $\mathcal O(d \log d)$ terms from $\{\nabla^2 f_i(w)\}_{i=1}^{n}$ is sufficient to achieve a linear-quadratic convergence rate in $w$ when a suitable initial point is provided. In addition, we show that our algorithms achieve a lower computational complexity and exhibit more robustness and better dependence on problem specific quantities, such as the condition number, compared to similar existing methods, especially the ones based on uniform sampling. Finally, we empirically demonstrate that our methods are at least twice as fast as Newton's methods with ridge logistic regression on several real datasets.


A Residual Bootstrap for High-Dimensional Regression with Near Low-Rank Designs

arXiv.org Machine Learning

We study the residual bootstrap (RB) method in the context of high-dimensional linear regression. Specifically, we analyze the distributional approximation of linear contrasts $c^{\top} (\hat{\beta}_{\rho}-\beta)$, where $\hat{\beta}_{\rho}$ is a ridge-regression estimator. When regression coefficients are estimated via least squares, classical results show that RB consistently approximates the laws of contrasts, provided that $p\ll n$, where the design matrix is of size $n\times p$. Up to now, relatively little work has considered how additional structure in the linear model may extend the validity of RB to the setting where $p/n\asymp 1$. In this setting, we propose a version of RB that resamples residuals obtained from ridge regression. Our main structural assumption on the design matrix is that it is nearly low rank --- in the sense that its singular values decay according to a power-law profile. Under a few extra technical assumptions, we derive a simple criterion for ensuring that RB consistently approximates the law of a given contrast. We then specialize this result to study confidence intervals for mean response values $X_i^{\top} \beta$, where $X_i^{\top}$ is the $i$th row of the design. More precisely, we show that conditionally on a Gaussian design with near low-rank structure, RB simultaneously approximates all of the laws $X_i^{\top}(\hat{\beta}_{\rho}-\beta)$, $i=1,\dots,n$. This result is also notable as it imposes no sparsity assumptions on $\beta$. Furthermore, since our consistency results are formulated in terms of the Mallows (Kantorovich) metric, the existence of a limiting distribution is not required.


On the complexity of switching linear regression

arXiv.org Machine Learning

This technical note extends recent results on the computational complexity of globally minimizing the error of piecewise-affine models to the related problem of minimizing the error of switching linear regression models. In particular, we show that, on the one hand the problem is NP-hard, but on the other hand, it admits a polynomial-time algorithm with respect to the number of data points for any fixed data dimension and number of modes.


What is Softmax Regression and How is it Related to Logistic Regression?

#artificialintelligence

Softmax Regression (synonyms: Multinomial Logistic, Maximum Entropy Classifier, or just Multi-class Logistic Regression) is a generalization of logistic regression that we can use for multi-class classification (under the assumption that the classes are mutually exclusive). In contrast, we use the (standard) Logistic Regression model in binary classification tasks. Now, let me briefly explain how that works and how softmax regression differs from logistic regression. As the name suggests, in softmax regression (SMR), we replace the sigmoid logistic function by the so-called softmax function?: Now, this softmax function computes the probability that this training sample x(i) belongs to class j given the weight and net input z(i). So, we compute the probability p(y j x(i); wj) for each class label in j 1, ..., k.


A multilevel framework for sparse optimization with application to inverse covariance estimation and logistic regression

arXiv.org Machine Learning

Solving l1 regularized optimization problems is common in the fields of computational biology, signal processing and machine learning. Such l1 regularization is utilized to find sparse minimizers of convex functions. A well-known example is the LASSO problem, where the l1 norm regularizes a quadratic function. A multilevel framework is presented for solving such l1 regularized sparse optimization problems efficiently. We take advantage of the expected sparseness of the solution, and create a hierarchy of problems of similar type, which is traversed in order to accelerate the optimization process. This framework is applied for solving two problems: (1) the sparse inverse covariance estimation problem, and (2) l1-regularized logistic regression. In the first problem, the inverse of an unknown covariance matrix of a multivariate normal distribution is estimated, under the assumption that it is sparse. To this end, an l1 regularized log-determinant optimization problem needs to be solved. This task is challenging especially for large-scale datasets, due to time and memory limitations. In the second problem, the l1-regularization is added to the logistic regression classification objective to reduce overfitting to the data and obtain a sparse model. Numerical experiments demonstrate the efficiency of the multilevel framework in accelerating existing iterative solvers for both of these problems.


KDnuggets News 16:n23, Jun 29: Machine Learning Trends & Future of AI; Data Science Kaggle Walkthrough; Regularization in Logistic Regression

#artificialintelligence

Doing Data Science: A Kaggle Walkthrough Part 6 - Creating a Model Top Machine Learning Libraries for Javascript Improving Nudity Detection and NSFW Image Recognition History of Data Mining Predictive Analytics World in October: Government, Business, Financial, Healthcare Software 5 More Machine Learning Projects You Can No Longer Overlook BigDebug: Debugging Primitives for Interactive Big Data Processing in Spark Achieving End-to-end Security for Apache Spark with Databricks Predicting purchases at retail stores using HPE Vertica and Dataiku DSS Tutorials, Overviews, How-Tos Mining Twitter Data with Python Part 4: Rugby and Term Co-occurrences Ten Simple Rules for Effective Statistical Practice: An Overview Mining Twitter Data with Python Part 3: Term Frequencies Opinions The Big Data Ecosystem is Too Damn Big An Inside Update on Natural Language Processing From Research to Riches: Data Wrangling Lessons from Physical and Life Science News Top Stories, June 20-26: New Machine Learning Book, Free Draft Chapters; Machine Learning Trends & Future of A.I. Webcasts and Webinars Webinar, Jun 30: Introducing Anaconda Mosaic: Visualize. Bank of Ireland: Senior Data Scientist within the Advanced Analytics Team DuPont Pioneer: Data Scientist - Encirca Academic U. of Iowa: Business Analytics & Information Systems, Lecturer U. of Iowa: Lecturer: Business Analytics & Information Systems Top Tweets Top KDnuggets tweets, Jun 15-21: Predicting UEFA Euro2016; Visual Explanation of Backprop for Neural Nets Quote "Everything at scale in this world is going to be managed by algorithms and data ... every business will be an algorithmic business."


1.1. Generalized Linear Models -- scikit-learn 0.17.1 documentation

#artificialintelligence

Logistic regression, despite its name, is a linear model for classification rather than regression. Logistic regression is also known in the literature as logit regression, maximum-entropy classification (MaxEnt) or the log-linear classifier. In this model, the probabilities describing the possible outcomes of a single trial are modeled using a logistic function. The implementation of logistic regression in scikit-learn can be accessed from class LogisticRegression. This implementation can fit a multiclass (one-vs-rest) logistic regression with optional L2 or L1 regularization.


Bootstrap-Based Regularization for Low-Rank Matrix Estimation

arXiv.org Machine Learning

We develop a flexible framework for low-rank matrix estimation that allows us to transform noise models into regularization schemes via a simple bootstrap algorithm. Effectively, our procedure seeks an autoencoding basis for the observed matrix that is stable with respect to the specified noise model; we call the resulting procedure a stable autoencoder. In the simplest case, with an isotropic noise model, our method is equivalent to a classical singular value shrinkage estimator. For non-isotropic noise models--e.g., Poisson noise-- the method does not reduce to singular value shrinkage, and instead yields new estimators that perform well in experiments. Moreover, by iterating our stable autoencoding scheme, we can automatically generate low-rank estimates without specifying the target rank as a tuning parameter.


A Learning Algorithm for Relational Logistic Regression: Preliminary Results

arXiv.org Machine Learning

Relational logistic regression (RLR) is a representation of conditional probability in terms of weighted formulae for modelling multi-relational data. In this paper, we develop a learning algorithm for RLR models. Learning an RLR model from data consists of two steps: 1- learning the set of formulae to be used in the model (a.k.a. structure learning) and learning the weight of each formula (a.k.a. parameter learning). For structure learning, we deploy Schmidt and Murphy's hierarchical assumption: first we learn a model with simple formulae, then more complex formulae are added iteratively only if all their sub-formulae have proven effective in previous learned models. For parameter learning, we convert the problem into a non-relational learning problem and use an off-the-shelf logistic regression learning algorithm from Weka, an open-source machine learning tool, to learn the weights. We also indicate how hidden features about the individuals can be incorporated into RLR to boost the learning performance. We compare our learning algorithm to other structure and parameter learning algorithms in the literature, and compare the performance of RLR models to standard logistic regression and RDN-Boost on a modified version of the MovieLens data-set.


Regularization in Logistic Regression: Better Fit and Better Generalization?

#artificialintelligence

Regularization does NOT improve the performance on the data set that the algorithm used to learn the model parameters (feature weights). However, it can improve the generalization performance, i.e., the performance on new, unseen data, which is exactly what we want. In intuitive terms, we can think of regularization as a penalty against complexity. Increasing the regularization strength penalizes "large" weight coefficients -- our goal is to prevent that our model picks up "peculiarities," "noise," or "imagines a pattern where there is none." Again, we don't want the model to memorize the training dataset, we want a model that generalizes well to new, unseen data. In more specific terms, we can think of regularization as adding (or increasing the) bias if our model suffers from (high) variance (i.e., it overfits the training data).