Regression


Logistic Regression in One Picture

#artificialintelligence

Logistic regression is like regressing data to a line, except that the linear output is passed through the S-shaped logistic (sigmoid) function. This type of regression is a good choice when modeling binary variables, which occur frequently in real life. The logistic regression model is popular, in part, because it gives probabilities between 0 and 1. Say you were modeling the risk of credit default: values closer to 0 indicate a tiny risk, while values closer to 1 mean a very high risk. The following image shows an example of how one might tailor a logistic model for credit-score-based risk.
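Here is a minimal sketch of the idea. It assumes a single predictor (a credit score) and made-up coefficients `b0` and `b1`, which are not fitted to any real data. The linear score is pushed through the logistic function, giving a default probability between 0 and 1:

```python
import math

def default_probability(credit_score, b0=6.0, b1=-0.012):
    """Hypothetical logistic model: P(default) = 1 / (1 + exp(-(b0 + b1 * score))).

    b0 and b1 are illustrative values, not estimates from real data.
    """
    z = b0 + b1 * credit_score          # the linear part ("regressing to a line")
    return 1.0 / (1.0 + math.exp(-z))   # the logistic function squashes z into (0, 1)

for score in (400, 500, 600, 700, 800):
    print(score, round(default_probability(score), 3))
```

Because `b1` is negative in this sketch, higher credit scores map to lower default probabilities. A score of 500 sits at the 50% midpoint.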


Nonparametric Variable Screening with Optimal Decision Stumps

arXiv.org Machine Learning

Decision trees and their ensembles are endowed with a rich set of diagnostic tools for ranking and screening input variables in a predictive model. One of the most commonly used in practice is the Mean Decrease in Impurity (MDI), which calculates an importance score for a variable by summing the weighted impurity reductions over all non-terminal nodes split with that variable. Despite the widespread use of tree-based variable importance measures such as MDI, pinning down their theoretical properties has been challenging, and these properties have therefore remained largely unexplored. To address this gap between theory and practice, we derive rigorous finite-sample performance guarantees for variable ranking and selection in nonparametric models with MDI for a single-level CART decision tree (decision stump). We find that, compared with state-of-the-art nonparametric variable selection methods, the marginal signal strength of each variable can be considerably weaker and the ambient dimensionality considerably higher. Furthermore, unlike previous marginal screening methods that attempt to directly estimate each marginal projection via a truncated basis expansion, the fitted model used here is a simple, parsimonious decision stump, thereby eliminating the need for tuning the number of basis terms. Thus, surprisingly, even though decision stumps are highly inaccurate for estimation purposes, they can still be used to perform consistent model selection.
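The screening idea can be sketched in a few lines under two assumptions: squared-error (variance) impurity, and one stump fitted per variable. This is our reading of the setup, not the paper's exact procedure. Each variable is scored by the largest impurity reduction a single split on it achieves, and variables are ranked by that score:

```python
import numpy as np

def stump_impurity_reduction(x, y):
    """Best variance (squared-error) reduction achievable by one split on feature x."""
    order = np.argsort(x, kind="stable")
    xs, ys = x[order], y[order]
    n = len(ys)
    total = ys.sum()
    left_sum = np.cumsum(ys)[:-1]        # sum of y in the left child for each cut position
    n_left = np.arange(1, n)
    n_right = n - n_left
    # Parent SSE minus children SSE, written with sums so every cut is scored at once
    gain = left_sum**2 / n_left + (total - left_sum)**2 / n_right - total**2 / n
    valid = xs[:-1] < xs[1:]             # only cut between distinct x values
    if not valid.any():
        return 0.0
    return float(gain[valid].max()) / n  # impurity reduction per sample

def rank_by_stump_mdi(X, y):
    """Fit one stump per column and rank columns by their impurity reduction."""
    scores = np.array([stump_impurity_reduction(X[:, j], y) for j in range(X.shape[1])])
    return np.argsort(-scores), scores

rng = np.random.default_rng(0)
X = rng.uniform(size=(500, 10))
y = (X[:, 2] > 0.5).astype(float) + 0.1 * rng.standard_normal(500)  # only column 2 matters
order, scores = rank_by_stump_mdi(X, y)
print(order[:3])
```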


DeepHazard: neural network for time-varying risks

arXiv.org Machine Learning

Prognostic models in survival analysis aim to understand the relationship between patients' covariates and the distribution of survival time. Traditionally, semi-parametric models, such as the Cox model, have been assumed. These often rely on strong proportionality assumptions of the hazard that might be violated in practice. Moreover, they often do not include covariate information updated over time. We propose a new flexible method for survival prediction: DeepHazard, a neural network for time-varying risks. Our approach is tailored for a wide range of continuous hazard forms, with the only restriction being that they are additive in time. A flexible implementation, allowing different optimization methods along with any norm penalty, is developed. Numerical examples illustrate that our approach outperforms existing state-of-the-art methodology in terms of predictive capability, evaluated through the C-index metric. The same holds on popular real datasets such as METABRIC, GBSG, and ACTG.
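The C-index mentioned above measures how well predicted risks order the observed survival times. Below is a minimal sketch of Harrell's version. It assumes that a higher `risk` means earlier expected failure, and it skips pairs with tied survival times for simplicity:

```python
def harrell_c_index(time, event, risk):
    """Fraction of comparable pairs whose predicted risks are ordered correctly."""
    concordant = 0.0
    comparable = 0
    for i in range(len(time)):
        if not event[i]:              # a censored subject cannot anchor a comparable pair
            continue
        for j in range(len(time)):
            if time[i] < time[j]:     # i failed first, so i should have the higher risk
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable

time = [2, 4, 6, 8]
event = [1, 0, 1, 1]          # subject 2 is censored
risk = [0.9, 0.7, 0.3, 0.5]
print(harrell_c_index(time, event, risk))  # 0.75
```

A value of 1.0 means perfect ordering and 0.5 is what random risk scores would achieve on average.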


Graph Enhanced High Dimensional Kernel Regression

arXiv.org Machine Learning

In this paper, the flexibility, versatility, and predictive power of kernel regression are combined with now abundantly available network data to create regression models with even greater predictive performance. Building on previous work featuring generalized linear models fitted in the presence of network cohesion data, we construct a kernelized extension that captures subtler nonlinearities in extremely high-dimensional spaces and also produces far better predictive performance. Applications to simulated and real-life data demonstrate the appeal and strength of this seamless yet substantial adaptation.
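The abstract does not state the estimator. What follows is only a hypothetical sketch of one way to combine network cohesion with kernel ridge regression:

- Each node gets its own intercept `a`, smoothed over the graph by a Laplacian penalty.
- A kernel function of the covariates is added on top.

The objective, the RBF kernel, and the weights `lam` and `gamma` are all assumptions, not the paper's formulation. Setting the gradients of the objective to zero gives a single linear system:

```python
import numpy as np

def rbf_kernel(X, bandwidth=1.0):
    """Gaussian (RBF) kernel matrix; the kernel choice is an assumption."""
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2 * X @ X.T
    return np.exp(-np.maximum(d2, 0.0) / (2 * bandwidth**2))

def fit_graph_kernel_ridge(X, y, A, lam=1.0, gamma=0.1, bandwidth=1.0):
    """Minimise ||y - a - K c||^2 + lam * a'La + gamma * c'Kc (hypothetical objective)."""
    n = len(y)
    K = rbf_kernel(X, bandwidth)
    L = np.diag(A.sum(axis=1)) - A          # graph Laplacian of the adjacency matrix A
    I = np.eye(n)
    # Stationarity in a: (I + lam L) a + K c = y; a sufficient condition in c: a + (K + gamma I) c = y
    M = np.block([[I + lam * L, K], [I, K + gamma * I]])
    sol = np.linalg.solve(M, np.concatenate([y, y]))
    a, c = sol[:n], sol[n:]
    return a, c, K, L

rng = np.random.default_rng(1)
n = 30
X = rng.standard_normal((n, 2))
A = np.zeros((n, n))
for i in range(n):                           # ring network: node i linked to node i + 1
    A[i, (i + 1) % n] = A[(i + 1) % n, i] = 1.0
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(n)
a, c, K, L = fit_graph_kernel_ridge(X, y, A)
fitted = a + K @ c                           # in-sample fit: node effect plus kernel part
```

Neighbouring nodes are pushed towards similar intercepts by the Laplacian term, while the kernel part captures nonlinear covariate effects.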


A Scalable Approach for Privacy-Preserving Collaborative Machine Learning

arXiv.org Machine Learning

Machine learning applications can achieve significant performance gains by training on large volumes of data. In many applications, the training data is distributed across multiple data-owners, such as patient records at multiple medical institutions, and furthermore contains sensitive information, e.g., genetic information, financial transactions, and geolocation information. Such settings give rise to the following key problem that is the focus of this paper: How can multiple data-owners jointly train a machine learning model while keeping their individual datasets private from the other parties? More specifically, we consider a distributed learning scenario in which N data-owners (clients) wish to train a logistic regression model jointly without revealing information about their individual datasets to the other parties, even if up to T out of N clients collude. Our focus is on the semi-honest adversary setup, where the corrupted parties follow the protocol but may leak information in an attempt to learn the training dataset.
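The abstract does not describe the protocol itself. As background only, here is a minimal sketch of Shamir secret sharing, a standard building block for tolerating up to T colluding parties:

- A value is hidden in a random degree-T polynomial.
- Any T shares reveal nothing about the value.
- Any T + 1 shares reconstruct it.

The prime field and the function names are illustrative choices, and this is not the paper's protocol:

```python
import secrets

PRIME = 2**61 - 1  # a Mersenne prime; the field size is an illustrative choice

def share(secret, n, t):
    """Split secret into n shares; any t shares reveal nothing, any t + 1 recover it."""
    coeffs = [secret % PRIME] + [secrets.randbelow(PRIME) for _ in range(t)]
    shares = []
    for x in range(1, n + 1):
        y = 0
        for c in reversed(coeffs):       # Horner evaluation of the degree-t polynomial
            y = (y * x + c) % PRIME
        shares.append((x, y))
    return shares

def reconstruct(shares):
    """Lagrange interpolation at x = 0 recovers the constant term (the secret)."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (-xj) % PRIME
                den = den * (xi - xj) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret

n_clients, t = 5, 2
a_shares = share(10, n_clients, t)
b_shares = share(32, n_clients, t)
# Each client adds its two shares locally; no single client ever sees 10 or 32
sum_shares = [(x, (ya + yb) % PRIME) for (x, ya), (_, yb) in zip(a_shares, b_shares)]
print(reconstruct(sum_shares[:t + 1]))  # 42
```

Because sharing is additive, clients can add shared values without revealing them. This is one reason such schemes are popular for jointly computing model updates.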


Support estimation in high-dimensional heteroscedastic mean regression

arXiv.org Machine Learning

A current strand of research in high-dimensional statistics deals with robustifying the available methodology with respect to deviations from the pervasive light-tail assumptions. In this paper we consider a linear mean regression model with random design and potentially heteroscedastic, heavy-tailed errors, and investigate support estimation in this framework. We use a strictly convex, smooth variant of the Huber loss function with tuning parameter depending on the parameters of the problem, as well as the adaptive LASSO penalty for computational efficiency. For the resulting estimator we show sign-consistency and optimal rates of convergence in the $\ell_\infty$ norm as in the homoscedastic, light-tailed setting. In our analysis, we have to deal with the issue that the support of the target parameter in the linear mean regression model and its robustified version may differ substantially even for small values of the tuning parameter of the Huber loss function. Simulations illustrate the favorable numerical performance of the proposed methodology.
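The abstract names the ingredients but not the algorithm. The sketch below is one hedged way to combine them, under these assumptions:

- The loss is the pseudo-Huber loss $\delta^2(\sqrt{1+(r/\delta)^2}-1)$, standing in for the "strictly convex, smooth variant" (the paper's variant may differ).
- The adaptive weights come from an ordinary-least-squares pilot estimate.
- The solver is plain proximal gradient descent.

The values of `lam` and `delta` are illustrative, not the authors' tuning:

```python
import numpy as np

def pseudo_huber_grad(r, delta):
    # Derivative of delta^2 * (sqrt(1 + (r/delta)^2) - 1): a smooth, strictly convex, Huber-like loss
    return r / np.sqrt(1.0 + (r / delta) ** 2)

def adaptive_lasso_pseudo_huber(X, y, lam=0.05, delta=1.0, n_iter=2000):
    """Minimise (1/n) sum rho(y - X b) + lam * sum w_j |b_j| by proximal gradient (a sketch)."""
    n, p = X.shape
    beta_init = np.linalg.lstsq(X, y, rcond=None)[0]   # pilot estimate (assumed: plain OLS)
    w = 1.0 / (np.abs(beta_init) + 1e-8)               # adaptive LASSO weights
    step = n / np.linalg.norm(X, 2) ** 2               # 1 / Lipschitz constant, since rho'' <= 1
    beta = np.zeros(p)
    for _ in range(n_iter):
        grad = -X.T @ pseudo_huber_grad(y - X @ beta, delta) / n
        z = beta - step * grad
        beta = np.sign(z) * np.maximum(np.abs(z) - step * lam * w, 0.0)  # weighted soft-threshold
    return beta

rng = np.random.default_rng(0)
n, p = 200, 5
X = rng.standard_normal((n, p))
beta_true = np.array([3.0, -2.0, 0.0, 0.0, 0.0])
noise = (0.5 + 0.5 * np.abs(X[:, 2])) * rng.standard_t(3, size=n)  # heavy-tailed, heteroscedastic
y = X @ beta_true + noise
beta_hat = adaptive_lasso_pseudo_huber(X, y)
print(np.round(beta_hat, 2))
```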


A Non Mathematical guide to the mathematics behind Machine Learning

#artificialintelligence

This model finds the "best fit" line through a set of data points by using a simple formula. The variable you want to predict (the dependent variable) is expressed as an equation in the variables you know (the independent variables). To obtain a prediction, you input values for the independent variables and the equation returns the answer. The main types of linear models in use are Linear Regression and Logistic Regression. Linear Regression predicts numerical values using the "best fit" line through all data points.
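The "simple formula" for a single independent variable is ordinary least squares. The slope is the covariance of x and y divided by the variance of x, and the line passes through the point of means. Below is a minimal sketch with toy data:

```python
def best_fit_line(xs, ys):
    """Least-squares slope and intercept for one independent variable."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x    # the line passes through (mean_x, mean_y)
    return slope, intercept

slope, intercept = best_fit_line([0, 1, 2, 3], [1, 3, 5, 7])
print(slope, intercept)                    # 2.0 1.0
prediction = intercept + slope * 10        # plug a new x into the equation
print(prediction)                          # 21.0
```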


How to Use Stacking to Choose the Best Possible Algorithm?

#artificialintelligence

This article was published as a part of the Data Science Blogathon. Whenever you come across a huge volume of data with thousands of features, you may wonder which algorithm will give the most accurate predictions on it, and whether to use all the features or reduce the feature space. In this blog, I will take you through the steps of finding good features with lasso regression and choosing the right algorithm with a technique called stacking. Stacking is a method of combining machine learning models, similar to arranging a stack of plates at a restaurant: it combines the outputs of many models, typically by training a final model on their predictions.
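Below is a minimal sketch of the two steps with scikit-learn. It assumes synthetic data from `make_regression` and an arbitrary choice of base models (ridge, k-nearest neighbours, random forest), and it is not the blog's own code. Lasso keeps the features with non-zero coefficients, and `StackingRegressor` then fits a final estimator on cross-validated predictions of the base models:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor, StackingRegressor
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LassoCV, RidgeCV
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import make_pipeline

X, y = make_regression(n_samples=300, n_features=200, n_informative=10,
                       noise=5.0, random_state=0)

# Step 1: lasso-based feature selection (keeps features with non-zero coefficients)
selector = SelectFromModel(LassoCV(cv=5, random_state=0))

# Step 2: stacking; the final estimator learns how to weight the base models' predictions
stack = StackingRegressor(
    estimators=[
        ("ridge", RidgeCV()),
        ("knn", KNeighborsRegressor(n_neighbors=10)),
        ("forest", RandomForestRegressor(n_estimators=100, random_state=0)),
    ],
    final_estimator=RidgeCV(),
    cv=5,
)

model = make_pipeline(selector, stack).fit(X, y)
print(model.score(X, y))   # R^2 on the training data
```

In practice you would judge the stack on held-out data rather than on the training score printed here.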


Ridge regression with adaptive additive rectangles and other piecewise functional templates

arXiv.org Machine Learning

We propose an $L_{2}$-based penalization algorithm for functional linear regression models, where the coefficient function is shrunk towards a data-driven shape template $\gamma$, which is constrained to belong to a class of piecewise functions by restricting its basis expansion. In particular, we focus on the case where $\gamma$ can be expressed as a sum of $q$ rectangles that are adaptively positioned with respect to the regression error. As the problem of finding the optimal knot placement of a piecewise function is nonconvex, the proposed parametrization makes it possible to reduce the number of variables in the global optimization scheme, resulting in a fitting algorithm that alternates between approximating a suitable template and solving a convex ridge-like problem. The predictive power and interpretability of our method are shown on multiple simulations and two real-world case studies.
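The convex step of such an alternating scheme can be sketched as ridge regression shrunk towards a fixed template. Minimizing $\lVert y - Xb\rVert^2 + \lambda\lVert b - \gamma\rVert^2$ gives $b = (X^\top X + \lambda I)^{-1}(X^\top y + \lambda\gamma)$. The sketch makes three assumptions:

- Curves are discretized on a grid.
- The template uses fixed, hand-picked rectangles; the step that adaptively repositions them is omitted.
- `rectangle_template` is a hypothetical helper.

```python
import numpy as np

def rectangle_template(grid, rectangles):
    """Hypothetical helper: sum of rectangles, each given as (start, end, height) on the grid."""
    gamma = np.zeros_like(grid, dtype=float)
    for start, end, height in rectangles:
        gamma += height * ((grid >= start) & (grid < end))
    return gamma

def ridge_towards_template(X, y, gamma, lam):
    """Closed-form minimiser of ||y - X b||^2 + lam * ||b - gamma||^2."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y + lam * gamma)

rng = np.random.default_rng(0)
grid = np.linspace(0.0, 1.0, 50)
beta_true = rectangle_template(grid, [(0.2, 0.4, 1.0), (0.6, 0.8, -0.5)])
X = rng.standard_normal((100, grid.size)) / grid.size   # discretized curves, scaled by roughly the grid spacing
y = X @ beta_true + 0.01 * rng.standard_normal(100)
gamma = rectangle_template(grid, [(0.15, 0.45, 0.8)])   # a rough, hand-picked template
b_hat = ridge_towards_template(X, y, gamma, lam=1e-3)
```

As `lam` grows, the estimate collapses onto the template. As it shrinks, the fit approaches ordinary least squares.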


c-lasso -- a Python package for constrained sparse and robust regression and classification

arXiv.org Machine Learning

We introduce c-lasso, a Python package that enables sparse and robust linear regression and classification with linear equality constraints. The underlying statistical forward model is assumed to be of the following form:
\[ y = X \beta + \sigma \epsilon \qquad \textrm{subject to} \qquad C\beta=0 \]
Here, $X \in \mathbb{R}^{n\times d}$ is a given design matrix and the vector $y \in \mathbb{R}^{n}$ is a continuous or binary response vector. The matrix $C$ is a general constraint matrix. The vector $\beta \in \mathbb{R}^{d}$ contains the unknown coefficients and $\sigma$ is an unknown scale. Prominent use cases are (sparse) log-contrast regression with compositional data $X$, requiring the constraint $1_d^T \beta = 0$ (Aitchison and Bacon-Shone 1984), and the Generalized Lasso, which is a special case of the described problem (see, e.g., (James, Paulson, and Rusmevichientong 2020), Example 3). The c-lasso package provides estimators for inferring unknown coefficients and scale (i.e., perspective M-estimators (Combettes and Müller 2020a)) of the form
\[ \min_{\beta \in \mathbb{R}^d, \sigma \in \mathbb{R}_{0}^{+}} f\left(X\beta - y,{\sigma} \right) + \lambda \left\lVert \beta\right\rVert_1 \qquad \textrm{subject to} \qquad C\beta = 0 \]
for several convex loss functions $f(\cdot,\cdot)$. This includes the constrained Lasso, the constrained scaled Lasso, and sparse Huber M-estimators with linear equality constraints.
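To make the constrained Lasso concrete, here is a generic ADMM sketch for $\min_\beta \tfrac12\lVert X\beta - y\rVert^2 + \lambda\lVert\beta\rVert_1$ subject to $C\beta = 0$ with a fixed scale. It does not use the c-lasso package API and is not its solver. The sketch assumes these choices:

- Gaussian toy data stands in for compositional data.
- The zero-sum (log-contrast) constraint is $C = 1_d^T$.
- `rho` and `n_iter` are illustrative values.

```python
import numpy as np

def constrained_lasso_admm(X, y, C, lam, rho=10.0, n_iter=1000):
    """ADMM sketch for min 0.5*||X b - y||^2 + lam*||b||_1 s.t. C b = 0 (not c-lasso's solver)."""
    n, d = X.shape
    m = C.shape[0]
    # KKT matrix of the equality-constrained quadratic beta-update
    kkt = np.block([[X.T @ X + rho * np.eye(d), C.T], [C, np.zeros((m, m))]])
    Xty = X.T @ y
    z = np.zeros(d)
    u = np.zeros(d)
    for _ in range(n_iter):
        rhs = np.concatenate([Xty + rho * (z - u), np.zeros(m)])
        beta = np.linalg.solve(kkt, rhs)[:d]                                   # satisfies C beta = 0
        z = np.sign(beta + u) * np.maximum(np.abs(beta + u) - lam / rho, 0.0)  # soft-threshold
        u = u + beta - z                                                       # scaled dual update
    return beta, z   # beta is feasible; z is the sparse copy; they agree at convergence

rng = np.random.default_rng(0)
n, d = 100, 10
X = rng.standard_normal((n, d))
beta_true = np.zeros(d)
beta_true[:2] = [1.0, -1.0]              # coefficients sum to zero, as the constraint requires
C = np.ones((1, d))                      # log-contrast constraint: sum of coefficients is 0
y = X @ beta_true + 0.1 * rng.standard_normal(n)
beta, z = constrained_lasso_admm(X, y, C, lam=1.0)
print(np.round(beta, 3))
```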