Regression
Accuracy versus interpretability? With generalized additive models (GAMs), you can have both
In this post, I will provide an overview of generalized additive models (GAMs) and their desirable features. Predictive accuracy has long been an important goal of machine learning. But model interpretability has received more attention in recent years. Stakeholders, such as executives, regulators, and domain experts, often want to understand how and why a model makes its predictions before they trust it enough to use it in practice. However, when you train a machine learning model, you typically face a tradeoff between accuracy and interpretability.
Stacked ensembles -- improving model performance on a higher level
One particular problem arises at this stage. How do we know what model to choose as the meta-model? Unfortunately, there has't been any research into this area and the selection of a meta-model is more of an art than a science. In most of the papers discussing stacked models, the meta-model used is often just a simple model such as Linear Regression for regression tasks and Logistic Regression for classification tasks. One reason why more complex meta-models are often not chosen is because there is a much higher chance that the meta-model may overfit to the predictions from the base models.
All About Logistic Regression
Originally published on Towards AI the World's Leading AI and Technology News and Media Company. If you are building an AI-related product or service, we invite you to consider becoming an AI sponsor. At Towards AI, we help scale AI and technology startups. Let us help you unleash your technology to the masses. Logistic Regression is a Supervised Machine Learning algorithm that is used in classification problems where we have to distinguish the dependent variable between two or more categories or classes by using the independent variables.
Adaptive Verifiable Coded Computing: Towards Fast, Secure and Private Distributed Machine Learning
Tang, Tingting, Ali, Ramy E., Hashemi, Hanieh, Gangwani, Tynan, Avestimehr, Salman, Annavaram, Murali
Stragglers, Byzantine workers, and data privacy are the main bottlenecks in distributed cloud computing. Some prior works proposed coded computing strategies to jointly address all three challenges. They require either a large number of workers, a significant communication cost or a significant computational complexity to tolerate Byzantine workers. Much of the overhead in prior schemes comes from the fact that they tightly couple coding for all three problems into a single framework. In this paper, we propose Adaptive Verifiable Coded Computing (AVCC) framework that decouples the Byzantine node detection challenge from the straggler tolerance. AVCC leverages coded computing just for handling stragglers and privacy, and then uses an orthogonal approach that leverages verifiable computing to mitigate Byzantine workers. Furthermore, AVCC dynamically adapts its coding scheme to trade-off straggler tolerance with Byzantine protection. We evaluate AVCC on a compute-intensive distributed logistic regression application. Our experiments show that AVCC achieves up to $4.2\times$ speedup and up to $5.1\%$ accuracy improvement over the state-of-the-art Lagrange coded computing approach (LCC). AVCC also speeds up the conventional uncoded implementation of distributed logistic regression by up to $7.6\times$, and improves the test accuracy by up to $12.1\%$.
Top Machine Learning Algorithms for Regression
Linear regression finds the optimal linear relationship between independent variables and dependent variables, thus makes prediction accordingly. The simplest form is y b0 b1x. When there is only one input feature, linear regression model fits the line in a 2 dimensional space, in order to minimize the residuals between predicted values and actual values. The common cost function to measure the magnitude of residuals is residual sum of squared (RSS). As more features are introduced, simple linear regression evolves into multiple linear regression y b0 b1x1 b2x2 … bnxn. Feel free to check out my article if you want the specific guide to simple linear regression model.
When regression coefficients change over time: A proposal
A common approach in forecasting problems is to estimate a least-squares regression (or other statistical learning models) from past data, which is then applied to predict future outcomes. An underlying assumption is that the same correlations that were observed in the past still hold for the future. We propose a model for situations when this assumption is not met: adopting methods from the state space literature, we model how regression coefficients change over time. Our approach can shed light on the large uncertainties associated with forecasting the future, and how much of this is due to changing dynamics of the past. Our simulation study shows that accurate estimates are obtained when the outcome is continuous, but the procedure fails for binary outcomes.
Logistic Regression with Example
Logistic Regression is a Supervised Machine Learning Algorithm utilized for classification. Examples for classification include: Email spam or ham, will buy or not buy a product, disease predictions such as cancerous or noncancerous cells. Logistic regression is a Probability problem. Meaning that the outcome of the algorithm is between 0 and 1. It maintains a threshold value to classify the data points(samples).
GAM(L)A: An econometric model for interpretable Machine Learning
Flachaire, Emmanuel, Hacheme, Gilles, Hué, Sullivan, Laurent, Sébastien
Despite their high predictive performance, random forest and gradient boosting are often considered as black boxes or uninterpretable models which has raised concerns from practitioners and regulators. As an alternative, we propose in this paper to use partial linear models that are inherently interpretable. Specifically, this article introduces GAM-lasso (GAMLA) and GAM-autometrics (GAMA), denoted as GAM(L)A in short. GAM(L)A combines parametric and non-parametric functions to accurately capture linearities and non-linearities prevailing between dependent and explanatory variables, and a variable selection procedure to control for overfitting issues. Estimation relies on a two-step procedure building upon the double residual method. We illustrate the predictive performance and interpretability of GAM(L)A on a regression and a classification problem. The results show that GAM(L)A outperforms parametric models augmented by quadratic, cubic and interaction effects. Moreover, the results also suggest that the performance of GAM(L)A is not significantly different from that of random forest and gradient boosting.
Stability and Risk Bounds of Iterative Hard Thresholding
In this paper, we analyze the generalization performance of the Iterative Hard Thresholding (IHT) algorithm widely used for sparse recovery problems. The parameter estimation and sparsity recovery consistency of IHT has long been known in compressed sensing. From the perspective of statistical learning, another fundamental question is how well the IHT estimation would predict on unseen data. This paper makes progress towards answering this open question by introducing a novel sparse generalization theory for IHT under the notion of algorithmic stability. Our theory reveals that: 1) under natural conditions on the empirical risk function over $n$ samples of dimension $p$, IHT with sparsity level $k$ enjoys an $\mathcal{\tilde O}(n^{-1/2}\sqrt{k\log(n)\log(p)})$ rate of convergence in sparse excess risk; 2) a tighter $\mathcal{\tilde O}(n^{-1/2}\sqrt{\log(n)})$ bound can be established by imposing an additional iteration stability condition on a hypothetical IHT procedure invoked to the population risk; and 3) a fast rate of order $\mathcal{\tilde O}\left(n^{-1}k(\log^3(n)+\log(p))\right)$ can be derived for strongly convex risk function under proper strong-signal conditions. The results have been substantialized to sparse linear regression and sparse logistic regression models to demonstrate the applicability of our theory. Preliminary numerical evidence is provided to confirm our theoretical predictions.
How to Interpret Machine Learning Models with Python -- Part 1 (easy)
In this article, I will try to interpret the Linear Regression, Lasso, and Decision Tree models which are inherently interpretable. I will analyze global interpretability -- which analyzes the most important feature for prediction in general and local interpretability -- which explains individual prediction results. Machine learning models are used in applications such as fraud and risk detection in bank transactions, voice assistants, recommendation systems, chatbots, self-driving cars, social network analysis, etc. However, sometimes it is difficult to interpret them because the algorithm represents a black box(e.g. So we need additional techniques to analyze black box decisions.