 Regression


Statistical optimality and stability of tangent transform algorithms in logit models

arXiv.org Machine Learning

A systematic approach to finding a variational approximation in an otherwise intractable non-conjugate model is to exploit the general principle of convex duality, minorizing the marginal likelihood in a way that renders the problem tractable. While such approaches are popular in the context of variational inference in non-conjugate Bayesian models, theoretical guarantees on statistical optimality and algorithmic convergence are lacking. Focusing on logistic regression models, we provide mild conditions on the data generating process to derive non-asymptotic upper bounds to the risk incurred by the variational optima. We demonstrate that these assumptions can be completely relaxed if one considers a slight variation of the algorithm by raising the likelihood to a fractional power. Next, we utilize the theory of dynamical systems to provide convergence guarantees for such algorithms in logistic and multinomial logit regression. In particular, we establish local asymptotic stability of the algorithm without any assumptions on the data-generating process. We explore a special case involving a semi-orthogonal design under which global convergence is obtained. The theory is further illustrated using several numerical studies.


The Simpler Brother of OLS Regression for Machine Learning

#artificialintelligence

Nonparametrics took me a while to get my head around. On the one hand, everything I had studied involved making the formula of a predictive model differentiable and optimizing it with respect to a single parameter or a set of parameters (think linear regression or GMM). On the other hand, the majority of the nonparametric methods were being used in classification (Random Forests, KNN, etc.). But some of the best methods are nonparametric. Rather than assuming a particular family of distributions and selecting the best-fitting member, they make judgments without assuming a distribution.


Deep Learning Prerequisites: Linear Regression in Python

#artificialintelligence

Data science: Learn linear regression from scratch and build your own working program in Python for data analysis. Created by Lazy Programmer Inc. This course teaches you about one popular technique used in machine learning, data science and statistics: linear regression. We cover the theory from the ground up: derivation of the solution, and applications to real-world problems. We show you how one might code their own linear regression module in Python. Linear regression is the simplest machine learning model you can learn, yet there is so much depth that you'll be returning to it for years to come.


Machine Learning Algorithms from Start to Finish in Python: Logistic Regression

#artificialintelligence

Going back to our example, let's assume that the Lakers were having a terrible season (clearly not the case) and won only 1 of their 20 games. The odds of the Lakers winning would then be 1 win to 19 losses, or 1/19 ≈ 0.05. We can make a simple observation: the worse they play, the closer their odds of winning will be to 0. Concretely, when the odds are against them winning, the odds range between 0 and 1. Now let's look at the opposite case. When the odds are in favor of the Lakers winning, they start at 1 and can go all the way up to infinity. Clearly, there is a problem here. This asymmetry makes it hard to compare the odds for and against the Lakers winning.
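The asymmetry can be checked numerically. The short sketch below is illustrative only: the helper names `odds` and `log_odds` are hypothetical, and the 19-win season is an assumed mirror image of the 1-win season. It also shows one common remedy, taking the logarithm of the odds, which this excerpt does not itself state.

```python
import math


def odds(wins: int, losses: int) -> float:
    """Odds in favor of winning: number of wins divided by number of losses."""
    return wins / losses


def log_odds(wins: int, losses: int) -> float:
    """Natural logarithm of the odds (the logit scale)."""
    return math.log(odds(wins, losses))


# A terrible season (1 win, 19 losses) and its assumed mirror image.
bad_season = odds(1, 19)    # squeezed into the interval (0, 1)
good_season = odds(19, 1)   # can be anywhere in (1, infinity)

print("odds, bad season: ", bad_season)
print("odds, good season:", good_season)

# On the log scale the two seasons sit at equal distances from 0.
print("log-odds, bad season: ", log_odds(1, 19))
print("log-odds, good season:", log_odds(19, 1))
```

On the raw odds scale the two seasons are not comparable at a glance, while their log-odds differ only in sign.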


A Complete Guide to Linear Regression for Beginners - Let's Discuss Stuff

#artificialintelligence

Linear Regression is the simplest, most easily understood, and most widely used supervised regression model. In supervised learning, you have input-output pairs, and you try to learn a mapping from input to output by training on those pairs. Another type of machine learning is unsupervised learning, in which you don't have an output variable. Instead, you try to group the inputs by their similarities.


Machine Learning Algorithms from Start to Finish in Python: Linear Regression

#artificialintelligence

Probably one of the most common algorithms around, Linear Regression is a must-know for Machine Learning practitioners. It is usually a beginner's first exposure to a real Machine Learning algorithm, and knowing how it operates on a deeper level is crucial to understanding it better. So, briefly, let's break down the real question: what really is Linear Regression? Linear Regression is a supervised learning algorithm that takes a linear approach to modelling the relationship between a dependent variable and an independent variable. In other words, it aims to fit a linear trendline that best captures the relationship in the data and, from this line, it can predict what the target values may be.
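As a minimal sketch of fitting such a trendline, the snippet below uses ordinary least squares via NumPy. The synthetic data (true slope 2, true intercept 1, Gaussian noise) and the helper name `predict` are assumptions made for illustration, not taken from the article.

```python
import numpy as np

# Synthetic data (assumed for illustration): y = 2x + 1 plus Gaussian noise.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 50)
y = 2.0 * x + 1.0 + rng.normal(scale=0.2, size=x.shape)

# Design matrix with a column of ones so the fit includes an intercept.
X = np.column_stack([np.ones_like(x), x])

# Least-squares solution: minimizes the sum of squared residuals.
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
intercept, slope = coef


def predict(x_new):
    """Predict target values from the fitted trendline."""
    return intercept + slope * np.asarray(x_new, dtype=float)


print("intercept:", intercept, "slope:", slope)
print("prediction at x = 12:", predict(12.0))
```

The fitted slope and intercept should land close to the values used to generate the data, and `predict` then reads target values off the fitted line.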


Distribution Regression for Sequential Data

arXiv.org Machine Learning

Distribution regression refers to the supervised learning problem where labels are only available for groups of inputs instead of individual inputs. In this paper, we develop a rigorous mathematical framework for distribution regression where inputs are complex data streams. Leveraging properties of the expected signature and a recent signature kernel trick for sequential data from stochastic analysis, we introduce two new learning techniques, one feature-based and the other kernel-based. Each is suited to a different data regime in terms of the number of data streams and the dimensionality of the individual streams. We provide theoretical results on the universality of both approaches and demonstrate empirically their robustness to irregularly sampled multivariate time-series, achieving state-of-the-art performance on both synthetic and real-world examples from thermodynamics, mathematical finance and agricultural science.


Off-Policy Evaluation of Bandit Algorithm from Dependent Samples under Batch Update Policy

arXiv.org Machine Learning

As an instance of sequential decision-making problems, the multi-armed bandit (MAB) algorithms have attracted significant attention in various applications, such as ad optimization, personalized medicine, search engines, and recommendation systems. Recently, various methods for evaluating a new policy using historical data obtained via the MAB algorithms (Beygelzimer & Langford, 2009; Li et al., 2010) have emerged. The goal of off-policy evaluation (OPE) is to evaluate a new policy by estimating the expected reward obtained from the new policy (Dudík et al., 2011; Wang et al., 2017; Narita et al., 2019; Bibaut et al., 2019; Kallus & Uehara, 2019; Oberst & Sontag, 2019). Although an OPE algorithm estimates the expected reward from a new policy, most existing studies presume that the samples are independent and identically distributed (i.i.d.). However, the MAB algorithm policy updates the probability of choosing an action based on past observations, and samples are not i.i.d.


A Practical Guide of Off-Policy Evaluation for Bandit Problems

arXiv.org Machine Learning

Off-policy evaluation (OPE) is the problem of estimating the value of a target policy from samples obtained via different policies. Recently, applying OPE methods for bandit problems has garnered attention. For the theoretical guarantees of an estimator of the policy value, the OPE methods require various conditions on the target policy and the policy used for generating the samples. However, existing studies did not carefully discuss the practical situation where such conditions hold, and the gap between them remains. This paper aims to show new results for bridging the gap. Based on the properties of the evaluation policy, we categorize OPE situations. Then, among practical applications, we mainly discuss the best policy selection. For this situation, we propose a meta-algorithm based on existing OPE estimators. We investigate the proposed concepts using synthetic and open real-world datasets in experiments.
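To make the estimation problem concrete, here is a minimal sketch of one standard OPE estimator, inverse probability weighting (IPW). It is not the meta-algorithm proposed in the paper. The three-armed, context-free bandit, the reward means, both policies, and the function name `ipw_estimate` are all assumptions for illustration, and the logging policy is assumed fixed and known so that the samples are i.i.d.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples = 100_000

# Assumed toy problem: three arms with Bernoulli rewards.
true_mean_reward = np.array([0.2, 0.5, 0.8])
behavior_policy = np.full(3, 1.0 / 3.0)      # policy that generated the logs
target_policy = np.array([0.1, 0.2, 0.7])    # policy we want to evaluate

# Logged data: actions drawn from the behavior policy, then observed rewards.
actions = rng.choice(3, size=n_samples, p=behavior_policy)
rewards = rng.binomial(1, true_mean_reward[actions])


def ipw_estimate(actions, rewards, target_policy, behavior_policy):
    """Inverse probability weighting: reweight each logged reward by the
    ratio of target to behavior probabilities of the logged action."""
    weights = target_policy[actions] / behavior_policy[actions]
    return float(np.mean(weights * rewards))


estimated_value = ipw_estimate(actions, rewards, target_policy, behavior_policy)
true_value = float(target_policy @ true_mean_reward)

print("IPW estimate:", estimated_value)
print("true value:  ", true_value)
```

The estimator is only valid when the behavior policy gives positive probability to every action the target policy can take, which is one example of the kind of condition the abstract refers to.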


On the Universality of the Double Descent Peak in Ridgeless Regression

arXiv.org Machine Learning

We prove a non-asymptotic distribution-independent lower bound for the expected mean squared generalization error caused by label noise in ridgeless linear regression. Our lower bound generalizes a similar known result to the overparameterized (interpolating) regime. In contrast to most previous works, our analysis applies to a broad class of input distributions with almost surely full-rank feature matrices, which allows us to cover various types of deterministic or random feature maps. Our lower bound is asymptotically sharp and implies that in the presence of label noise, ridgeless linear regression does not perform well around the interpolation threshold for any of these feature maps. We analyze the imposed assumptions in detail and provide a theory for analytic (random) feature maps. Using this theory, we can show that our assumptions are satisfied for input distributions with a (Lebesgue) density and feature maps given by random deep neural networks with analytic activation functions like sigmoid, tanh, softplus or GELU. As further examples, we show that feature maps from random Fourier features and polynomial kernels also satisfy our assumptions. We complement our theory with further experimental and analytic results.
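For readers unfamiliar with the term, ridgeless linear regression is commonly understood as the minimum-norm least-squares solution, which is the limit of ridge regression as the penalty tends to zero. The sketch below illustrates that definition only, not the paper's lower bound. The Gaussian features, the dimensions, and the noise level are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 20, 50  # fewer samples than features: the overparameterized regime

# Assumed synthetic data: Gaussian features and noisy linear labels.
X = rng.normal(size=(n, p))
w_true = rng.normal(size=p)
y = X @ w_true + rng.normal(scale=0.1, size=n)

# Ridgeless solution: minimum-norm least squares via the pseudoinverse.
w_ridgeless = np.linalg.pinv(X) @ y

# Ridge regression with a tiny penalty, written in its dual form
# X^T (X X^T + lam * I)^{-1} y, which is well conditioned when n < p.
lam = 1e-8
w_ridge = X.T @ np.linalg.solve(X @ X.T + lam * np.eye(n), y)

print("max training residual:", np.max(np.abs(X @ w_ridgeless - y)))
print("max gap to small-penalty ridge:", np.max(np.abs(w_ridge - w_ridgeless)))
```

With more features than samples and a full-rank feature matrix, the ridgeless solution fits the noisy labels exactly, which is the interpolating regime the abstract describes.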