Goto

Collaborating Authors

 Regression


Estimating Individual Treatment Effects using Non-Parametric Regression Models: a Review

arXiv.org Machine Learning

Large observational data are increasingly available in disciplines such as health, economic and social sciences, where researchers are interested in causal questions rather than prediction. In this paper, we investigate the problem of estimating heterogeneous treatment effects using non-parametric regression-based methods. Firstly, we introduce the setup and the issues related to conducting causal inference with observational or non-fully randomized data, and how these issues can be tackled with the help of statistical learning tools. Then, we provide a review of state-of-the-art methods, with a particular focus on non-parametric modeling, and we cast them under a unifying taxonomy. After presenting a brief overview on the problem of model selection, we illustrate the performance of some of the methods on three different simulated studies and on a real world example to investigate the effect of participation in school meal programs on health indicators.


CP Degeneracy in Tensor Regression

arXiv.org Machine Learning

Tensor linear regression is an important and useful tool for analyzing tensor data. To deal with high dimensionality, CANDECOMP/PARAFAC (CP) low-rank constraints are often imposed on the coefficient tensor parameter in the (penalized) $M$-estimation. However, we show that the corresponding optimization may not be attainable, and when this happens, the estimator is not well-defined. This is closely related to a phenomenon, called CP degeneracy, in low-rank tensor approximation problems. In this article, we provide useful results of CP degeneracy in tensor regression problems. In addition, we provide a general penalized strategy as a solution to overcome CP degeneracy. The asymptotic properties of the resulting estimation are also studied. Numerical experiments are conducted to illustrate our findings.


Piecewise Linear Regression via a Difference of Convex Functions

arXiv.org Machine Learning

We present a new piecewise linear regression methodology that utilizes fitting a difference of convex functions (DC functions) to the data. These are functions $f$ that may be represented as the difference $\phi_1 - \phi_2$ for a choice of convex functions $\phi_1, \phi_2$. The method proceeds by estimating piecewise-liner convex functions, in a manner similar to max-affine regression, whose difference approximates the data. The choice of the function is regularised by a new seminorm over the class of DC functions that controls the $\ell_\infty$ Lipschitz constant of the estimate. The resulting methodology can be efficiently implemented via Quadratic programming even in high dimensions, and is shown to have close to minimax statistical risk. We empirically validate the method, showing it to be practically implementable, and to have comparable performance to existing regression/classification methods on real-world datasets.


Asymptotically Optimal Exact Minibatch Metropolis-Hastings

arXiv.org Machine Learning

Metropolis-Hastings (MH) is a commonly-used MCMC algorithm, but it can be intractable on large datasets due to requiring computations over the whole dataset. In this paper, we study minibatch MH methods, which instead use subsamples to enable scaling. We observe that most existing minibatch MH methods are inexact (i.e. they may change the target distribution), and show that this inexactness can cause arbitrarily large errors in inference. We propose a new exact minibatch MH method, TunaMH, which exposes a tunable tradeoff between its batch size and its theoretically guaranteed convergence rate. We prove a lower bound on the batch size that any minibatch MH method must use to retain exactness while guaranteeing fast convergence--the first such bound for minibatch MH--and show TunaMH is asymptotically optimal in terms of the batch size. Empirically, we show TunaMH outperforms other exact minibatch MH methods on robust linear regression, truncated Gaussian mixtures, and logistic regression.


Sharp Bias-variance Tradeoffs of Hard Parameter Sharing in High-dimensional Linear Regression

arXiv.org Machine Learning

Hard parameter sharing for multi-task learning is widely used in empirical research despite the fact that its generalization properties have not been well established in many cases. This paper studies its generalization properties in a fundamental setting: How does hard parameter sharing work given multiple linear regression tasks? We develop new techniques and establish a number of new results in the high-dimensional setting, where the sample size and feature dimension increase at a fixed ratio. First, we show a sharp bias-variance decomposition of hard parameter sharing, given multiple tasks with the same features. Second, we characterize the asymptotic bias-variance limit for two tasks, even when they have arbitrarily different sample size ratios and covariate shifts. We also demonstrate that these limiting estimates for the empirical loss are incredibly accurate in moderate dimensions. Finally, we explain an intriguing phenomenon where increasing one task's sample size helps another task initially by reducing variance but hurts eventually due to increasing bias. This suggests progressively adding data for optimizing hard parameter sharing, and we validate its efficiency in text classification tasks.


Spike and slab variational Bayes for high dimensional logistic regression

arXiv.org Machine Learning

Variational Bayes (VB) is a popular scalable alternative to Markov chain Monte Carlo for Bayesian inference. We study a mean-field spike and slab VB approximation of widely used Bayesian model selection priors in sparse high-dimensional logistic regression. We provide non-asymptotic theoretical guarantees for the VB posterior in both $\ell_2$ and prediction loss for a sparse truth, giving optimal (minimax) convergence rates. Since the VB algorithm does not depend on the unknown truth to achieve optimality, our results shed light on effective prior choices. We confirm the improved performance of our VB algorithm over common sparse VB approaches in a numerical study.


Landscape of R packages for eXplainable Artificial Intelligence

arXiv.org Machine Learning

The growing availability of data and computing power fuels the development of predictive models. In order to ensure the safe and effective functioning of such models, we need methods for exploration, debugging, and validation. New methods and tools for this purpose are being developed within the eXplainable Artificial Intelligence (XAI) subdomain of machine learning. In this work (1) we present the taxonomy of methods for model explanations, (2) we identify and compare 27 packages available in R to perform XAI analysis, (3) we present an example of an application of particular packages, (4) we acknowledge recent trends in XAI. The article is primarily devoted to the tools available in R, but since it is easy to integrate the Python code, we will also show examples for the most popular libraries from Python.


Multiple Descent: Design Your Own Generalization Curve

arXiv.org Machine Learning

This paper explores the generalization loss of linear regression in variably parameterized families of models, both under-parameterized and over-parameterized. We show that the generalization curve can have an arbitrary number of peaks, and moreover, locations of those peaks can be explicitly controlled. Our results highlight the fact that both classical U-shaped generalization curve and the recently observed double descent curve are not intrinsic properties of the model family. Instead, their emergence is due to the interaction between the properties of the data and the inductive biases of learning algorithms.


Coresets via Bilevel Optimization for Continual Learning and Streaming

arXiv.org Machine Learning

Coresets are small data summaries that are sufficient for model training. They can be maintained online, enabling efficient handling of large data streams under resource constraints. However, existing constructions are limited to simple models such as k-means and logistic regression. In this work, we propose a novel coreset construction via cardinality-constrained bilevel optimization. We show how our framework can efficiently generate coresets for deep neural networks, and demonstrate its empirical benefits in continual learning and in streaming settings.


TensorFlow - Hands-on Machine Learning with TensorFlow

#artificialintelligence

Created by The Click Reader Preview this Udemy Course - GET COUPON CODE Learn how to build Machine Learning projects in this TensorFlow Course created by The Click Reader. In this course, you will be learning about Scalar as well as Tensors and how to create them using TensorFlow. You will also be learning how to perform various kinds of Tensor operations for manipulating and changing tensor values. You will be performing a total of three Machine Learning projects while learning through this TensorFlow full course: 1. Linear Regression from Scratch You will be learning how to create a Linear Regression model from scratch using TensorFlow. You will be preparing the data, building the model architecture as well as training the model using a custom-made loss function as well as an optimizer.