AITopics | Regression

Regression

Leveraged volume sampling for linear regression

Derezinski, Michal, Warmuth, Manfred K., Hsu, Daniel J.

Neural Information Processing Systems, Feb-14-2020, 10:56:38 GMT

Suppose an $n \times d$ design matrix in a linear regression problem is given, but the response for each point is hidden unless explicitly requested. The goal is to sample only a small number $k \ll n$ of the responses, and then produce a weight vector whose sum of squares loss over *all* points is at most $1+\epsilon$ times the minimum. When $k$ is very small (e.g., $k = d$), jointly sampling diverse subsets of points is crucial. One such method called "volume sampling" has a unique and desirable property that the weight vector it produces is an unbiased estimate of the optimum. It is therefore natural to ask if this method offers the optimal unbiased estimate in terms of the number of responses $k$ needed to achieve a $1+\epsilon$ loss approximation.
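
The unbiasedness property mentioned in this abstract can be checked numerically in the smallest case, where exactly d responses are requested. The sketch below is not the paper's leveraged sampling procedure. It assumes the standard size-d definition of volume sampling, in which a subset S of d rows is drawn with probability proportional to det(X_S)^2, and it enumerates every subset by brute force, so it is only practical for tiny n. The function name `volume_sampling_expectation` is hypothetical.

```python
import itertools

import numpy as np


def volume_sampling_expectation(X, y):
    """Exact expectation of the subset solution under size-d volume sampling.

    Assumption: subset S (|S| = d) has probability det(X_S)^2 / Z, where Z
    is the sum of det(X_S)^2 over all size-d subsets.
    """
    n, d = X.shape
    normalizer = 0.0
    weighted_sum = np.zeros(d)
    for subset in itertools.combinations(range(n), d):
        idx = list(subset)
        X_S = X[idx]
        vol_sq = np.linalg.det(X_S) ** 2
        if vol_sq < 1e-12:
            # Singular subsets have probability (numerically) zero.
            continue
        # With only d responses, the subset solution interpolates them.
        w_S = np.linalg.solve(X_S, y[idx])
        weighted_sum += vol_sq * w_S
        normalizer += vol_sq
    return weighted_sum / normalizer, normalizer


rng = np.random.default_rng(0)
X = rng.normal(size=(8, 3))
y = rng.normal(size=8)

w_expected, Z = volume_sampling_expectation(X, y)
# Least squares optimum computed from all n responses.
w_opt = np.linalg.lstsq(X, y, rcond=None)[0]
```

Comparing `w_expected` with `w_opt` shows the two agree up to floating-point error, and `Z` matches `det(X.T @ X)` by the Cauchy-Binet formula.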

leverage score, linear regression, unbiased estimate, (4 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.63)

Multi-way Interacting Regression via Factorization Machines

Yurochkin, Mikhail, Nguyen, XuanLong, Vasiloglou, Nikolaos

Neural Information Processing Systems, Feb-14-2020, 10:56:35 GMT

We propose a Bayesian regression method that accounts for multi-way interactions of arbitrary orders among the predictor variables. Our model makes use of a factorization mechanism for representing the regression coefficients of interactions among the predictors, while the interaction selection is guided by a prior distribution on random hypergraphs, a construction which generalizes the Finite Feature Model. We present a posterior inference algorithm based on Gibbs sampling, and establish posterior consistency of our regression model. Our method is evaluated with extensive experiments on simulated data and demonstrated to be able to identify meaningful interactions in applications in genetics and retail demand forecasting.
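
To illustrate what a factorization mechanism for interaction coefficients looks like, the sketch below uses the classical second-order factorization machine, not the Bayesian multi-way model proposed in this paper. It assumes each predictor i has a latent vector V[i], that the coefficient of the pair (i, j) is the inner product of V[i] and V[j], and it checks the well-known identity that lets the pairwise term be computed without looping over pairs. All function names are hypothetical.

```python
import numpy as np


def fm_pairwise_naive(x, V):
    """Sum over i < j of <V[i], V[j]> * x[i] * x[j], computed pair by pair."""
    d = len(x)
    total = 0.0
    for i in range(d):
        for j in range(i + 1, d):
            total += (V[i] @ V[j]) * x[i] * x[j]
    return total


def fm_pairwise_fast(x, V):
    """Same quantity in O(d * k) time using the square-of-sums identity."""
    sum_then_square = (V.T @ x) ** 2          # shape (k,)
    square_then_sum = (V ** 2).T @ (x ** 2)   # shape (k,)
    return 0.5 * float(np.sum(sum_then_square - square_then_sum))


rng = np.random.default_rng(0)
x = rng.normal(size=6)        # one example with 6 predictors
V = rng.normal(size=(6, 3))   # one latent vector of length 3 per predictor

naive = fm_pairwise_naive(x, V)
fast = fm_pairwise_fast(x, V)
```

Both functions return the same value up to rounding; the fast form is what makes factorized interaction terms cheap to evaluate.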

factorization machine, interaction, multi-way interacting regression

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.98)

Mixed Linear Regression with Multiple Components

Zhong, Kai, Jain, Prateek, Dhillon, Inderjit S.

Neural Information Processing Systems, Feb-14-2020, 10:42:01 GMT

In this paper, we study the mixed linear regression (MLR) problem, where the goal is to recover multiple underlying linear models from their unlabeled linear measurements. We propose a non-convex objective function which we show is *locally strongly convex* in the neighborhood of the ground truth. We use a tensor method for initialization so that the initial models are in the local strong convexity region. We then employ general convex optimization algorithms to minimize the objective function. To the best of our knowledge, our approach provides the first exact recovery guarantees for the MLR problem with $K \geq 2$ components. Moreover, our method has near-optimal computational complexity $\tilde{O}(Nd)$ as well as near-optimal sample complexity $\tilde{O}(d)$ for *constant* $K$.
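
The MLR setting can be made concrete with a small simulation. The sketch below generates noise-free measurements in which each response comes from one of K hidden linear models, then evaluates a product-of-squared-residuals objective. That objective is an assumption chosen for illustration (the abstract does not state the paper's objective); it is non-convex and vanishes at the ground truth regardless of how the components are ordered. Function names are hypothetical.

```python
import numpy as np


def generate_mlr_data(N, d, K, rng):
    """Noise-free mixed linear regression data with hidden component labels."""
    X = rng.normal(size=(N, d))
    W = rng.normal(size=(K, d))          # one weight vector per component
    z = rng.integers(0, K, size=N)       # hidden label of each measurement
    y = np.einsum("ij,ij->i", X, W[z])   # y_i = <x_i, w_{z_i}>
    return X, y, W, z


def product_residual_objective(W, X, y):
    """Mean over samples of the product over components of squared residuals.

    Illustrative assumption, not taken from the abstract.
    """
    residuals = y[:, None] - X @ W.T     # shape (N, K)
    return float(np.mean(np.prod(residuals ** 2, axis=1)))


rng = np.random.default_rng(0)
X, y, W_true, z = generate_mlr_data(N=200, d=5, K=3, rng=rng)

loss_at_truth = product_residual_objective(W_true, X, y)
loss_at_permuted = product_residual_objective(W_true[::-1], X, y)
loss_at_random = product_residual_objective(rng.normal(size=(3, 5)), X, y)
```

The loss is numerically zero at the true models and at any reordering of them, which is why recovery is only defined up to a permutation of the components.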

linear regression, multiple component, subspace, (6 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.64)

Blind Regression: Nonparametric Regression for Latent Variable Models via Collaborative Filtering

Song, Dogyoon, Lee, Christina E., Li, Yihua, Shah, Devavrat

Neural Information Processing Systems, Feb-14-2020, 10:40:52 GMT

We introduce the framework of *blind regression* motivated by *matrix completion* for recommendation systems: given $m$ users, $n$ movies, and a subset of user-movie ratings, the goal is to predict the unobserved user-movie ratings given the data, i.e., to complete the partially observed matrix. Following the framework of non-parametric statistics, we posit that user $u$ and movie $i$ have features $x_1(u)$ and $x_2(i)$ respectively, and their corresponding rating $y(u,i)$ is a noisy measurement of $f(x_1(u), x_2(i))$ for some unknown function $f$. In contrast with classical regression, the features $x = (x_1(u), x_2(i))$ are not observed, making it challenging to apply standard regression methods to predict the unobserved ratings. Inspired by the classical Taylor's expansion for differentiable functions, we provide a prediction algorithm that is consistent for all Lipschitz functions. In fact, the analysis through our framework naturally leads to a variant of collaborative filtering, shedding insight into the widespread success of collaborative filtering in practice.

collaborative filtering, nonparametric regression, regression, (5 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.60)

A Bandit Framework for Strategic Regression

Liu, Yang, Chen, Yiling

Neural Information Processing Systems, Feb-14-2020, 09:56:52 GMT

We consider a learner's problem of acquiring data dynamically for training a regression model, where the training data are collected from strategic data sources. A fundamental challenge is to incentivize data holders to exert effort to improve the quality of their reported data, even though the quality is not directly verifiable by the learner. In this work, we study a dynamic data acquisition process where data holders can contribute multiple times. We propose a Strategic Regression-Upper Confidence Bound (SR-UCB) framework, a UCB-style index combined with a simple payment rule, where the index of a worker approximates the quality of his past contributions and is used by the learner to determine whether the worker receives future work. For linear regression and a certain family of non-linear regression problems, we show that SR-UCB enables an $O(\sqrt{\log T/T})$-Bayesian Nash Equilibrium (BNE) in which each worker exerts a target effort level that the learner has chosen, with $T$ being the number of data acquisition stages.

bandit framework, learner, strategic regression, (4 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)

Distributionally Robust Logistic Regression

Abadeh, Soroosh Shafieezadeh, Esfahani, Peyman Mohajerin, Kuhn, Daniel

Neural Information Processing Systems, Feb-14-2020, 09:40:59 GMT

This paper proposes a distributionally robust approach to logistic regression. We use the Wasserstein distance to construct a ball in the space of probability distributions centered at the uniform distribution on the training samples. If the radius of this Wasserstein ball is chosen judiciously, we can guarantee that it contains the unknown data-generating distribution with high confidence. We then formulate a distributionally robust logistic regression model that minimizes a worst-case expected logloss function, where the worst case is taken over all distributions in the Wasserstein ball. We prove that this optimization problem admits a tractable reformulation and encapsulates the classical as well as the popular regularized logistic regression problems as special cases.

distributionally robust approach, distributionally robust logistic regression, wasserstein ball

Neural Information Processing Systems

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)

High Dimensional Linear Regression using Lattice Basis Reduction

Zadik, Ilias, Gamarnik, David

Neural Information Processing Systems, Feb-14-2020, 08:57:29 GMT

We consider a high dimensional linear regression problem where the goal is to efficiently recover an unknown vector $\beta^*$ from $n$ noisy linear observations $Y = X\beta^* + W \in \mathbb{R}^n$, for known $X \in \mathbb{R}^{n \times p}$ and unknown $W \in \mathbb{R}^n$. Unlike most of the literature on this model, we make no sparsity assumption on $\beta^*$. Instead we adopt a regularization based on assuming that the underlying vectors $\beta^*$ have rational entries with the same denominator $Q$. We call this the $Q$-rationality assumption. We propose a new polynomial-time algorithm for this task which is based on the seminal Lenstra-Lenstra-Lovász (LLL) lattice basis reduction algorithm.

high dimensional linear regression, lattice basis reduction, q-rationality assumption, (2 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.65)

Fast Classification Rates for High-dimensional Gaussian Generative Models

Li, Tianyang, Prasad, Adarsh, Ravikumar, Pradeep K.

Neural Information Processing Systems, Feb-14-2020, 08:25:54 GMT

We consider the problem of binary classification when the covariates conditioned on each of the response values follow multivariate Gaussian distributions. We focus on the setting where the covariance matrices for the two conditional distributions are the same. The corresponding generative model classifier, derived via the Bayes rule, also called Linear Discriminant Analysis, has been shown to behave poorly in high-dimensional settings. We present a novel analysis of the classification error of any linear discriminant approach given conditional Gaussian models. This allows us to compare the generative model classifier, other recently proposed discriminative approaches that directly learn the discriminant function, and then finally logistic regression which is another classical discriminative model classifier.
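
The generative classifier described in this abstract has a simple closed form when the two classes share a covariance matrix. The sketch below is a minimal low-dimensional illustration of that textbook rule (plug-in Linear Discriminant Analysis), not the paper's high-dimensional analysis. The function names and the synthetic data are assumptions made for the example.

```python
import numpy as np


def fit_lda(X, y):
    """Plug-in LDA for two classes with a shared covariance matrix."""
    X0, X1 = X[y == 0], X[y == 1]
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    # Pooled covariance estimate from both centered classes.
    centered = np.vstack([X0 - mu0, X1 - mu1])
    sigma = centered.T @ centered / (len(X) - 2)
    # Bayes rule with Gaussian class conditionals gives a linear boundary.
    w = np.linalg.solve(sigma, mu1 - mu0)
    prior1 = len(X1) / len(X)
    b = -0.5 * w @ (mu0 + mu1) + np.log(prior1 / (1.0 - prior1))
    return w, b


def predict_lda(X, w, b):
    """Predict class 1 when the estimated log-odds are positive."""
    return (X @ w + b > 0).astype(int)


rng = np.random.default_rng(0)
n, d = 500, 4
y = rng.integers(0, 2, size=n)
class_means = np.array([[-2.0] * d, [2.0] * d])
X = class_means[y] + rng.normal(size=(n, d))   # identity covariance

w, b = fit_lda(X, y)
accuracy = float(np.mean(predict_lda(X, w, b) == y))
```

With d much smaller than n, as here, the pooled covariance is well estimated; the poor behavior the abstract refers to arises when d is comparable to or larger than n and that estimate degrades.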

classifier, fast classification rate, high-dimensional gaussian generative model, (4 more...)

Neural Information Processing Systems

Genre: Research Report (0.76)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.65)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.43)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.43)

Linear regression without correspondence

Hsu, Daniel J., Shi, Kevin, Sun, Xiaorui

Neural Information Processing Systems, Feb-14-2020, 08:14:16 GMT

This article considers algorithmic and statistical aspects of linear regression when the correspondence between the covariates and the responses is unknown. First, a fully polynomial-time approximation scheme is given for the natural least squares optimization problem in any constant dimension. Next, in an average-case and noise-free setting where the responses exactly correspond to a linear function of i.i.d. ... Finally, lower bounds on the signal-to-noise ratio are established for approximate recovery of the unknown linear function by any estimator.
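
The natural least squares problem mentioned in this abstract minimizes the squared error jointly over the weight vector and the unknown pairing of responses to covariates. The sketch below solves it by exhaustive search over all permutations, which takes factorial time and is only feasible for a handful of points; it is not the paper's approximation scheme. The function name `shuffled_least_squares` is hypothetical.

```python
import itertools

import numpy as np


def shuffled_least_squares(X, y):
    """Brute-force minimum over permutations and weights of ||X w - y_perm||^2."""
    best_cost, best_w, best_perm = np.inf, None, None
    for perm in itertools.permutations(range(len(y))):
        y_perm = y[list(perm)]
        # For a fixed pairing this is an ordinary least squares problem.
        w, *_ = np.linalg.lstsq(X, y_perm, rcond=None)
        cost = float(np.sum((X @ w - y_perm) ** 2))
        if cost < best_cost:
            best_cost, best_w, best_perm = cost, w, perm
    return best_cost, best_w, best_perm


rng = np.random.default_rng(0)
n, d = 6, 2
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
# Noise-free responses whose order has been scrambled.
y_shuffled = rng.permutation(X @ w_true)

cost, w_hat, perm = shuffled_least_squares(X, y_shuffled)
```

In this noise-free example the search finds a pairing with essentially zero residual and `w_hat` matches `w_true`, but the 720 least squares solves needed for n = 6 grow factorially with n.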

correspondence, linear function, linear regression, (2 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.68)

Sparse Bayesian structure learning with "dependent relevance determination" priors

Wu, Anqi, Park, Mijung, Koyejo, Oluwasanmi O., Pillow, Jonathan W.

Neural Information Processing Systems, Feb-14-2020, 08:12:44 GMT

In many problem settings, parameter vectors are not merely sparse, but dependent in such a way that non-zero coefficients tend to cluster together. We refer to this form of dependency as "region sparsity". Classical sparse regression methods, such as the lasso and automatic relevance determination (ARD), model parameters as independent a priori, and therefore do not exploit such dependencies. Here we introduce a hierarchical model for smooth, region-sparse weight vectors and tensors in a linear regression setting. Our approach represents a hierarchical extension of the relevance determination framework, where we add a transformed Gaussian process to model the dependencies between the prior variances of regression weights.

dependent relevance determination, relevance determination, sparse bayesian structure, (3 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.64)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (0.40)
