
 Regression


Leveraged volume sampling for linear regression

Neural Information Processing Systems

Suppose an n × d design matrix in a linear regression problem is given, but the response for each point is hidden unless explicitly requested. The goal is to sample only a small number k ≪ n of the responses, and then produce a weight vector whose sum of squares loss over all points is at most 1 + ε times the minimum. When k is very small (e.g., k = d), jointly sampling diverse subsets of points is crucial. One such method, called "volume sampling", has the unique and desirable property that the weight vector it produces is an unbiased estimate of the optimum. It is therefore natural to ask if this method offers the optimal unbiased estimate in terms of the number of responses k needed to achieve a 1 + ε loss approximation.
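The unbiasedness property mentioned above can be checked numerically on a tiny problem. The sketch below is not the paper's leveraged variant: it assumes standard volume sampling of subsets of size d, where a subset S is drawn with probability proportional to det(X_S)^2, and it enumerates every subset instead of sampling. The function names and the random data are illustrative assumptions.

```python
import itertools
import numpy as np

def volume_sampling_distribution(X):
    """Enumerate all size-d row subsets S of X with P(S) proportional to det(X_S)^2."""
    n, d = X.shape
    subsets = list(itertools.combinations(range(n), d))
    weights = np.array([np.linalg.det(X[list(S)]) ** 2 for S in subsets])
    return subsets, weights / weights.sum()

def expected_volume_sampled_solution(X, y):
    """Average the subset solutions X_S^{-1} y_S under the volume sampling distribution."""
    subsets, probs = volume_sampling_distribution(X)
    w_mean = np.zeros(X.shape[1])
    for S, p in zip(subsets, probs):
        if p > 0:  # singular subsets have probability zero and are skipped
            idx = list(S)
            w_mean += p * np.linalg.solve(X[idx], y[idx])
    return w_mean

# Hypothetical toy data: 7 points in 2 dimensions.
rng = np.random.default_rng(0)
X = rng.normal(size=(7, 2))
y = rng.normal(size=7)

w_opt = np.linalg.lstsq(X, y, rcond=None)[0]   # least squares optimum on all points
w_avg = expected_volume_sampled_solution(X, y)  # expectation over size-d subsets
```

Here `w_avg` should agree with `w_opt` up to floating-point error, even though each individual subset solution uses only d of the n responses.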


Scalable Hyperparameter Transfer Learning

Neural Information Processing Systems

Bayesian optimization (BO) is a model-based approach for gradient-free black-box function optimization, such as hyperparameter optimization. Typically, BO relies on conventional Gaussian process (GP) regression, whose algorithmic complexity is cubic in the number of evaluations. As a result, GP-based BO cannot leverage large numbers of past function evaluations, for example, to warm-start related BO runs. We propose a multi-task adaptive Bayesian linear regression model for transfer learning in BO, whose complexity is linear in the number of function evaluations: one Bayesian linear regression model is associated with each black-box function optimization problem (or task), while transfer learning is achieved by coupling the models through a shared deep neural net. Experiments show that the neural net learns a representation suitable for warm-starting the black-box optimization problems and that BO runs can be accelerated when the target black-box function (e.g., validation loss) is learned together with other related signals (e.g., training loss).
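To illustrate why Bayesian linear regression scales linearly in the number of evaluations, here is a minimal single-task sketch. It is not the authors' model: the shared deep neural net is replaced by a fixed random tanh feature map, and the prior precision `alpha`, noise precision `beta`, and toy objective are all assumed values chosen for illustration.

```python
import numpy as np

def blr_posterior(Phi, y, alpha=1.0, beta=25.0):
    """Posterior over weights for y = Phi w + noise, with prior w ~ N(0, I / alpha).

    Returns the posterior mean m and the posterior precision matrix A.
    Building A costs O(n d^2), i.e. linear in the number of evaluations n.
    """
    d = Phi.shape[1]
    A = alpha * np.eye(d) + beta * Phi.T @ Phi
    m = beta * np.linalg.solve(A, Phi.T @ y)  # O(d^3), independent of n
    return m, A

def blr_predict(phi_new, m, A, beta=25.0):
    """Predictive mean and variance at new feature rows phi_new."""
    mean = phi_new @ m
    var = 1.0 / beta + np.einsum("ij,ij->i", phi_new, np.linalg.solve(A, phi_new.T).T)
    return mean, var

# Stand-in for the shared network: a fixed random feature map (assumption).
rng = np.random.default_rng(0)
W = rng.normal(size=(1, 20))
b = rng.normal(size=20)

def features(x):
    return np.tanh(x @ W + b)

# Hypothetical past evaluations of a 1-D black-box function.
x_train = rng.uniform(-2.0, 2.0, size=(200, 1))
y_train = np.sin(2.0 * x_train[:, 0]) + 0.2 * rng.normal(size=200)

m, A = blr_posterior(features(x_train), y_train)
mean, var = blr_predict(features(np.array([[0.5]])), m, A)
```

The predictive mean and variance are the quantities an acquisition function would consume; the only n-dependent cost is forming `Phi.T @ Phi`, in contrast to the cubic cost of exact GP regression.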


Reviews: Optimization over Continuous and Multi-dimensional Decisions with Observational Data

Neural Information Processing Systems

The paper proposes an algorithm that can learn good decision-making policies over a continuous set of decisions using only access to observational data. The problem is well-motivated, but the paper can be substantially strengthened. Quality: OK. The paper clearly motivates a weakness of direct regression (e.g., predicting expected cost from context and decision). The regression models may have different uncertainty for different decisions, and so it is useful to include an empirical estimate of the variance and bias of the regression when selecting decisions. The paper would be more informative if it highlighted several choices of regression models, each with different V (variance) and B (bias), and showed how lambda_1 and lambda_2 are tuned to pick low-cost decisions with high probability.


Reviews: Linear regression without correspondence

Neural Information Processing Systems

The article "Linear regression without correspondence" considers the problem of estimation in a linear regression model in the specific situation where the correspondence between the covariates and the responses is unknown. The authors propose fully polynomial algorithms for solving the least squares problem and also study statistical lower bounds. The main emphasis of the article is on the construction of fully polynomial algorithms for the least squares problem in the noisy and noiseless cases, whereas previously only algorithms with exponential complexity were known for dimension d > 1. For the noisy case, the authors propose an algorithm that solves the least squares problem to any prespecified accuracy. For the noiseless case, another algorithm is proposed, which gives the exact solution of the least squares problem.
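To make the least squares problem concrete, the sketch below solves a tiny noiseless instance by brute force over all permutations. This is the exponential-time baseline, not the polynomial algorithms the review describes; the function name and toy data are illustrative assumptions.

```python
import itertools
import numpy as np

def brute_force_unlabeled_lstsq(X, y):
    """Minimize ||X w - P y||^2 jointly over w and all permutations P of y.

    Runs in O(n!) time, so it is only usable for very small n.
    """
    n = X.shape[0]
    best_loss, best_w, best_perm = np.inf, None, None
    for perm in itertools.permutations(range(n)):
        y_perm = y[list(perm)]
        w, *_ = np.linalg.lstsq(X, y_perm, rcond=None)
        loss = np.sum((X @ w - y_perm) ** 2)
        if loss < best_loss:
            best_loss, best_w, best_perm = loss, w, perm
    return best_loss, best_w, best_perm

# Hypothetical noiseless instance: responses are observed in shuffled order.
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 2))
w_true = np.array([1.5, -0.5])
shuffle = rng.permutation(6)
y_shuffled = (X @ w_true)[shuffle]

loss, w_hat, perm = brute_force_unlabeled_lstsq(X, y_shuffled)
```

With 6 points the search covers 720 permutations; the factorial growth is what makes polynomial-time alternatives for d > 1 significant.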


Reviews: Regularized Modal Regression with Applications in Cognitive Impairment Prediction

Neural Information Processing Systems

The authors present a regularized modal regression (RMR) method. The proposed method is studied from a statistical learning viewpoint, and the resulting model is applied to Alzheimer's disease (AD) studies. There are several presentation and evaluation issues with this work in its current form. Firstly, the authors motivate the paper using Alzheimer's disease studies and argue that modal regression is the way to analyze correlations between several markers of the disease. However, the need to use the conditional mode for regression is not specific to the Alzheimer's application. The motivation for RMR makes sense without any AD-related context.


Reviews: Using Large Ensembles of Control Variates for Variational Inference

Neural Information Processing Systems

Thank you for the thoughtful response. I have read the other reviews and the rebuttal, and after discussing the work I am electing to keep my score the same. I am somewhat unsatisfied by the author response; for papers where gradient estimator efficiency (in terms of variance) is in service of the optimization problem, comparing ELBO traces by iteration can be very misleading. If the machinery you introduce to efficiently use an ensemble of control variates is not very costly, then that should be measured or shown in your experiments. My comments below weren't about optimal tuning; they were more about exploring and understanding the sensitivity of the method to the parameters it introduces.


Reviews: Efficient inference for time-varying behavior during learning

Neural Information Processing Systems

The paper presents a new estimator for a dynamic logistic regression model used to characterize human or animal behavior in the context of binary decision tasks. Despite the very large number of parameters, such a model can be estimated robustly and with tolerable computational cost by exploiting the structure of the posterior covariance. The estimator is then applied to a delayed auditory discrimination task in humans and animals. The results show signatures of learning and history interference in rodents compared to humans. Additionally, the model fit manages to predict behavior in test data.


Reviews: Sample Efficient Stochastic Gradient Iterative Hard Thresholding Method for Stochastic Sparse Linear Regression with Limited Attribute Observation

Neural Information Processing Systems

This looks like an interesting result. The authors suggest a couple of methods, inspired by a "mini-batch strategy", for solving linear regression when the signal of interest is sparse and the data can only be partially observed. The problem and the proposed methods are clearly described in the paper. Theoretical guarantees for the convergence of the algorithms are provided and are supported by practical evidence given in the paper. As the authors suggest, these methods appear to outperform other existing methods in terms of "sample complexity".
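For readers unfamiliar with the hard thresholding step named in the paper's title, here is a minimal sketch of plain iterative hard thresholding. It uses full-batch gradients and fully observed attributes, so it is a simplified relative of the reviewed methods rather than an implementation of them; the step size choice, problem sizes, and data are assumptions.

```python
import numpy as np

def hard_threshold(w, s):
    """Keep the s largest-magnitude entries of w and zero out the rest."""
    out = np.zeros_like(w)
    keep = np.argsort(np.abs(w))[-s:]
    out[keep] = w[keep]
    return out

def iterative_hard_thresholding(X, y, s, n_iters=300):
    """Projected gradient descent on (1 / 2n) ||X w - y||^2 over s-sparse vectors."""
    n, d = X.shape
    step = n / np.linalg.norm(X, 2) ** 2  # 1 / L, with L the largest eigenvalue of X^T X / n
    w = np.zeros(d)
    for _ in range(n_iters):
        grad = X.T @ (X @ w - y) / n
        w = hard_threshold(w - step * grad, s)
    return w

# Hypothetical noiseless sparse regression instance.
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 20))
w_true = np.zeros(20)
w_true[[2, 7, 11]] = [3.0, -2.0, 1.5]
y = X @ w_true

w_hat = iterative_hard_thresholding(X, y, s=3)
```

A stochastic, limited-observation variant would replace the full gradient with an estimate built from a mini-batch of partially observed rows, which is where the sample complexity question discussed in the review arises.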


FAIREDU: A Multiple Regression-Based Method for Enhancing Fairness in Machine Learning Models for Educational Applications

arXiv.org Artificial Intelligence

Fairness in artificial intelligence and machine learning (AI/ML) models is becoming critically important, especially as decisions made by these systems impact diverse groups. In education, a vital sector for all countries, the widespread application of AI/ML systems raises specific concerns regarding fairness. Current research predominantly focuses on fairness for individual sensitive features, which limits the comprehensiveness of fairness assessments. This paper introduces FAIREDU, a novel and effective method designed to improve fairness across multiple sensitive features. Through extensive experiments, we evaluate FAIREDU's effectiveness in enhancing fairness without compromising model performance. The results demonstrate that FAIREDU addresses intersectionality across features such as gender, race, age, and other sensitive features, outperforming state-of-the-art methods with minimal effect on model accuracy. The paper also explores potential future research directions to further enhance the method's robustness and applicability to various machine-learning models and datasets.


A convex formulation of covariate-adjusted Gaussian graphical models via natural parametrization

arXiv.org Machine Learning

Gaussian graphical models (GGMs) are widely used for recovering the conditional independence structure among random variables. Recently, several key advances have been made to exploit an additional set of variables for better estimating the GGMs of the variables of interest. For example, in co-expression quantitative trait locus (eQTL) studies, both the mean expression level of genes and their pairwise conditional independence structure may be adjusted by genetic variants local to those genes. Existing methods to estimate covariate-adjusted GGMs either allow only the mean to depend on covariates or suffer from poor scaling assumptions due to the inherent non-convexity of simultaneously estimating the mean and precision matrix. In this paper, we propose a convex formulation that jointly estimates the covariate-adjusted mean and precision matrix by utilizing the natural parametrization of the multivariate Gaussian likelihood. This convexity yields theoretically better performance as the sparsity and dimension of the covariates grow large relative to the number of samples. We verify our theoretical results with numerical simulations and perform a reanalysis of an eQTL study of glioblastoma multiforme (GBM), an aggressive form of brain cancer.