Goto

Collaborating Authors

 Regression


Information bottleneck theory of high-dimensional regression: relevancy, efficiency and optimality

arXiv.org Artificial Intelligence

Avoiding overfitting is a central challenge in machine learning, yet many large neural networks readily achieve zero training loss. This puzzling contradiction necessitates new approaches to the study of overfitting. Here we quantify overfitting via residual information, defined as the bits in fitted models that encode noise in training data. Information efficient learning algorithms minimize residual information while maximizing the relevant bits, which are predictive of the unknown generative models. We solve this optimization to obtain the information content of optimal algorithms for a linear regression problem and compare it to that of randomized ridge regression. Our results demonstrate the fundamental trade-off between residual and relevant information and characterize the relative information efficiency of randomized regression with respect to optimal algorithms. Finally, using results from random matrix theory, we reveal the information complexity of learning a linear map in high dimensions and unveil information-theoretic analogs of double and multiple descent phenomena.


FasterRisk: Fast and Accurate Interpretable Risk Scores

arXiv.org Artificial Intelligence

Over the last century, risk scores have been the most popular form of predictive model used in healthcare and criminal justice. Risk scores are sparse linear models with integer coefficients; often these models can be memorized or placed on an index card. Typically, risk scores have been created either without data or by rounding logistic regression coefficients, but these methods do not reliably produce high-quality risk scores. Recent work used mathematical programming, which is computationally slow. We introduce an approach for efficiently producing a collection of high-quality risk scores learned from data. Specifically, our approach produces a pool of almost-optimal sparse continuous solutions, each with a different support set, using a beam-search algorithm. Each of these continuous solutions is transformed into a separate risk score through a "star ray" search, where a range of multipliers are considered before rounding the coefficients sequentially to maintain low logistic loss. Our algorithm returns all of these high-quality risk scores for the user to consider. This method completes within minutes and can be valuable in a broad variety of applications.


[100%OFF] Logistic Regression In Python

#artificialintelligence

You're looking for a complete Classification modeling course that teaches you everything you need to create a Classification model in Python, right? You've found the right Classification modeling course! How this course will help you? A Verifiable Certificate of Completion is presented to all students who undertake this Machine learning basics course. Why should you choose this course?


[100%OFF] Complete Linear Regression Analysis In Python

#artificialintelligence

You're looking for a complete Linear Regression course that teaches you everything you need to create a Linear Regression model in Python, right? You've found the right Linear Regression course! A Verifiable Certificate of Completion is presented to all students who undertake this Machine learning basics course. How this course will help you? Why should you choose this course?


Bayesian adaptive and interpretable functional regression for exposure profiles

arXiv.org Machine Learning

Pollutant exposure during gestation is a known and adverse factor for birth and health outcomes. However, the links between prenatal air pollution exposures and educational outcomes are less clear, in particular the critical windows of susceptibility during pregnancy. Using a large cohort of students in North Carolina, we study the link between prenatal daily $\mbox{PM}_{2.5}$ exposure and 4th end-of-grade reading scores. We develop and apply a locally adaptive and highly scalable Bayesian regression model for scalar responses with functional and scalar predictors. The proposed model pairs a B-spline basis expansion with dynamic shrinkage priors to capture both smooth and rapidly-changing features in the regression surface. The model is accompanied by a new decision analysis approach for functional regression that extracts the critical windows of susceptibility and guides the model interpretations. These tools help to identify and address broad limitations with the interpretability of functional regression models. Simulation studies demonstrate more accurate point estimation, more precise uncertainty quantification, and far superior window selection than existing approaches. Leveraging the proposed modeling, computational, and decision analysis framework, we conclude that prenatal $\mbox{PM}_{2.5}$ exposure during early and late pregnancy is most adverse for 4th end-of-grade reading scores.


Bayesian Sparse Regression for Mixed Multi-Responses with Application to Runtime Metrics Prediction in Fog Manufacturing

arXiv.org Machine Learning

Fog manufacturing can greatly enhance traditional manufacturing systems through distributed Fog computation units, which are governed by predictive computational workload offloading methods under different Industrial Internet architectures. It is known that the predictive offloading methods highly depend on accurate prediction and uncertainty quantification of runtime performance metrics, containing multivariate mixed-type responses (i.e., continuous, counting, binary). In this work, we propose a Bayesian sparse regression for multivariate mixed responses to enhance the prediction of runtime performance metrics and to enable the statistical inferences. The proposed method considers both group and individual variable selection to jointly model the mixed types of runtime performance metrics. The conditional dependency among multiple responses is described by a graphical model using the precision matrix, where a spike-and-slab prior is used to enable the sparse estimation of the graph. The proposed method not only achieves accurate prediction, but also makes the predictive model more interpretable with statistical inferences on model parameters and prediction in the Fog manufacturing. A simulation study and a real case example in a Fog manufacturing are conducted to demonstrate the merits of the proposed model.


Scalable Gaussian-process regression and variable selection using Vecchia approximations

arXiv.org Machine Learning

Gaussian process (GP) regression is a flexible, nonparametric approach to regression that naturally quantifies uncertainty. In many applications, the number of responses and covariates are both large, and a goal is to select covariates that are related to the response. For this setting, we propose a novel, scalable algorithm, coined VGPR, which optimizes a penalized GP log-likelihood based on the Vecchia GP approximation, an ordered conditional approximation from spatial statistics that implies a sparse Cholesky factor of the precision matrix. We traverse the regularization path from strong to weak penalization, sequentially adding candidate covariates based on the gradient of the log-likelihood and deselecting irrelevant covariates via a new quadratic constrained coordinate descent algorithm. We propose Vecchia-based mini-batch subsampling, which provides unbiased gradient estimators. The resulting procedure is scalable to millions of responses and thousands of covariates. Theoretical analysis and numerical studies demonstrate the improved scalability and accuracy relative to existing methods.


Second-order regression models exhibit progressive sharpening to the edge of stability

arXiv.org Artificial Intelligence

Recent studies of gradient descent with large step sizes have shown that there is often a regime with an initial increase in the largest eigenvalue of the loss Hessian (progressive sharpening), followed by a stabilization of the eigenvalue near the maximum value which allows convergence (edge of stability). These phenomena are intrinsically non-linear and do not happen for models in the constant Neural Tangent Kernel (NTK) regime, for which the predictive function is approximately linear in the parameters. As such, we consider the next simplest class of predictive models, namely those that are quadratic in the parameters, which we call second-order regression models. For quadratic objectives in two dimensions, we prove that this second-order regression model exhibits progressive sharpening of the NTK eigenvalue towards a value that differs slightly from the edge of stability, which we explicitly compute. In higher dimensions, the model generically shows similar behavior, even without the specific structure of a neural network, suggesting that progressive sharpening and edge-of-stability behavior aren't unique features of neural networks, and could be a more general property of discrete learning algorithms in high-dimensional non-linear models.


Nonlinear Sufficient Dimension Reduction with a Stochastic Neural Network

arXiv.org Artificial Intelligence

Sufficient dimension reduction is a powerful tool to extract core information hidden in the high-dimensional data and has potentially many important applications in machine learning tasks. However, the existing nonlinear sufficient dimension reduction methods often lack the scalability necessary for dealing with large-scale data. We propose a new type of stochastic neural network under a rigorous probabilistic framework and show that it can be used for sufficient dimension reduction for large-scale data. The proposed stochastic neural network is trained using an adaptive stochastic gradient Markov chain Monte Carlo algorithm, whose convergence is rigorously studied in the paper as well. Through extensive experiments on real-world classification and regression problems, we show that the proposed method compares favorably with the existing state-of-the-art sufficient dimension reduction methods and is computationally more efficient for large-scale data.


Coresets for Relational Data and The Applications

arXiv.org Artificial Intelligence

A coreset is a small set that can approximately preserve the structure of the original input data set. Therefore we can run our algorithm on a coreset so as to reduce the total computational complexity. Conventional coreset techniques assume that the input data set is available to process explicitly. However, this assumption may not hold in real-world scenarios. In this paper, we consider the problem of coresets construction over relational data. Namely, the data is decoupled into several relational tables, and it could be very expensive to directly materialize the data matrix by joining the tables. We propose a novel approach called ``aggregation tree with pseudo-cube'' that can build a coreset from bottom to up. Moreover, our approach can neatly circumvent several troublesome issues of relational learning problems [Khamis et al., PODS 2019]. Under some mild assumptions, we show that our coreset approach can be applied for the machine learning tasks, such as clustering, logistic regression and SVM.