AITopics | Regression

Collaborating Authors

Regression

News Overviews Instructional Materials AI-Alerts Classics

Elementary Symmetric Polynomials for Optimal Experimental Design

Neural Information Processing SystemsOct-2-2024, 21:39:26 GMT

We revisit the classical problem of optimal experimental design (OED) under a new mathematical model grounded in a geometric motivation. Specifically, we introduce models based on elementary symmetric polynomials; these polynomials capture "partial volumes" and offer a graded interpolation between the widely used A-optimal design and D-optimal design models, obtaining each of them as special cases. We analyze properties of our models, and derive both greedy and convex-relaxation algorithms for computing the associated designs. Our analysis establishes approximation guarantees on these algorithms, while our empirical results substantiate our claims and demonstrate a curious phenomenon concerning our greedy method. Finally, as a byproduct, we obtain new results on the theory of elementary symmetric polynomials that may be of independent interest.

algorithm, d-optimal design, experiment, (13 more...)

Neural Information Processing Systems

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > United States > California > Los Angeles County > Long Beach (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)

Genre: Research Report > New Finding (0.34)

Industry: Energy (0.31)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Data Science (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.68)
Information Technology > Modeling & Simulation (0.66)

Add feedback

Generalized Linear Model Regression under Distance-to-set Penalties

Jason Xu, Eric Chi, Kenneth Lange

Neural Information Processing SystemsOct-2-2024, 15:57:49 GMT

Estimation in generalized linear models (GLM) is complicated by the presence of constraints. One can handle constraints by maximizing a penalized log-likelihood. Penalties such as the lasso are effective in high dimensions, but often lead to unwanted shrinkage. This paper explores instead penalizing the squared distance to constraint sets. Distance penalties are more flexible than algebraic and regularization penalties, and avoid the drawback of shrinkage. To optimize distance penalized objectives, we make use of the majorization-minimization principle. Resulting algorithms constructed within this framework are amenable to acceleration and come with global convergence guarantees. Applications to shape constraints, sparse regression, and rank-restricted matrix regression on synthetic and real data showcase strong empirical performance, even under non-convex constraints.

algorithm, constraint, regression, (16 more...)

Neural Information Processing Systems

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.14)
North America > United States > North Carolina (0.04)
North America > United States > New York (0.04)
(3 more...)

Genre: Research Report > New Finding (0.69)

Industry: Health & Medicine (0.47)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.70)

Add feedback

Nonuniform random feature models using derivative information

Pieper, Konstantin, Zhang, Zezhong, Zhang, Guannan

arXiv.org Artificial IntelligenceOct-2-2024

We propose nonuniform data-driven parameter distributions for neural network initialization based on derivative data of the function to be approximated. These parameter distributions are developed in the context of non-parametric regression models based on shallow neural networks, and compare favorably to well-established uniform random feature models based on conventional weight initialization. We address the cases of Heaviside and ReLU activation functions, and their smooth approximations (sigmoid and softplus), and use recent results on the harmonic analysis and sparse representation of neural networks resulting from fully trained optimal networks. Extending analytic results that give exact representation, we obtain densities that concentrate in regions of the parameter space corresponding to neurons that are well suited to model the local derivatives of the unknown function. Based on these results, we suggest simplifications of these exact densities based on approximate derivative data in the input points that allow for very efficient sampling and lead to performance of random feature models close to optimal networks in several scenarios.

activation function, neural network, random feature model, (15 more...)

arXiv.org Artificial Intelligence

2410.02132

Country:

North America > United States (1.00)
North America > Puerto Rico > San Juan > San Juan (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.63)

Industry:

Government > Regional Government > North America Government > United States Government (0.93)
Energy (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.66)

Add feedback

Knowledge-Driven Feature Selection and Engineering for Genotype Data with Large Language Models

Lee, Joseph, Yang, Shu, Baik, Jae Young, Liu, Xiaoxi, Tan, Zhen, Li, Dawei, Wen, Zixuan, Hou, Bojian, Duong-Tran, Duy, Chen, Tianlong, Shen, Li

arXiv.org Artificial IntelligenceOct-2-2024

Predicting phenotypes with complex genetic bases based on a small, interpretable set of variant features remains a challenging task. Conventionally, data-driven approaches are utilized for this task, yet the high dimensional nature of genotype data makes the analysis and prediction difficult. Motivated by the extensive knowledge encoded in pre-trained LLMs and their success in processing complex biomedical concepts, we set to examine the ability of LLMs in feature selection and engineering for tabular genotype data, with a novel knowledge-driven framework. We develop FREEFORM, Free-flow Reasoning and Ensembling for Enhanced Feature Output and Robust Modeling, designed with chain-of-thought and ensembling principles, to select and engineer features with the intrinsic knowledge of LLMs. Evaluated on two distinct genotype-phenotype datasets, genetic ancestry and hereditary hearing loss, we find this framework outperforms several data-driven methods, particularly on low-shot regimes. FREEFORM is available as open-source framework at GitHub: https://github.com/PennShenLab/FREEFORM.

arxiv preprint arxiv, llm, variant, (11 more...)

arXiv.org Artificial Intelligence

2410.01795

Country:

North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.04)
North America > United States > North Carolina (0.04)
North America > United States > Maryland > Anne Arundel County > Annapolis (0.04)
(4 more...)

Genre: Research Report > New Finding (1.00)

Industry:

Health & Medicine > Therapeutic Area (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.47)

Add feedback

Causal Inference Tools for a Better Evaluation of Machine Learning

Soumm, Michaël

arXiv.org Artificial IntelligenceOct-2-2024

We present a comprehensive framework for applying rigorous statistical techniques from econometrics to analyze and improve machine learning systems. We introduce key statistical methods such as Ordinary Least Squares (OLS) regression, Analysis of Variance (ANOVA), and logistic regression, explaining their theoretical foundations and practical applications in machine learning evaluation. The document serves as a guide for researchers and practitioners, detailing how these techniques can provide deeper insights into model behavior, performance, and fairness. We cover the mathematical principles behind each method, discuss their assumptions and limitations, and provide step-by-step instructions for their implementation. The paper also addresses how to interpret results, emphasizing the importance of statistical significance and effect size. Through illustrative examples, we demonstrate how these tools can reveal subtle patterns and interactions in machine learning models that are not apparent from traditional evaluation metrics. By connecting the fields of econometrics and machine learning, this work aims to equip readers with powerful analytical tools for more rigorous and comprehensive evaluation of AI systems. The framework presented here contributes to developing more robust, interpretable, and fair machine learning technologies.

assumption, regression, variance, (17 more...)

arXiv.org Artificial Intelligence

2410.01392

Country:

North America > United States (0.04)
Europe > Slovakia > Bratislava > Bratislava (0.04)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.68)

Add feedback

Efficient Statistics With Unknown Truncation, Polynomial Time Algorithms, Beyond Gaussians

Lee, Jane H., Mehrotra, Anay, Zampetakis, Manolis

arXiv.org Machine LearningOct-2-2024

We study the estimation of distributional parameters when samples are shown only if they fall in some unknown set $S \subseteq \mathbb{R}^d$. Kontonis, Tzamos, and Zampetakis (FOCS'19) gave a $d^{\mathrm{poly}(1/\varepsilon)}$ time algorithm for finding $\varepsilon$-accurate parameters for the special case of Gaussian distributions with diagonal covariance matrix. Recently, Diakonikolas, Kane, Pittas, and Zarifis (COLT'24) showed that this exponential dependence on $1/\varepsilon$ is necessary even when $S$ belongs to some well-behaved classes. These works leave the following open problems which we address in this work: Can we estimate the parameters of any Gaussian or even extend beyond Gaussians? Can we design $\mathrm{poly}(d/\varepsilon)$ time algorithms when $S$ is a simple set such as a halfspace? We make progress on both of these questions by providing the following results: 1. Toward the first question, we give a $d^{\mathrm{poly}(\ell/\varepsilon)}$ time algorithm for any exponential family that satisfies some structural assumptions and any unknown set $S$ that is $\varepsilon$-approximable by degree-$\ell$ polynomials. This result has two important applications: 1a) The first algorithm for estimating arbitrary Gaussian distributions from samples truncated to an unknown $S$; and 1b) The first algorithm for linear regression with unknown truncation and Gaussian features. 2. To address the second question, we provide an algorithm with runtime $\mathrm{poly}(d/\varepsilon)$ that works for a set of exponential families (containing all Gaussians) when $S$ is a halfspace or an axis-aligned rectangle. Along the way, we develop tools that may be of independent interest, including, a reduction from PAC learning with positive and unlabeled samples to PAC learning with positive and negative samples that is robust to certain covariate shifts.

algorithm, assumption 3, poly, (13 more...)

arXiv.org Machine Learning

2410.01656

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > United States > New York > New York County > New York City (0.04)
Asia > Middle East > Jordan (0.04)
(6 more...)

Genre: Research Report > New Finding (0.67)

Industry: Health & Medicine (0.45)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.66)

Add feedback

Attention layers provably solve single-location regression

Marion, Pierre, Berthier, Raphaël, Biau, Gérard, Boyer, Claire

arXiv.org Machine LearningOct-2-2024

Attention-based models, such as Transformer, excel across various tasks but lack a comprehensive theoretical understanding, especially regarding token-wise sparsity and internal linear representations. To address this gap, we introduce the single-location regression task, where only one token in a sequence determines the output, and its position is a latent random variable, retrievable via a linear projection of the input. To solve this task, we propose a dedicated predictor, which turns out to be a simplified version of a non-linear self-attention layer. We study its theoretical properties, by showing its asymptotic Bayes optimality and analyzing its training dynamics. In particular, despite the non-convex nature of the problem, the predictor effectively learns the underlying structure. This work highlights the capacity of attention mechanisms to handle sparse token information and internal linear structures.

erf, predictor, sequence, (13 more...)

arXiv.org Machine Learning

2410.01537

Country:

North America > United States > New York (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > France > Île-de-France > Paris > Paris (0.04)
(2 more...)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.92)
Information Technology > Data Science (0.67)

Add feedback

Transformers Handle Endogeneity in In-Context Linear Regression

Liang, Haodong, Balasubramanian, Krishnakumar, Lai, Lifeng

arXiv.org Machine LearningOct-2-2024

We explore the capability of transformers to address endogeneity in in-context linear regression. Our main finding is that transformers inherently possess a mechanism to handle endogeneity effectively using instrumental variables (IV). First, we demonstrate that the transformer architecture can emulate a gradient-based bi-level optimization procedure that converges to the widely used two-stage least squares $(\textsf{2SLS})$ solution at an exponential rate. Next, we propose an in-context pretraining scheme and provide theoretical guarantees showing that the global minimizer of the pre-training loss achieves a small excess loss. Our extensive experiments validate these theoretical findings, showing that the trained transformer provides more robust and reliable in-context predictions and coefficient estimates than the $\textsf{2SLS}$ method, in the presence of endogeneity.

estimator, transformer, transformer model, (13 more...)

arXiv.org Machine Learning

2410.01265

Genre: Research Report > New Finding (0.66)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.84)

Add feedback

Revisiting Optimism and Model Complexity in the Wake of Overparameterized Machine Learning

Patil, Pratik, Du, Jin-Hong, Tibshirani, Ryan J.

arXiv.org Machine LearningOct-2-2024

Common practice in modern machine learning involves fitting a large number of parameters relative to the number of observations. These overparameterized models can exhibit surprising generalization behavior, e.g., ``double descent'' in the prediction error curve when plotted against the raw number of model parameters, or another simplistic notion of complexity. In this paper, we revisit model complexity from first principles, by first reinterpreting and then extending the classical statistical concept of (effective) degrees of freedom. Whereas the classical definition is connected to fixed-X prediction error (in which prediction error is defined by averaging over the same, nonrandom covariate points as those used during training), our extension of degrees of freedom is connected to random-X prediction error (in which prediction error is averaged over a new, random sample from the covariate distribution). The random-X setting more naturally embodies modern machine learning problems, where highly complex models, even those complex enough to interpolate the training data, can still lead to desirable generalization performance under appropriate conditions. We demonstrate the utility of our proposed complexity measures through a mix of conceptual arguments, theory, and experiments, and illustrate how they can be used to interpret and compare arbitrary prediction models.

freedom, random-x degree, regression, (17 more...)

arXiv.org Machine Learning

2410.01259

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
North America > United States > Illinois > Cook County > Chicago (0.04)
North America > United States > California > Alameda County > Berkeley (0.04)

Genre:

Research Report (0.50)
Overview (0.45)

Industry: Education (0.34)

Technology:

Information Technology > Modeling & Simulation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.67)

Add feedback

Modeling and Prediction of the UEFA EURO 2024 via Combined Statistical Learning Approaches

Groll, Andreas, Hvattum, Lars M., Ley, Christophe, Sternemann, Jonas, Schauberger, Gunther, Zeileis, Achim

arXiv.org Artificial IntelligenceOct-1-2024

In this work, three fundamentally different machine learning models are combined to create a new, joint model for forecasting the UEFA EURO 2024. Therefore, a generalized linear model, a random forest model, and a extreme gradient boosting model are used to predict the number of goals a team scores in a match. The three models are trained on the match results of the UEFA EUROs 2004-2020, with additional covariates characterizing the teams for each tournament as well as three enhanced variables derived from different ranking methods for football teams. The first enhanced variable is based on historic match data from national teams, the second is based on the bookmakers' tournament winning odds of all participating teams, and the third is based on historic match data of individual players both for club and international matches, resulting in player ratings. Then, based on current covariate information of the participating teams, the final trained model is used to predict the UEFA EURO 2024. For this purpose, the tournament is simulated 100.000 times, based on the estimated expected number of goals for all possible matches, from which probabilities across the different tournament stages are derived. Our combined model identifies France as the clear favourite with a winning probability of 19.2%, followed by England (16.7%) and host Germany (13.7%).

artificial intelligence, machine learning, prediction, (17 more...)

arXiv.org Artificial Intelligence

2410.09068

Country:

Europe > France (0.25)
Europe > United Kingdom > England (0.25)
North America > United States > California > San Francisco County > San Francisco (0.14)
(24 more...)

Genre: Research Report (0.64)

Industry: Leisure & Entertainment > Sports > Soccer (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.89)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.46)

Add feedback