AITopics

2409.10572

Country:

Asia > China > Beijing > Beijing (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > United States > Washington > King County > Redmond (0.04)

Genre: Research Report (1.00)

Industry: Materials (1.00)

Technology:

Information Technology > Modeling & Simulation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Architecture > Real Time Systems (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.82)

arXiv.org Artificial IntelligenceSep-15-2024

The Optimality of (Accelerated) SGD for High-Dimensional Quadratic Optimization

Zhang, Haihan, Liu, Yuanshi, Chen, Qianwen, Fang, Cong

Stochastic gradient descent (SGD) is a widely used algorithm in machine learning, particularly for neural network training. Recent studies on SGD for canonical quadratic optimization or linear regression show it attains well generalization under suitable high-dimensional settings. However, a fundamental question -- for what kinds of high-dimensional learning problems SGD and its accelerated variants can achieve optimality has yet to be well studied. This paper investigates SGD with two essential components in practice: exponentially decaying step size schedule and momentum. We establish the convergence upper bound for momentum accelerated SGD (ASGD) and propose concrete classes of learning problems under which SGD or ASGD achieves min-max optimal convergence rates. The characterization of the target function is based on standard power-law decays in (functional) linear regression. Our results unveil new insights for understanding the learning bias of SGD: (i) SGD is efficient in learning ``dense'' features where the corresponding weights are subject to an infinity norm constraint; (ii) SGD is efficient for easy problem without suffering from the saturation effect; (iii) momentum can accelerate the convergence rate by order when the learning problem is relatively hard. To our knowledge, this is the first work to clearly identify the optimal boundary of SGD versus ASGD for the problem under mild settings.

algorithm, nulla, sgd, (15 more...)

2409.09745

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Russia (0.04)
Asia > Russia (0.04)

Genre: Research Report > New Finding (0.48)

Industry: Education > Focused Education > Special Education (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.86)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.55)

Vafa, Keyon, Athey, Susan, Blei, David M.

Estimating Wage Disparities Using Foundation Models

arXiv.org Machine LearningSep-15-2024

One thread of empirical work in social science focuses on decomposing group differences in outcomes into unexplained components and components explained by observable factors. In this paper, we study gender wage decompositions, which require estimating the portion of the gender wage gap explained by career histories of workers. Classical methods for decomposing the wage gap employ simple predictive models of wages which condition on a small set of simple summaries of labor history. The problem is that these predictive models cannot take advantage of the full complexity of a worker's history, and the resulting decompositions thus suffer from omitted variable bias (OVB), where covariates that are correlated with both gender and wages are not included in the model. Here we explore an alternative methodology for wage gap decomposition that employs powerful foundation models, such as large language models, as the predictive engine. Foundation models excel at making accurate predictions from complex, high-dimensional inputs. We use a custom-built foundation model, designed to predict wages from full labor histories, to decompose the gender wage gap. We prove that the way such models are usually trained might still lead to OVB, but develop fine-tuning algorithms that empirically mitigate this issue. Our model captures a richer representation of career history than simple models and predicts wages more accurately. In detail, we first provide a novel set of conditions under which an estimator of the wage gap based on a fine-tuned foundation model is $\sqrt{n}$-consistent. Building on the theory, we then propose methods for fine-tuning foundation models that minimize OVB. Using data from the Panel Study of Income Dynamics, we find that history explains more of the gender wage gap than standard econometric models can measure, and we identify elements of history that are important for reducing OVB.

history, representation, wage gap, (15 more...)

2409.09894

Country:

North America > United States > Michigan > Washtenaw County > Ann Arbor (0.14)
North America > Mexico > Oaxaca (0.04)
North America > United States > Wisconsin (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report > New Finding (0.67)

Industry:

Education (0.67)
Health & Medicine (0.46)

Technology:

Information Technology > Modeling & Simulation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.46)

Szentpéteri, Szabolcs, Csáji, Balázs Csanád

Finite Sample Analysis of Distribution-Free Confidence Ellipsoids for Linear Regression

arXiv.org Machine LearningSep-13-2024

The least squares (LS) estimate is the archetypical solution of linear regression problems. The asymptotic Gaussianity of the scaled LS error is often used to construct approximate confidence ellipsoids around the LS estimate, however, for finite samples these ellipsoids do not come with strict guarantees, unless some strong assumptions are made on the noise distributions. The paper studies the distribution-free Sign-Perturbed Sums (SPS) ellipsoidal outer approximation (EOA) algorithm which can construct non-asymptotically guaranteed confidence ellipsoids under mild assumptions, such as independent and symmetric noise terms. These ellipsoids have the same center and orientation as the classical asymptotic ellipsoids, only their radii are different, which radii can be computed by convex optimization. Here, we establish high probability non-asymptotic upper bounds for the sizes of SPS outer ellipsoids for linear regression problems and show that the volumes of these ellipsoids decrease at the optimal rate. Finally, the difference between our theoretical bounds and the empirical sizes of the regions are investigated experimentally.

assumption, confidence region, nr 1, (15 more...)

2409.08801

Country:

Europe > Hungary > Budapest > Budapest (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > United States > Hawaii (0.04)
(3 more...)

Genre:

Workflow (0.67)
Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.91)

Ramos-Pollan, Raul, Kalaitzis, Freddie, Selvam, Karthick Panner

Uncertainty and Generalizability in Foundation Models for Earth Observation

arXiv.org Artificial IntelligenceSep-13-2024

We take the perspective in which we want to design a downstream task (such as estimating vegetation coverage) on a certain area of interest (AOI) with a limited labeling budget. By leveraging an existing Foundation Model (FM) we must decide whether we train a downstream model on a different but label-rich AOI hoping it generalizes to our AOI, or we split labels in our AOI for training and validating. In either case, we face choices concerning what FM to use, how to sample our AOI for labeling, etc. which affect both the performance and uncertainty of the results. In this work, we perform a large ablative study using eight existing FMs on either Sentinel 1 or Sentinel 2 as input data, and the classes from the ESA World Cover product as downstream tasks across eleven AOIs. We do repeated sampling and training, resulting in an ablation of some 500K simple linear regression models. Our results show both the limits of spatial generalizability across AOIs and the power of FMs where we are able to get over 0.9 correlation coefficient between predictions and targets on different chip level predictive tasks. And still, performance and uncertainty vary greatly across AOIs, tasks and FMs. We believe this is a key issue in practice, because there are many design decisions behind each FM and downstream task (input modalities, sampling, architectures, pretraining, etc.) and usually a downstream task designer is aware of and can decide upon a few of them. Through this work, we advocate for the usage of the methodology herein described (large ablations on reference global labels and simple probes), both when publishing new FMs, and to make informed decisions when designing downstream tasks to use them.

aois, downstream task, target aois, (11 more...)

2409.08744

Country:

Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
South America > Colombia (0.07)
South America > Peru (0.06)
(9 more...)

Genre: Research Report > New Finding (0.54)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.56)

Lyu, Chenyi, Liu, Xingchi, Mihaylova, Lyudmila

Review of Recent Advances in Gaussian Process Regression Methods

arXiv.org Machine LearningSep-12-2024

Gaussian process (GP) methods have been widely studied recently, especially for large-scale systems with big data and even more extreme cases when data is sparse. Key advantages of these methods consist in: 1) the ability to provide inherent ways to assess the impact of uncertainties (especially in the data, and environment) on the solutions, 2) have efficient factorisation based implementations and 3) can be implemented easily in distributed manners and hence provide scalable solutions. This paper reviews the recently developed key factorised GP methods such as the hierarchical off-diagonal low-rank approximation methods and GP with Kronecker structures. An example illustrates the performance of these methods with respect to accuracy and computational complexity.

approximation, covariance matrix, matrix, (15 more...)

2409.08112

Country:

North America > United States > New York (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Europe > United Kingdom > England > South Yorkshire > Sheffield (0.04)

Genre:

Research Report (1.00)
Overview (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.51)

arXiv.org Artificial IntelligenceSep-12-2024

Privacy-preserving federated prediction of pain intensity change based on multi-center survey data

Das, Supratim, Rafie, Mahdie, Kammer, Paula, Skou, Søren T., Grønne, Dorte T., Roos, Ewa M., Hajek, André, König, Hans-Helmut, Ullaha, Md Shihab, Probul, Niklas, Baumbacha, Jan, Baumbach, Linda

Background: Patient-reported survey data are used to train prognostic models aimed at improving healthcare. However, such data are typically available multi-centric and, for privacy reasons, cannot easily be centralized in one data repository. Models trained locally are less accurate, robust, and generalizable. We present and apply privacy-preserving federated machine learning techniques for prognostic model building, where local survey data never leaves the legally safe harbors of the medical centers. Methods: We used centralized, local, and federated learning techniques on two healthcare datasets (GLA:D data from the five health regions of Denmark and international SHARE data of 27 countries) to predict two different health outcomes. We compared linear regression, random forest regression, and random forest classification models trained on local data with those trained on the entire data in a centralized and in a federated fashion. Results: In GLA:D data, federated linear regression (R2 0.34, RMSE 18.2) and federated random forest regression (R2 0.34, RMSE 18.3) models outperform their local counterparts (i.e., R2 0.32, RMSE 18.6, R2 0.30, RMSE 18.8) with statistical significance. We also found that centralized models (R2 0.34, RMSE 18.2, R2 0.32, RMSE 18.5, respectively) did not perform significantly better than the federated models. In SHARE, the federated model (AC 0.78, AUROC: 0.71) and centralized model (AC 0.84, AUROC: 0.66) perform significantly better than the local models (AC: 0.74, AUROC: 0.69). Conclusion: Federated learning enables the training of prognostic models from multi-center surveys without compromising privacy and with only minimal or no compromise regarding model performance.

federated learning, federated model, local model, (14 more...)

2409.07997

Country:

Oceania > Australia (0.28)
North America > United States > California > Los Angeles County > Los Angeles (0.14)
Europe > Denmark > Southern Denmark (0.05)
(29 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Information Technology > Security & Privacy (1.00)
Health & Medicine > Therapeutic Area > Neurology (1.00)
Health & Medicine > Therapeutic Area > Musculoskeletal (1.00)
(6 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.79)
Information Technology > Data Science > Data Mining > Big Data (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.56)

arXiv.org Artificial IntelligenceSep-12-2024

Explaining Datasets in Words: Statistical Models with Natural Language Parameters

Zhong, Ruiqi, Wang, Heng, Klein, Dan, Steinhardt, Jacob

To make sense of massive data, we often first fit simplified models and then interpret the parameters; for example, we cluster the text embeddings and then interpret the mean parameters of each cluster. However, these parameters are often highdimensional and hard to interpret. To make model parameters directly interpretable, we introduce a family of statistical models--including clustering, time series, and classification models--parameterized by natural language predicates. For example, a cluster of text about COVID could be parameterized by the predicate "discusses COVID". To learn these statistical models effectively, we develop a model-agnostic algorithm that optimizes continuous relaxations of predicate parameters with gradient descent and discretizes them by prompting language models (LMs). Finally, we apply our framework to a wide range of problems: taxonomizing user chat dialogues, characterizing how they evolve across time, finding categories where one language model is better than the other, clustering math problems based on subareas, and explaining visual features in memorable images. Our framework is highly versatile, applicable to both textual and visual domains, can be easily steered to focus on specific properties (e.g.

dataset, language model, predicate, (14 more...)

2409.08466

Country:

Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.04)
Asia > Middle East > Jordan (0.04)
South America > Colombia > Meta Department > Villavicencio (0.04)
(7 more...)

Genre: Research Report > Experimental Study (1.00)

Industry:

Information Technology (0.67)
Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.46)
Health & Medicine > Therapeutic Area > Immunology (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.46)
(2 more...)

Ravfogel, Shauli, Twiton, Michael, Goldberg, Yoav, Cotterell, Ryan

Linear Adversarial Concept Erasure

arXiv.org Artificial IntelligenceSep-12-2024

Modern neural models trained on textual data rely on pre-trained representations that emerge without direct supervision. As these representations are increasingly being used in real-world applications, the inability to \emph{control} their content becomes an increasingly important problem. We formulate the problem of identifying and erasing a linear subspace that corresponds to a given concept, in order to prevent linear predictors from recovering the concept. We model this problem as a constrained, linear maximin game, and show that existing solutions are generally not optimal for this task. We derive a closed-form solution for certain objectives, and propose a convex relaxation, \method, that works well for others. When evaluated in the context of binary gender removal, the method recovers a low-dimensional subspace whose removal mitigates bias by intrinsic and extrinsic evaluation. We show that the method is highly expressive, effectively mitigating bias in deep nonlinear classifiers while maintaining tractability and interpretability.

computational linguistic, representation, subspace, (14 more...)

2201.12091

Country:

North America > United States > Ohio (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > United States > Wisconsin (0.04)
(3 more...)

Genre: Research Report > New Finding (0.94)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.67)
(2 more...)

Zhao, Huali, Wang, Tianying

Debiased high-dimensional regression calibration for errors-in-variables log-contrast models

arXiv.org Machine LearningSep-11-2024

Motivated by the challenges in analyzing gut microbiome and metagenomic data, this work aims to tackle the issue of measurement errors in high-dimensional regression models that involve compositional covariates. This paper marks a pioneering effort in conducting statistical inference on high-dimensional compositional data affected by mismeasured or contaminated data. We introduce a calibration approach tailored for the linear log-contrast model. Under relatively lenient conditions regarding the sparsity level of the parameter, we have established the asymptotic normality of the estimator for inference. Numerical experiments and an application in microbiome study have demonstrated the efficacy of our high-dimensional calibration strategy in minimizing bias and achieving the expected coverage rates for confidence intervals. Moreover, the potential application of our proposed methodology extends well beyond compositional data, suggesting its adaptability for a wide range of research contexts.

artificial intelligence, machine learning, measurement error, (17 more...)

2409.07568

Country: North America > United States > Colorado (0.04)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (0.93)
Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)