AITopics | asymptotic variance

asymptotic variance

Valid Inference with Imperfect Synthetic Data

Neural Information Processing Systems | Jun-23-2026, 01:49:42 GMT

Predictions and generations from large language models are increasingly being explored as an aid in limited data regimes, such as in computational social science and human subjects research. While prior technical work has mainly explored the potential to use model-predicted labels for unlabeled data in a principled manner, there is increasing interest in using large language models to generate entirely new synthetic samples (e.g., synthetic simulations), such as in responses to surveys. However, it remains unclear by what means practitioners can combine such data with real data and yet produce statistically valid conclusions upon them. In this paper, we introduce a new estimator based on generalized method of moments, providing a hyperparameter-free solution with strong theoretical guarantees to address this challenge. Intriguingly, we find that interactions between the moment residuals of synthetic data and those of real data (i.e., when they are predictive of each other) can greatly improve estimates of the target parameter.

large language model, machine learning, natural language, (19 more...)

Neural Information Processing Systems

Country: North America > United States (0.92)

Genre: Research Report > Experimental Study (1.00)

Industry: Government (0.92)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)

[Title garbled in source]

Neural Information Processing Systems | Jun-23-2026, 01:09:35 GMT

Randomized experiments are the preferred approach for evaluating the effects of interventions, but they are costly and often yield estimates with substantial uncertainty. On the other hand, in silico experiments leveraging foundation models offer a cost-effective alternative that can potentially attain higher statistical precision. However, the benefits of in silico experiments come with a significant risk: statistical inferences are not valid if the models fail to accurately predict experimental responses to interventions. In this paper, we propose a novel approach that integrates the predictions from multiple foundation models with experimental data while preserving valid statistical inference. Our estimator is consistent and asymptotically normal, with asymptotic variance no larger than the standard estimator based on experimental data alone. Importantly, these statistical properties hold even when model predictions are arbitrarily biased. Empirical results across several randomized experiments show that our estimator offers substantial precision gains, equivalent to a reduction of up to 20% in the sample size needed to match the same precision as the standard estimator based on experimental data alone.
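
The idea that predictions can sharpen an experimental estimate without compromising validity can be illustrated with a much simpler construction than the one the abstract describes. The sketch below is not the paper's estimator: it uses a single hypothetical predictor `g` and takes the difference in means of the residuals `Y - g(X)`. Because treatment is randomized, `g(X)` has the same distribution in both arms, so the estimate stays unbiased even when `g` is off by a constant. Unlike the estimator in the abstract, this simple version carries no guarantee that its variance is no larger than the standard estimator's when `g` is poor. The data-generating process, the predictor, and all names are assumptions made for illustration.

```python
import numpy as np

def diff_in_means(values, treated):
    """Difference in arm means with the usual unequal-variance standard error."""
    v1, v0 = values[treated == 1], values[treated == 0]
    est = v1.mean() - v0.mean()
    se = np.sqrt(v1.var(ddof=1) / len(v1) + v0.var(ddof=1) / len(v0))
    return est, se

def prediction_adjusted(y, treated, g):
    """Difference in means of residuals Y - g(X), where g ignores treatment."""
    return diff_in_means(y - g, treated)

# Simulated randomized experiment (illustrative assumptions throughout).
rng = np.random.default_rng(0)
n, tau = 2000, 1.0
x = rng.normal(size=n)
treated = rng.binomial(1, 0.5, size=n)
y = 2.0 * x + tau * treated + rng.normal(size=n)

# Hypothetical model prediction: captures the covariate signal but is biased by +5.
g = 2.0 * x + 5.0

plain_est, plain_se = diff_in_means(y, treated)
adj_est, adj_se = prediction_adjusted(y, treated, g)
print(f"difference in means: {plain_est:.3f} (SE {plain_se:.3f})")
print(f"prediction-adjusted: {adj_est:.3f} (SE {adj_se:.3f})")
```

In this simulation the constant bias in `g` cancels between arms, while the covariate signal it captures removes outcome variance, which is why the adjusted standard error is smaller.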

large language model, machine learning, natural language, (21 more...)

Neural Information Processing Systems

Country: North America > United States > New Jersey (0.28)

Genre:

Research Report > Strength High (1.00)
Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Law (1.00)
Government > Regional Government > North America Government > United States Government (1.00)
Law Enforcement & Public Safety > Terrorism (0.94)
(3 more...)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.92)
(2 more...)

Optimal Adjustment Sets for Nonparametric Estimation of Weighted Controlled Direct Effect

Neural Information Processing Systems | Jun-19-2026, 08:11:53 GMT

The weighted controlled direct effect (WCDE) generalizes the standard controlled direct effect (CDE) by averaging over the mediator distribution, providing a robust estimate when treatment effects vary across mediator levels. This makes the WCDE especially relevant in fairness analysis, where it isolates the direct effect of an exposure on an outcome, independent of mediating pathways. This work establishes three fundamental advances for WCDE in observational studies: First, we establish necessary and sufficient conditions for the identifiability of the WCDE, clarifying when it diverges from the CDE. Next, we consider nonparametric estimation of the WCDE and derive its influence function, focusing on the class of regular and asymptotically linear estimators. Lastly, we characterize the optimal covariate adjustment set that minimizes the asymptotic variance, demonstrating how mediator-confounder interactions introduce distinct requirements compared to average treatment effect (ATE) estimation. Using synthetic and real-world data, we validate our theory numerically, showing that the proposed optimal valid adjustment set yields the lowest variance at practical sample sizes. Our results offer a principled framework for efficient estimation of direct effects in complex causal systems, with practical applications in fairness and mediation analysis.

adjustment, artificial intelligence, machine learning, (18 more...)

Neural Information Processing Systems

Country:

North America > United States (0.46)
Europe > United Kingdom (0.28)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.87)

Industry:

Health & Medicine (1.00)
Law (0.66)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)

Optimal Nuisance Function Tuning for Estimating a Doubly Robust Functional under Proportional Asymptotics

Neural Information Processing Systems | Jun-19-2026, 03:47:19 GMT

In this paper, we explore the asymptotically optimal tuning parameter choice in ridge regression for estimating nuisance functions of a statistical functional that has recently gained prominence in conditional independence testing and causal inference. Given a sample of size n, we study estimators of the Expected Conditional Covariance (ECC) between variables Y and A given a high-dimensional covariate X ∈ ℝ^p. Under linear regression models for Y and A on X and the proportional asymptotic regime p/n → c ∈ (0, ∞), we evaluate three existing ECC estimators and two sample splitting strategies for estimating the required nuisance functions. Since no consistent estimator of the nuisance functions exists in the proportional asymptotic regime without imposing further structure on the problem, we first derive debiased versions of the ECC estimators that utilize the ridge regression nuisance function estimators. We show that our bias correction strategy yields √n-consistent estimators of the ECC across different sample splitting strategies and estimator choices. We then derive the asymptotic variances of these debiased estimators to illustrate the nuanced interplay between the sample splitting strategy, estimator choice, and tuning parameters of the nuisance function estimators for optimally estimating the ECC. Our analysis reveals that prediction-optimal tuning parameters (i.e., those that optimally estimate the nuisance functions) may not lead to the lowest asymptotic variance of the ECC estimator, demonstrating the need to be careful in selecting tuning parameters based on the final goal of inference. Finally, we verify our theoretical results through extensive numerical experiments.
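
For orientation, the ECC is E[Cov(Y, A | X)], and a naive sample-split plug-in estimator can be sketched in a few lines. The code below is an illustration under simplifying assumptions, not one of the debiased estimators from the paper: it fits ridge regressions for Y and A on one half of the data and averages the product of residuals on the other half. The simulation is deliberately low-dimensional (p = 5, n = 4000), where this plug-in behaves well; in the proportional regime the abstract studies, the same construction would need the bias correction the authors derive. The function names, penalty scaling, and simulated model are all assumptions.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge coefficients: (X'X + n * lam * I)^{-1} X'y."""
    n, p = X.shape
    return np.linalg.solve(X.T @ X + n * lam * np.eye(p), X.T @ y)

def ecc_plugin(X, Y, A, lam_y, lam_a, seed=0):
    """Fit both nuisance regressions on one half; average residual products on the other."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(Y))
    train, test = idx[: len(Y) // 2], idx[len(Y) // 2:]
    beta_y = ridge_fit(X[train], Y[train], lam_y)
    beta_a = ridge_fit(X[train], A[train], lam_a)
    res_y = Y[test] - X[test] @ beta_y
    res_a = A[test] - X[test] @ beta_a
    return float(np.mean(res_y * res_a))

# Low-dimensional simulation (illustrative): the true ECC equals the noise covariance rho.
rng = np.random.default_rng(1)
n, p, rho = 4000, 5, 0.5
X = rng.normal(size=(n, p))
noise = rng.multivariate_normal([0.0, 0.0], [[1.0, rho], [rho, 1.0]], size=n)
A = X @ np.full(p, 0.5) + noise[:, 0]
Y = X @ np.full(p, 1.0) + noise[:, 1]

ecc_hat = ecc_plugin(X, Y, A, lam_y=1e-3, lam_a=1e-3)
print(f"plug-in ECC estimate: {ecc_hat:.3f} (true value {rho})")
```

The two penalties `lam_y` and `lam_a` are kept as separate arguments because the abstract's point is that the values that predict Y and A best need not be the ones that give the most precise ECC estimate.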

artificial intelligence, dfmp, machine learning, (17 more...)

Neural Information Processing Systems

Country: North America > United States (0.45)

Genre: Research Report > Experimental Study (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.54)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.54)

A Cautionary Tale on Integrating Studies with Disparate Outcome Measures for Causal Inference

Neural Information Processing Systems | Jun-15-2026, 19:57:52 GMT

Data integration approaches are increasingly used to enhance the efficiency and generalizability of studies. However, a key limitation of these methods is the assumption that outcome measures are identical across datasets - an assumption that often does not hold in practice. Consider the following opioid use disorder (OUD) studies: the XBOT trial and the POAT study, both evaluating the effect of medications for OUD on withdrawal symptom severity (not the primary outcome of either trial). While XBOT measures withdrawal severity using the subjective opiate withdrawal scale, POAT uses the clinical opiate withdrawal scale. We analyze this realistic yet challenging setting where outcome measures differ across studies and where neither study records both types of outcomes. Our paper studies whether and when integrating studies with disparate outcome measures leads to efficiency gains.

artificial intelligence, assumption, machine learning, (17 more...)

Neural Information Processing Systems

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Therapeutic Area > Psychiatry/Psychology > Addiction Disorder (0.69)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Information Fusion (0.67)

Prediction-Powered Causal Inference by Automatic Debiased Machine Learning and Semi-Supervised Riesz Regression

Kato, Masahiro

arXiv.org Machine Learning | Jun-12-2026

This study investigates semiparametric efficient estimation of causal and structural parameters in a semi-supervised setting. In our setting, unlabeled auxiliary regressors are available in addition to labeled observations consisting of outcomes and regressors. Our goal is to construct estimators of causal and structural parameters whose asymptotic variances are smaller than those of estimators constructed using only labeled data. We refer to this framework as prediction-powered causal inference (PPCI). We first derive the efficient influence function and the efficiency bound, which imply that the use of auxiliary regressors can attain a smaller asymptotic variance than the efficiency bound attainable from labeled observations alone. Then, by combining the efficient influence function with the debiased machine learning (DML) framework, we propose methods that we call DML-PPCI. If we construct an estimating-equation estimator, we refer to the method as EE-DML-PPCI; if we construct a targeted-learning estimator, we refer to the method as TMLE-DML-PPCI. The asymptotic variances of both estimators match our derived efficiency bound. In the construction of the estimators, estimation of the efficient influence function plays an important role. In our study, the efficient influence function is also a Neyman orthogonal score, which depends on the Riesz representer and the regression function. For Riesz representer estimation, we develop semi-supervised generalized Riesz regression with convergence rate guarantees.
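
The claim that unlabeled regressors can reduce asymptotic variance is easiest to see in the simplest prediction-powered setting: estimating a mean. The sketch below is not DML-PPCI and involves no Riesz regression. It only shows the basic mechanism, assuming a fixed, hypothetical predictor `predict`: average the predictions over the large unlabeled sample, then correct with the mean labeled residual. The simulated model and sample sizes are assumptions chosen for illustration.

```python
import numpy as np

def labeled_only_mean(y_lab):
    """Sample mean of the labeled outcomes and its standard error."""
    return y_lab.mean(), y_lab.std(ddof=1) / np.sqrt(len(y_lab))

def prediction_powered_mean(y_lab, pred_lab, pred_unlab):
    """Mean prediction on unlabeled data plus the mean labeled residual."""
    residual = y_lab - pred_lab
    est = pred_unlab.mean() + residual.mean()
    # The two terms come from independent samples, so their variances add.
    se = np.sqrt(pred_unlab.var(ddof=1) / len(pred_unlab)
                 + residual.var(ddof=1) / len(residual))
    return est, se

def predict(x):
    """Hypothetical imperfect predictor (true regression is 3 + 2x)."""
    return 2.5 + 1.8 * x

rng = np.random.default_rng(2)
n_lab, n_unlab, true_mean = 500, 20000, 3.0
x_lab = rng.normal(size=n_lab)
x_unlab = rng.normal(size=n_unlab)
y_lab = 3.0 + 2.0 * x_lab + rng.normal(size=n_lab)

lab_est, lab_se = labeled_only_mean(y_lab)
pp_est, pp_se = prediction_powered_mean(y_lab, predict(x_lab), predict(x_unlab))
print(f"labeled only:       {lab_est:.3f} (SE {lab_se:.3f})")
print(f"prediction-powered: {pp_est:.3f} (SE {pp_se:.3f})")
```

The residual term removes the predictor's bias, so the estimate targets the true mean even though `predict` is misspecified; the variance gain comes from the residuals being less variable than the raw outcomes.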

artificial intelligence, estimator, machine learning, (15 more...)

arXiv.org Machine Learning

2606.12892

Country: North America > United States (0.28)

Genre: Research Report > New Finding (0.66)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.49)

Digital Twins as Synthetic Controls in Single-Arm Trials

Bertolini, Daniele, Fuller, Franklin, Smith, Aaron M., Walsh, Jonathan R., Zhuang, Run

arXiv.org Machine Learning | May-14-2026

Single-arm trials are an important study design for evaluating drug efficacy and safety without enrolling patients into a control arm. Although they do not provide the gold-standard evidence of randomized controlled trials, they are increasingly used in clinical development as they offer an efficient, ethical, and practical alternative. A wide variety of approaches can be used to construct control comparators and estimate treatment effects, from fixed comparators informed by clinical knowledge to data-based and model-based patient-level comparators, also known as synthetic controls. Powerful and flexible machine learning models can allow outcome-model-based synthetic controls to overcome key limitations of direct data-based approaches, yield more robust estimates of treatment effects, and provide a principled way to incorporate corrections or encode additional assumptions when external data are not directly comparable. In this work, we argue that outcome-model-based synthetic control arms are an important tool for single-arm trials. We focus on digital twins, personalized predictions of disease progression generated from machine learning models trained on historical datasets, which naturally leverage these flexible approaches. We review doubly robust estimators, present power and sample size formulas, and discuss trade-offs in selecting historical data for training and analysis. We also outline practical considerations for deploying digital twins within the framework of recent FDA draft guidance on the use of artificial intelligence in drug development. Finally, we reanalyze data from trials in amyotrophic lateral sclerosis and Huntington's disease to demonstrate the proposed methods.

artificial intelligence, estimator, machine learning, (18 more...)

arXiv.org Machine Learning

2605.12832

Country: North America > United States > California > San Francisco County > San Francisco (0.86)

Genre: Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Therapeutic Area > Neurology (0.68)
Government > Regional Government > North America Government > United States Government > FDA (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

The Adaptive Doubly Robust Estimator and a Paradox Concerning Logging Policy

Neural Information Processing Systems | Apr-24-2026, 14:56:57 GMT

The doubly robust (DR) estimator, which consists of two nuisance parameters, the conditional mean outcome and the logging policy (the probability of choosing an action), is crucial in causal inference. This paper proposes a DR estimator for dependent samples obtained from adaptive experiments. To obtain an asymptotically normal semiparametric estimator from dependent samples with non-Donsker nuisance estimators, we propose adaptive-fitting as a variant of sample-splitting. We also report an empirical paradox that our proposed DR estimator tends to show better performances compared to other estimators utilizing the true logging policy. While a similar phenomenon is known for estimators with i.i.d.
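
As background for the two nuisance components named in the abstract, the standard DR estimator of a policy's value can be sketched as follows. This is the textbook form for i.i.d. logged data, not the adaptive-fitting procedure proposed in the paper, and the simulation (two actions, a target policy that always picks action 1, and the specific nuisance choices) is an assumption made for illustration. The two calls show the double robustness property: the estimate stays close to the true value when either the outcome model or the logging policy is wrong, as long as the other is correct.

```python
import numpy as np

def dr_value(actions, rewards, logging_probs, target_probs, q_hat):
    """Doubly robust policy value.

    actions: (n,) integer actions taken; rewards: (n,) observed rewards;
    logging_probs, target_probs, q_hat: (n, K) arrays indexed by action.
    """
    rows = np.arange(len(actions))
    direct = (target_probs * q_hat).sum(axis=1)           # model-based term
    weights = target_probs[rows, actions] / logging_probs[rows, actions]
    correction = weights * (rewards - q_hat[rows, actions])  # weighted residual
    return float((direct + correction).mean())

rng = np.random.default_rng(3)
n = 5000
x = rng.normal(size=n)
p1 = 0.3 + 0.4 / (1.0 + np.exp(-x))        # true logging P(action = 1 | x), in [0.3, 0.7]
actions = rng.binomial(1, p1)
rewards = x + actions + 0.5 * rng.normal(size=n)

true_logging = np.column_stack([1.0 - p1, p1])
target = np.column_stack([np.zeros(n), np.ones(n)])  # always choose action 1; true value is 1.0

# Case 1: wrong outcome model, correct logging policy.
q_wrong = np.column_stack([0.5 * x, 0.5 * x + 0.8])
v_wrong_q = dr_value(actions, rewards, true_logging, target, q_wrong)

# Case 2: correct outcome model, wrong (uniform) logging policy.
q_true = np.column_stack([x, x + 1.0])
uniform_logging = np.full((n, 2), 0.5)
v_wrong_pi = dr_value(actions, rewards, uniform_logging, target, q_true)

print(f"wrong outcome model:  {v_wrong_q:.3f}")
print(f"wrong logging policy: {v_wrong_pi:.3f}")
```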

artificial intelligence, data mining, machine learning, (19 more...)

Neural Information Processing Systems

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Data Science > Data Mining (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.93)

Optimal Subsampling with Influence Functions

Daniel Ting, Eric Brochu

Neural Information Processing Systems | Feb-19-2026, 18:42:32 GMT

As the amount of data increases, the question arises as to how best to deal with large datasets. While computational platforms such as Spark [28] and Ray [23] help process large datasets once a desired model is chosen, simply using smaller data can be a faster solution for exploratory data modeling, rapid prototyping, or other tasks where the accuracy obtainable from the full dataset is not needed.

artificial intelligence, machine learning, regression, (17 more...)

Neural Information Processing Systems

Country: