AITopics | deduce

f0156a82b6af6a4e838923ce9c124424-Paper-Conference.pdf

Neural Information Processing Systems | Jun-23-2026, 02:12:07 GMT

Structure-agnostic causal inference studies how well one can estimate a treatment effect given black-box machine learning estimates of nuisance functions (like the impact of confounders on treatment and outcomes). Here, we find that the answer depends in a surprising way on the distribution of the treatment noise. Focusing on the partially linear model of Robinson [1988], we first show that the widely adopted double machine learning (DML) estimator is minimax rate-optimal for Gaussian treatment noise, resolving an open problem of Mackey et al. [2018]. Meanwhile, for independent non-Gaussian treatment noise, we show that DML is always suboptimal by constructing new practical procedures with higher-order robustness to nuisance errors. These ACE procedures use structure-agnostic cumulant estimators to achieve r-th order insensitivity to nuisance errors whenever the (r + 1)-st treatment cumulant is non-zero. We complement these core results with novel minimax guarantees for binary treatments in the partially linear model. Finally, using synthetic demand estimation experiments, we demonstrate the practical benefits of our higher-order robust estimators.
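For readers unfamiliar with the baseline discussed in this abstract, the following is a minimal sketch of standard cross-fitted DML (residual-on-residual regression) in a partially linear model. It does not implement the paper's ACE procedures. The data-generating process, the value `theta_true = 1.5`, the helper names `fit_line` and `dml_partially_linear`, and the use of a one-dimensional least-squares fit as a stand-in for a black-box nuisance learner are all assumptions made for illustration.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y ~ a + b*x; returns a prediction function.

    Stand-in for an arbitrary black-box nuisance learner (an assumption).
    """
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return lambda x: intercept + slope * x


def dml_partially_linear(xs, ts, ys):
    """Two-fold cross-fitted DML estimate of theta in Y = theta*T + g(X) + eps."""
    n = len(xs)
    half = n // 2
    first, second = range(0, half), range(half, n)
    num = den = 0.0
    for train, held_out in ((first, second), (second, first)):
        # Fit both nuisances on one fold only.
        m_hat = fit_line([xs[i] for i in train], [ts[i] for i in train])  # E[T | X]
        q_hat = fit_line([xs[i] for i in train], [ys[i] for i in train])  # E[Y | X]
        # Residualize on the other fold and accumulate the final regression.
        for i in held_out:
            t_res = ts[i] - m_hat(xs[i])
            y_res = ys[i] - q_hat(xs[i])
            num += t_res * y_res
            den += t_res * t_res
    return num / den


# Hypothetical simulation: Gaussian treatment noise, linear confounding.
rng = random.Random(0)
theta_true = 1.5
xs = [rng.gauss(0.0, 1.0) for _ in range(4000)]
ts = [0.5 * x + rng.gauss(0.0, 1.0) for x in xs]
ys = [theta_true * t + 2.0 * x + rng.gauss(0.0, 1.0) for t, x in zip(ts, xs)]

theta_hat = dml_partially_linear(xs, ts, ys)
print(f"theta_hat = {theta_hat:.3f} (true value {theta_true})")
```

The simulated treatment noise here is Gaussian, which is the case the abstract identifies as the one where DML is minimax rate-optimal.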

artificial intelligence, assumption, machine learning, (17 more...)

Neural Information Processing Systems

Genre: Research Report > Experimental Study (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Statistical Inference for Misspecified Contextual Bandits

Guo, Yongyi, Xu, Ziping

arXiv.org Machine Learning | Jun-23-2026

Contextual bandit algorithms have transformed modern experimentation by enabling real-time adaptation for personalized treatment. Yet these advantages create challenges for statistical inference due to adaptivity. We study inference with contextual-bandit data without assuming a well-specified outcome model. In this setting, we show a previously overlooked issue: standard algorithms such as LinUCB may fail to stabilize under misspecified working models, leading to non-Gaussian estimator behavior and invalid inference. This issue is practically important, as misspecified working models -- such as approximations of complex dynamical systems -- are often employed by online agents in real-world adaptive experiments to balance reward, computational tractability, and robustness. We develop an inverse-probability-weighted Z-estimation framework for a broad class of marginal moment targets, including projection parameters, structural parameters with noisy contexts, and off-policy values. We identify a stability condition tailored to this framework, scaled inverse-propensity convergence, under which the IPW-Z estimator is consistent and asymptotically normal with a consistent sandwich variance estimator. We further establish sufficient conditions for scaled inverse-propensity convergence for several policy classes, including multi-armed bandit algorithms and smooth contextual allocation policies. Simulations and a HeartSteps V1 real-data-calibrated application show reliable coverage and competitive performance across multiple targets. Overall, our results highlight the importance of stability-aware adaptive design for valid post-experiment inference.
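As a rough illustration of the inverse-probability-weighting idea behind one of the targets named in this abstract (off-policy values), here is a minimal sketch. It is not the paper's IPW-Z framework: the logging policy below uses a fixed propensity rather than an adaptive bandit algorithm, so it sidesteps the stability issue the paper studies. The function name `ipw_value`, the two-arm setup, and all numeric values are assumptions made for illustration.

```python
import random


def ipw_value(actions, rewards, propensities, target_action):
    """IPW estimate of the value of a policy that always plays target_action.

    propensities[i] is the logging probability of the action actually taken.
    """
    total = 0.0
    for a, r, p in zip(actions, rewards, propensities):
        if a == target_action:
            total += r / p  # reweight matching rounds by 1 / propensity
    return total / len(actions)


# Hypothetical logged data: arm 1 is played with fixed probability 0.3.
rng = random.Random(0)
p_arm1 = 0.3
actions, rewards, propensities = [], [], []
for _ in range(20000):
    a = 1 if rng.random() < p_arm1 else 0
    mean_reward = 1.0 if a == 1 else 0.0  # assumed arm means
    actions.append(a)
    rewards.append(mean_reward + rng.gauss(0.0, 0.5))
    propensities.append(p_arm1 if a == 1 else 1.0 - p_arm1)

value_hat = ipw_value(actions, rewards, propensities, target_action=1)
print(f"estimated value of always playing arm 1: {value_hat:.3f}")
```

With adaptively collected data the propensities would change from round to round, which is where the abstract's stability condition becomes relevant.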

artificial intelligence, data mining, machine learning, (21 more...)

arXiv.org Machine Learning

2606.22639

Country: North America > United States (1.00)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (0.68)
Education > Educational Setting (0.67)
Health & Medicine > Consumer Health (0.45)

Technology:

Information Technology > Data Science > Data Mining > Big Data (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.46)

Self-Distillation is Optimal Among Spectral Shrinkage Estimators in Spiked Covariance Models

Lecoiu, Radu, Mukherjee, Debarghya, Sur, Pragya

arXiv.org Machine Learning | May-19-2026

Self-distillation has emerged as a promising technique for improving model performance in modern machine learning systems. We develop the statistical foundations of self-distillation in spiked covariance models, by introducing and analyzing a broad class of estimators, namely spectral shrinkage estimators. We establish that for spiked covariance matrices with $s$ spikes, $s$-step self-distillation achieves optimal performance among spectral shrinkage estimators, outperforming well-known estimators in statistics and machine learning. Moreover, we show that $s$ steps are necessary for optimality: any $(s-k)$-step distilled estimator is strictly suboptimal for $1 \leq k \leq s$. For the special subclass of isotropic covariances, we show that optimally tuned Ridge regression performs best among spectral shrinkage estimators. We also study a federated approach where multiple data centers share spectral shrinkage estimators and a common server seeks to aggregate them to achieve optimal performance. In this case, we find that the best local rule again takes the form of self-distillation, though it differs from the optimal rule when data are hosted centrally on a single server. Together, our results elucidate why self-distillation improves predictive performance and provide a broader statistical framework connecting it with classical shrinkage-based methods.

artificial intelligence, machine learning, theorem 3, (19 more...)

arXiv.org Machine Learning

2605.17778

Country: North America > United States > New York (0.27)

Genre:

Research Report > New Finding (0.34)
Research Report > Promising Solution (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.67)

228ffa71ce31ebbdebc6cf413a39cdce-Supplemental-Conference.pdf

Neural Information Processing Systems | Apr-25-2026, 01:54:08 GMT

data mining, machine learning, reinforcement learning, (20 more...)

Neural Information Processing Systems

Country: North America > United States (0.67)

Genre: Research Report (0.46)

Industry:

Health & Medicine (0.49)
Law > Civil Rights & Constitutional Law (0.33)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Data Science > Data Mining > Big Data (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.46)

Low-degree Lower bounds for clustering in moderate dimension

Carpentier, Alexandra, Verzelen, Nicolas

arXiv.org Machine Learning | Feb-27-2026

We study the fundamental problem of clustering $n$ points into $K$ groups drawn from a mixture of isotropic Gaussians in $\mathbb{R}^d$. Specifically, we investigate the requisite minimal distance $\Delta$ between mean vectors to partially recover the underlying partition. While the minimax-optimal threshold for $\Delta$ is well-established, a significant gap exists between this information-theoretic limit and the performance of known polynomial-time procedures. Although this gap was recently characterized in the high-dimensional regime ($n \leq dK$), it remains largely unexplored in the moderate-dimensional regime ($n \geq dK$). In this manuscript, we address this regime by establishing a new low-degree polynomial lower bound for the moderate-dimensional case when $d \geq K$. We show that while the difficulty of clustering for $n \leq dK$ is primarily driven by dimension reduction and spectral methods, the moderate-dimensional regime involves more delicate phenomena leading to a "non-parametric rate". We provide a novel non-spectral algorithm matching this rate, shedding new light on the computational limits of the clustering problem in moderate dimension.

artificial intelligence, machine learning, node, (17 more...)

arXiv.org Machine Learning

2602.23023

Country:

North America > United States (0.14)
Europe > France > Occitanie > Hérault > Montpellier (0.04)
Europe > Italy > Lazio > Rome (0.04)
(3 more...)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.65)

b2b3e1d9840eba17ad9bbf073e009afe-Supplemental-Conference.pdf

Neural Information Processing Systems | Feb-16-2026, 15:29:27 GMT

artificial intelligence, converge, machine learning, (18 more...)

Neural Information Processing Systems

Country: Asia > Middle East > Jordan (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)

730ce0ae730f39e4d77b0f04a8afe4be-Supplemental-Conference.pdf

Neural Information Processing Systems | Feb-14-2026, 09:43:00 GMT

This paper studies the use of a machine learning-based estimator as a control variate for mitigating the variance of Monte Carlo sampling. Specifically, we seek to uncover the key factors that influence the efficiency of control variates in reducing variance.
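The control-variate mechanism mentioned here can be sketched as follows. This is an illustrative example only: the control variate is a hand-picked function with a known mean, not a machine-learning-based estimator as studied in the paper, and the integrand, sample size, and function name `control_variate_estimate` are assumptions.

```python
import math
import random


def control_variate_estimate(f, g, g_mean, samples):
    """Return (plain Monte Carlo mean of f, control-variate-adjusted mean).

    g is the control variate and g_mean is its exactly known expectation.
    """
    fx = [f(x) for x in samples]
    gx = [g(x) for x in samples]
    n = len(samples)
    f_bar = sum(fx) / n
    g_bar = sum(gx) / n
    # Estimate the variance-minimizing coefficient beta = Cov(f, g) / Var(g).
    cov_fg = sum((a - f_bar) * (b - g_bar) for a, b in zip(fx, gx)) / (n - 1)
    var_g = sum((b - g_bar) ** 2 for b in gx) / (n - 1)
    beta = cov_fg / var_g
    # Correct the plain mean using the known error in the control's sample mean.
    return f_bar, f_bar - beta * (g_bar - g_mean)


# Hypothetical target: E[exp(U)] for U ~ Uniform(0, 1), which equals e - 1.
# Control variate: g(u) = u, whose mean 0.5 is known exactly.
rng = random.Random(0)
samples = [rng.random() for _ in range(10_000)]
plain, adjusted = control_variate_estimate(math.exp, lambda u: u, 0.5, samples)
print(f"plain = {plain:.4f}, adjusted = {adjusted:.4f}, exact = {math.e - 1:.4f}")
```

The variance reduction depends on how strongly the control variate correlates with the integrand, which is one natural candidate for the "key factors" the abstract refers to.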

artificial intelligence, control variate, machine learning, (17 more...)

Neural Information Processing Systems

Country: