U-statistics


Statistical Inference and Learning for Shapley Additive Explanations (SHAP)

Whitehouse, Justin, Sawarni, Ayush, Syrgkanis, Vasilis

arXiv.org Machine Learning

The SHAP (short for Shapley additive explanation) framework has become an essential tool for attributing importance to variables in predictive tasks. In model-agnostic settings, SHAP uses the concept of Shapley values from cooperative game theory to fairly allocate credit to the features in a vector $X$ based on their contribution to an outcome $Y$. While the explanations offered by SHAP are local by nature, learners often need global measures of feature importance in order to improve model explainability and perform feature selection. The most common approach for converting these local explanations into global ones is to compute either the mean absolute SHAP or mean squared SHAP. However, despite their ubiquity, there do not exist approaches for performing statistical inference on these quantities. In this paper, we take a semi-parametric approach for calibrating confidence in estimates of the $p$th powers of Shapley additive explanations. We show that, by treating the SHAP curve as a nuisance function that must be estimated from data, one can reliably construct asymptotically normal estimates of the $p$th powers of SHAP. When $p \geq 2$, we show that a de-biased estimator combining U-statistics with Neyman orthogonal scores for functionals of nested regressions is asymptotically normal. When $1 \leq p < 2$ (and hence the target parameter is not twice differentiable), we construct de-biased U-statistics for a smoothed alternative. In particular, we show how to carefully tune the temperature parameter of the smoothing function in order to obtain inference for the true, unsmoothed $p$th power. We complement these results by presenting a Neyman orthogonal loss that can be used to learn the SHAP curve via empirical risk minimization and discussing excess risk guarantees for commonly used function classes.
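To make the target quantities concrete, here is a minimal plug-in sketch of the two global importance measures, using the open-source shap package with a placeholder scikit-learn model; it shows only the naive estimates and implements none of the de-biased, Neyman orthogonal inference developed in the paper.

```python
# Minimal sketch: plug-in estimates of mean absolute SHAP (p = 1) and
# mean squared SHAP (p = 2) as global feature-importance measures.
# Illustration only; it omits the paper's de-biasing and confidence
# intervals, and the model/data below are placeholders.
import numpy as np
import shap
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

X, y = make_regression(n_samples=500, n_features=5, random_state=0)
model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

explainer = shap.Explainer(model, X)       # local explanations phi_j(x)
phi = explainer(X).values                  # shape (n_samples, n_features)

mean_abs_shap = np.abs(phi).mean(axis=0)   # p = 1: mean |SHAP| per feature
mean_sq_shap = (phi ** 2).mean(axis=0)     # p = 2: mean SHAP^2 per feature
print(mean_abs_shap, mean_sq_shap)
```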


Efficient Aggregated Kernel Tests using Incomplete U-statistics

Neural Information Processing Systems

The minimax (i.e., optimal) rate over the Sobolev ball $\mathcal{S}_d^s(R)$ is $N^{-2s/(4s+d)}$ for the two-sample (Li and Yuan, 2019, Theorem 5 (ii)) and independence (Albert et al., 2022, Theorem 4; Berrett et al., 2021, Corollary 5) problems.
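As a quick illustration of this rate, here are representative values plugged in by hand (our own arithmetic, not part of the excerpt):

```latex
\[
  \rho_N = N^{-2s/(4s+d)}:\qquad
  s = 1,\ d = 1 \;\Rightarrow\; \rho_N = N^{-2/5};\qquad
  s \to \infty \;\Rightarrow\; \rho_N \to N^{-1/2}.
\]
```

So the separation rate approaches the parametric rate $N^{-1/2}$ as the smoothness $s$ grows, and degrades as the dimension $d$ grows.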




Maximum Mean Discrepancy with Unequal Sample Sizes via Generalized U-Statistics

Wei, Aaron, Jalali, Milad, Sutherland, Danica J.

arXiv.org Machine Learning

Existing two-sample testing techniques, particularly those based on choosing a kernel for the Maximum Mean Discrepancy (MMD), often assume equal sample sizes from the two distributions. Applying these methods in practice can require discarding valuable data, unnecessarily reducing test power. We address this long-standing limitation by extending the theory of generalized U-statistics and applying it to the usual MMD estimator, resulting in a new characterization of the asymptotic distributions of the MMD estimator with unequal sample sizes (particularly outside the proportional regimes required by previous partial results). This generalization also provides a new criterion for optimizing the power of an MMD test with unequal sample sizes. Our approach preserves all available data, enhancing test accuracy and applicability in realistic settings. Along the way, we give much cleaner characterizations of the variance of MMD estimators, revealing something that might be surprising to those in the area: while zero MMD implies a degenerate estimator, it is sometimes possible to have a degenerate estimator with nonzero MMD as well; we give a construction and a proof that it does not happen in common situations.
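For reference, the standard unbiased (U-statistic) estimator of MMD$^2$ is already well defined for unequal sample sizes; a minimal NumPy sketch with a Gaussian kernel follows. This is illustrative only: the paper's contribution concerns the asymptotic distribution and power optimization of such estimators, not their computation.

```python
# Minimal sketch: unbiased U-statistic estimator of MMD^2 with a Gaussian
# kernel, for samples of unequal sizes n != m.
import numpy as np

def gaussian_kernel(A, B, bandwidth=1.0):
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq / (2 * bandwidth ** 2))

def mmd2_u(X, Y, bandwidth=1.0):
    n, m = len(X), len(Y)
    Kxx = gaussian_kernel(X, X, bandwidth)
    Kyy = gaussian_kernel(Y, Y, bandwidth)
    Kxy = gaussian_kernel(X, Y, bandwidth)
    # Dropping the diagonals makes the within-sample terms unbiased.
    term_x = (Kxx.sum() - np.trace(Kxx)) / (n * (n - 1))
    term_y = (Kyy.sum() - np.trace(Kyy)) / (m * (m - 1))
    return term_x + term_y - 2 * Kxy.mean()

rng = np.random.default_rng(0)
X = rng.normal(0.0, 1.0, size=(300, 2))   # n = 300
Y = rng.normal(0.5, 1.0, size=(120, 2))   # m = 120, unequal on purpose
print(mmd2_u(X, Y))
```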



Empirical Likelihood for Random Forests and Ensembles

Chiang, Harold D., Matsushita, Yukitoshi, Otsu, Taisuke

arXiv.org Machine Learning

We develop an empirical likelihood (EL) framework for random forests and related ensemble methods, providing a likelihood-based approach to quantify their statistical uncertainty. Exploiting the incomplete $U$-statistic structure inherent in ensemble predictions, we construct an EL statistic that is asymptotically chi-squared when the subsampling induced by incompleteness is not overly sparse. Under sparser subsampling regimes, the EL statistic tends to over-cover due to loss of pivotality; we therefore propose a modified EL that restores pivotality through a simple adjustment. Our method retains key properties of EL while remaining computationally efficient. Theory for honest random forests and simulations demonstrate that the modified EL achieves accurate coverage and practical reliability relative to existing inference methods.
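For background, the following toy sketch shows the basic empirical likelihood machinery (Owen-style EL for a scalar mean, solved through its dual); it only illustrates what an EL statistic is and does not implement the paper's incomplete-$U$-statistic construction for ensembles.

```python
# Minimal sketch of Owen-style empirical likelihood for a scalar mean.
# Under H0, -2 log R(mu0) is asymptotically chi-squared with 1 df (Wilks).
import numpy as np
from scipy.optimize import brentq
from scipy.stats import chi2

def el_log_ratio(x, mu0):
    # Requires mu0 to lie inside the convex hull of the data.
    z = x - mu0
    # Solve the dual equation sum(z_i / (1 + lam * z_i)) = 0 for lam,
    # keeping every weight 1 + lam * z_i strictly positive.
    lo = (-1 + 1e-10) / z.max()
    hi = (-1 + 1e-10) / z.min()
    lam = brentq(lambda l: np.sum(z / (1 + l * z)), lo, hi)
    return -np.sum(np.log(1 + lam * z))    # log EL ratio, log R(mu0)

rng = np.random.default_rng(0)
x = rng.normal(1.0, 2.0, size=200)
stat = -2 * el_log_ratio(x, mu0=1.0)
print(stat, chi2.sf(stat, df=1))           # test statistic and p-value
```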



Incomplete U-Statistics of Equireplicate Designs: Berry-Esseen Bound and Efficient Construction

Miglioli, Cesare, Awan, Jordan

arXiv.org Machine Learning

U-statistics are a fundamental class of estimators that generalize the sample mean and underpin much of nonparametric statistics. Although extensively studied in both statistics and probability, key challenges remain: their high computational cost - addressed partly through incomplete U-statistics - and their non-standard asymptotic behavior in the degenerate case, which typically requires resampling methods for hypothesis testing. This paper presents a novel perspective on U-statistics, grounded in hypergraph theory and combinatorial designs. Our approach bypasses the traditional Hoeffding decomposition, the main analytical tool in this literature but one highly sensitive to degeneracy. By characterizing the dependence structure of a U-statistic, we derive a Berry-Esseen bound valid for incomplete U-statistics of deterministic designs, yielding conditions under which Gaussian limiting distributions can be established even in degenerate cases and when the order diverges. We also introduce efficient algorithms to construct incomplete U-statistics of equireplicate designs, a subclass of deterministic designs that, in certain cases, achieve minimum variance. Finally, we apply our framework to kernel-based tests that use the Maximum Mean Discrepancy (MMD) and the Hilbert-Schmidt Independence Criterion (HSIC). In a real data example with the CIFAR-10 dataset, our permutation-free MMD test delivers substantial computational gains while retaining power and type I error control.
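One simple way to see what an incomplete U-statistic of an equireplicate design looks like is the cyclic construction sketched below: pair observation $i$ with $i+s \pmod n$ for shifts $s = 1, \dots, r$, so every observation appears in exactly $2r$ of the selected pairs. This is our own illustration of the general idea; the paper's efficient constructions and minimum-variance designs are more general.

```python
# Minimal sketch: incomplete U-statistic of order 2 over a cyclic
# (equireplicate) design, evaluating only n*r of the ~n^2/2 pairs.
import numpy as np

def incomplete_u_cyclic(x, h, r=3):
    n = len(x)
    idx = np.arange(n)
    # Deterministic design: pairs (i, i + s mod n) for s = 1..r.
    vals = [h(x[idx], x[(idx + s) % n]) for s in range(1, r + 1)]
    return np.concatenate(vals).mean()

# Example: the variance kernel h(a, b) = (a - b)^2 / 2, whose complete
# U-statistic is the unbiased sample variance.
rng = np.random.default_rng(0)
x = rng.normal(0.0, 2.0, size=10_000)
h = lambda a, b: 0.5 * (a - b) ** 2
print(incomplete_u_cyclic(x, h, r=3))   # close to Var = 4, at a fraction
                                        # of the O(n^2) cost
```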