AITopics | notation

notation

Semiparametrically Efficient Inference for Kernel Measures of Noise Heterogeneity

Wornbard, Jakub, Shen, Zikai, Meunier, Dimitri, Gretton, Arthur

arXiv.org Machine Learning, May-28-2026

We develop semiparametrically efficient inference for kernel measures of noise heterogeneity in additive noise models. In many applications, the regression function is estimated using flexible machine learning methods. Downstream procedures based on the resulting residuals can then inherit first-stage bias: regression error may induce spurious dependence between covariates and residuals, invalidating the assumptions needed for standard analysis. We construct a novel Hilbert-valued one-step estimator of the kernel covariance operator between covariates and residuals. Our estimator yields bootstrap-calibrated tests for residual independence and goodness of fit in additive noise models, while also providing asymptotically efficient confidence intervals for the kernel dependence measure under noise heterogeneity. The framework extends to settings with additional covariates, enabling inference on distributional heterogeneity of residual noise across treatment groups. Simulations show improved calibration and power relative to naive plug-in residual methods.
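For orientation, the naive plug-in baseline that the abstract contrasts against can be sketched in a few lines. This is not the authors' one-step estimator: it is the standard biased HSIC statistic computed between covariates and residuals, and the Gaussian kernels, unit bandwidths, sample size, and synthetic noise models are all assumptions made for illustration.

```python
import math
import random


def gaussian_gram(values, bandwidth):
    """Gram matrix of a Gaussian kernel on scalar inputs."""
    return [[math.exp(-((a - b) ** 2) / (2.0 * bandwidth ** 2)) for b in values]
            for a in values]


def center(gram):
    """Double-center a symmetric Gram matrix: H K H with H = I - 11^T / n."""
    n = len(gram)
    row_means = [sum(row) / n for row in gram]
    total_mean = sum(row_means) / n
    return [[gram[i][j] - row_means[i] - row_means[j] + total_mean
             for j in range(n)] for i in range(n)]


def hsic_plugin(x, residuals, bandwidth_x=1.0, bandwidth_r=1.0):
    """Biased plug-in HSIC: trace(HKH * HLH) / n^2 (no first-stage bias correction)."""
    n = len(x)
    kc = center(gaussian_gram(x, bandwidth_x))
    lc = center(gaussian_gram(residuals, bandwidth_r))
    return sum(kc[i][j] * lc[i][j] for i in range(n) for j in range(n)) / n ** 2


# Synthetic residuals (assumed): one noise model independent of x,
# one whose scale grows with |x| (heterogeneous noise).
rng = random.Random(0)
x = [rng.uniform(-1.0, 1.0) for _ in range(150)]
homoscedastic = [rng.gauss(0.0, 1.0) for _ in x]
heteroscedastic = [(0.2 + 2.0 * abs(xi)) * rng.gauss(0.0, 1.0) for xi in x]

hsic_homo = hsic_plugin(x, homoscedastic)
hsic_hetero = hsic_plugin(x, heteroscedastic)
print("plug-in HSIC, homoscedastic residuals:  ", hsic_homo)
print("plug-in HSIC, heteroscedastic residuals:", hsic_hetero)
```

In practice the residuals would come from a fitted regression rather than being simulated directly, which is where the first-stage bias discussed in the abstract enters.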

artificial intelligence, estimator, machine learning, (17 more...)

2605.27526

Genre: Research Report > Experimental Study (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.34)

Average Gradient Outer Product in kernel regression provably recovers the central subspace for multi-index models

Zhu, Libin, Davis, Damek, Drusvyatskiy, Dmitriy, Fazel, Maryam

arXiv.org Machine Learning, May-15-2026

We study a prototypical situation in which a learned predictor can discover useful low-dimensional structure in data, while using fewer samples than are needed for accurate prediction. Specifically, we consider the problem of recovering a multi-index polynomial $f^*(x)=h(Ux)$, with $U\in\mathbb{R}^{r\times d}$ and $r\ll d$, from finitely many data/label pairs. Importantly, the target function depends on input $x$ only through the projection onto an unknown $r$-dimensional central subspace. The algorithm we analyze is appealingly simple: fit kernel ridge regression (KRR) to the data and compute the Average Gradient Outer Product (AGOP) from the fitted predictor. Our main results show that under reasonable assumptions the top $r$-dimensional eigenspace of AGOP provably recovers the central subspace, even in regimes where the prediction error remains large. Specifically, if the target function $f^*$ has degree $p^*$, it is known that $n\asymp d^{p^*}$ samples are necessary for KRR to achieve accurate prediction. In contrast, we show that if a low degree $p$ component of $f^*$ already carries all relevant directions for prediction, subspace recovery occurs in the much lower sample regime $n\asymp d^{p+\delta}$ for any $\delta\in(0,1)$. Our results thus demonstrate a separation between prediction and representation, and provide an explanation for why iterative kernel methods such as Recursive Feature Machines (RFM) can be sample-efficient in practice.
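The two-step procedure described in the abstract (fit KRR, then form the AGOP from the fitted predictor) can be sketched on a toy problem. Everything specific here is an assumption for illustration: the Gaussian kernel, the bandwidth and ridge values, the dimension $d=4$, the single-index target $h(t)=t+t^2$ with $U$ equal to the first coordinate axis, and the small hand-written linear solver and power iteration used to keep the sketch dependency-free.

```python
import math
import random


def gaussian_kernel(x, z, bandwidth):
    sq = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-sq / (2.0 * bandwidth ** 2))


def solve(matrix, rhs):
    """Dense linear solve by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    sol = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * sol[c] for c in range(r + 1, n))
        sol[r] = (aug[r][n] - tail) / aug[r][r]
    return sol


def fit_krr(xs, ys, bandwidth, ridge):
    """Return KRR coefficients alpha solving (K + ridge * I) alpha = y."""
    n = len(xs)
    gram = [[gaussian_kernel(xs[i], xs[j], bandwidth) + (ridge if i == j else 0.0)
             for j in range(n)] for i in range(n)]
    return solve(gram, ys)


def krr_gradient(x, xs, alpha, bandwidth):
    """Gradient of f(x) = sum_j alpha_j k(x, x_j) for the Gaussian kernel."""
    grad = [0.0] * len(x)
    for xj, aj in zip(xs, alpha):
        w = aj * gaussian_kernel(x, xj, bandwidth) / bandwidth ** 2
        for k in range(len(x)):
            grad[k] += w * (xj[k] - x[k])
    return grad


def agop(xs, alpha, bandwidth):
    """Average Gradient Outer Product over the training inputs."""
    n, d = len(xs), len(xs[0])
    m = [[0.0] * d for _ in range(d)]
    for x in xs:
        g = krr_gradient(x, xs, alpha, bandwidth)
        for a in range(d):
            for b in range(d):
                m[a][b] += g[a] * g[b] / n
    return m


def top_eigenvector(matrix, iterations=200):
    """Power iteration for the leading eigenvector of a symmetric PSD matrix."""
    d = len(matrix)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        w = [sum(matrix[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(c * c for c in w))
        v = [c / norm for c in w]
    return v


# Toy data (assumed): the target depends on x only through its first coordinate.
rng = random.Random(0)
xs = [[rng.uniform(-1.0, 1.0) for _ in range(4)] for _ in range(60)]
ys = [x[0] + x[0] ** 2 for x in xs]

alpha = fit_krr(xs, ys, bandwidth=2.0, ridge=1e-3)
M = agop(xs, alpha, bandwidth=2.0)
top = top_eigenvector(M)
print("AGOP diagonal:", [M[k][k] for k in range(4)])
print("top eigenvector:", top)
```

With $r=1$ the recovered subspace is spanned by the single leading eigenvector; for larger $r$ one would take the top $r$ eigenvectors instead.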

artificial intelligence, coefficient, machine learning, (17 more...)

2605.15082

Country: North America > United States > Pennsylvania (0.27)

Genre: Research Report > New Finding (0.54)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.34)

The Bernstein-von Mises theorem for Bayesian one-pass online learning

Lee, Jeyong, Choi, Junhyeok, Kim, Dongguen, Chae, Minwoo

arXiv.org Machine Learning, May-1-2026

Bayesian online learning provides a coherent framework for sequential inference. However, its theoretical understanding remains limited, particularly in the one-pass setting. Existing theoretical guarantees typically require the mini-batch sample size to diverge, a condition that fails in the one-pass regime. In this paper, we propose a new Bayesian online learning algorithm tailored to the one-pass setting, which incorporates a warm-start phase to ensure stable sequential updates. For this algorithm, we show that the sequentially updated posterior attains the optimal convergence rate. Building on this, we establish an online analogue of the Bernstein-von Mises theorem, which guarantees valid uncertainty quantification without diverging mini-batch sample sizes. Our analysis is based on a novel theoretical framework that differs fundamentally from existing approaches in the online learning literature. Numerical experiments on generalized linear models show that the proposed method matches the performance of the batch estimator while outperforming existing online procedures.
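As a point of reference for what "matching the batch estimator" means, the textbook conjugate case can be worked through directly. This is not the algorithm proposed in the paper (which targets generalized linear models and uses a warm-start phase); it is a minimal sketch of one-pass Bayesian updating for a Gaussian mean with known noise variance, where the prior, noise variance, and data are all assumed values.

```python
import random


def one_pass_posterior(observations, prior_mean, prior_var, noise_var):
    """Update a Gaussian posterior on the mean one observation at a time."""
    mean, precision = prior_mean, 1.0 / prior_var
    for y in observations:
        new_precision = precision + 1.0 / noise_var
        mean = (precision * mean + y / noise_var) / new_precision
        precision = new_precision
    return mean, 1.0 / precision


def batch_posterior(observations, prior_mean, prior_var, noise_var):
    """Closed-form posterior computed from all observations at once."""
    n = len(observations)
    precision = 1.0 / prior_var + n / noise_var
    mean = (prior_mean / prior_var + sum(observations) / noise_var) / precision
    return mean, 1.0 / precision


# Assumed setup: true mean 1.5, noise variance 4, prior N(0, 10).
rng = random.Random(0)
data = [rng.gauss(1.5, 2.0) for _ in range(200)]

seq_mean, seq_var = one_pass_posterior(data, 0.0, 10.0, 4.0)
batch_mean, batch_var = batch_posterior(data, 0.0, 10.0, 4.0)
print("one-pass posterior:", seq_mean, seq_var)
print("batch posterior:   ", batch_mean, batch_var)
```

In this conjugate setting the sequential and batch posteriors coincide exactly up to floating-point error; the difficulty the abstract addresses arises in models where each online update is only approximate.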

artificial intelligence, inequality hold, machine learning, (18 more...)

2604.27442

Genre: Research Report (0.83)

Industry: Education > Educational Setting > Online (1.00)

Technology:

Information Technology > Enterprise Applications > Human Resources > Learning Management (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.92)

58b7483ba899e0ce4d97ac5eecf6fa99-Supplemental.pdf

Neural Information Processing Systems, Apr-26-2026, 01:09:41 GMT

artificial intelligence, machine learning, sequence, (19 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.45)

Appendix Potential Negative Societal Impacts

Neural Information Processing Systems, Apr-25-2026, 19:26:26 GMT

C.3 Other Differences. Besides the above discussion, there are some other differences between Daniely [12] and our work. First, they analyze SGD, whereas we analyze a constrained optimization problem and projected SGD. This may be the reason why we can get a stronger bound on width. In the experiments in Section 5, we observe that SGD performs badly when the width is small (see the first left column in (b), Figure 4). Therefore, we suspect an algorithmic change is needed to train narrow nets with such width (due to the training difficulty), and we indeed propose a new method to train narrow nets. Second, they consider binary {+1, -1} datasets, while our results apply to arbitrary labels. In addition, their proof seems to depend heavily on the fact that the labels are {+1, -1}, and seems hard to generalize to general labels.
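The difference between plain SGD and projected SGD mentioned above comes down to one extra step per update: after the gradient step, the iterate is projected back onto the constraint set. The sketch below illustrates this on a toy least-squares problem with an L2-ball constraint; the linear model, the ball constraint, the radius, and the step size are assumptions chosen for illustration and are not the network or constraint set used in the paper.

```python
import math
import random


def project_to_ball(w, radius):
    """Euclidean projection onto the L2 ball of the given radius."""
    norm = math.sqrt(sum(v * v for v in w))
    if norm <= radius:
        return list(w)
    return [v * radius / norm for v in w]


def mean_squared_error(w, data):
    return sum((sum(wi * xi for wi, xi in zip(w, x)) - y) ** 2
               for x, y in data) / len(data)


def projected_sgd(data, radius, step, epochs, rng):
    """SGD on squared loss with a projection after every update."""
    w = [0.0] * len(data[0][0])
    order = list(data)
    for _ in range(epochs):
        rng.shuffle(order)
        for x, y in order:
            err = sum(wi * xi for wi, xi in zip(w, x)) - y
            stepped = [wi - step * 2.0 * err * xi for wi, xi in zip(w, x)]
            w = project_to_ball(stepped, radius)  # the only change vs. plain SGD
    return w


# Assumed toy data: the unconstrained optimum lies outside the unit ball.
rng = random.Random(0)
true_w = [1.5, -2.0, 0.5]
data = []
for _ in range(200):
    x = [rng.uniform(-1.0, 1.0) for _ in range(3)]
    y = sum(a * b for a, b in zip(true_w, x)) + rng.gauss(0.0, 0.1)
    data.append((x, y))

radius = 1.0
initial_loss = mean_squared_error([0.0, 0.0, 0.0], data)
w_final = projected_sgd(data, radius, step=0.02, epochs=5, rng=rng)
final_loss = mean_squared_error(w_final, data)
final_norm = math.sqrt(sum(v * v for v in w_final))
print("final iterate:", w_final, "norm:", final_norm)
print("loss before/after:", initial_loss, final_loss)
```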

artificial intelligence, machine learning, training regime, (18 more...)

Genre: Research Report > New Finding (0.48)

Industry: Social Sector (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.68)

47a658229eb2368a99f1d032c8848542-Supplemental.pdf

Neural Information Processing Systems, Apr-25-2026, 17:13:11 GMT

Based on the feedback from the reviewers, we perform the following additional experiments, which explore the robustness of SGD-RER to the choice of buffer size, the choice of step sizes for GLMtron, and the behavior of these algorithms under heavy-tailed noise, with a setup similar to that in Section 7. We first study the robustness of SGD-RER to the choice of buffer size in Figure 3a. Notice that the performance remains the same for a large range of buffer sizes (from 100 to 2000). However, the performance degrades when the buffer size is too large (10000). We believe this is because the number of buffers decreases as the buffer size increases, so the output is averaged over too few iterates (in the case of B = 10000, the final output is just an average of 10 iterates). Theoretically, this largest step-size is $L^{-1}$, where $L$ is the largest eigenvalue of the Hessian. In the case of GLMtron, it was experimentally observed that if the step size was chosen to be about 1.5 times the step size reported in Section 7, the iterates diverged. The Quasi Newton method essentially normalizes the gradient with the inverse of the Hessian (or rather an approximation of the Hessian) in order to let it converge faster with large step sizes. In Figure 4, we consider the same system as in Section 7 but with heavy-tailed noise given by the Student t distribution (scale ν = 4.1), so that the 4-th moment exists but higher moments do not. The typical behavior of Forward SGD, SGD-ER, SGD-RER and Quasi Newton methods seems to be similar to that observed in the sub-Gaussian noise case. However, GLMtron requires much smaller step sizes to ensure convergence and hence takes much longer.
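The link between step size, the largest Hessian eigenvalue, and divergence can be seen on a toy quadratic. The sketch below runs plain gradient descent (not GLMtron or SGD-RER) on $f(w)=\tfrac12\sum_i h_i w_i^2$ with an assumed diagonal Hessian; for this objective the iterates contract when the step is $1/L$ and blow up once the step exceeds $2/L$.

```python
import math


def gradient_descent(hessian_diag, start, step, iterations):
    """Gradient descent on f(w) = 0.5 * sum_i h_i * w_i^2 (gradient is h_i * w_i)."""
    w = list(start)
    for _ in range(iterations):
        w = [wi - step * hi * wi for wi, hi in zip(w, hessian_diag)]
    return w


def norm(w):
    return math.sqrt(sum(v * v for v in w))


hessian_diag = [1.0, 10.0]   # assumed eigenvalues of the Hessian
L = max(hessian_diag)        # largest eigenvalue
start = [1.0, 1.0]

stable = gradient_descent(hessian_diag, start, step=1.0 / L, iterations=50)
unstable = gradient_descent(hessian_diag, start, step=2.5 / L, iterations=50)

norm_stable = norm(stable)
norm_unstable = norm(unstable)
print("step 1/L   -> final norm", norm_stable)
print("step 2.5/L -> final norm", norm_unstable)
```

Each coordinate is multiplied by $(1-\eta h_i)$ per iteration, so the coordinate with the largest eigenvalue is the first to become unstable as the step $\eta$ grows.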

artificial intelligence, equation, iterate, (17 more...)

Genre: Research Report (0.54)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning (0.54)

1c10d0c087c14689628124bbc8fa69f6-Supplemental-Conference.pdf

Neural Information Processing Systems, Apr-25-2026, 13:46:24 GMT

A.1 For LEHD model. In Table 5, we explore the effects of eliminating normalization from the attention layer in our LEHD model. We train three LEHD models with the same training scheme and training budget, differing solely in the attention layer: one with batch normalization (BN), one with instance normalization (IN), and one without normalization (w/o). We train all three POMO models with the same reinforcement learning method with POMO strategy and training budget (1000 epochs). The results show that different types of normalization have little effect on the POMO model. The results in Table 6 show that removing normalization from the attention layer has little impact on the model with a heavy encoder and a light decoder.

artificial intelligence, machine learning, node, (19 more...)

Genre: Research Report > New Finding (0.35)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.35)
