AITopics

2605.07964

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.46)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.35)

Neural Information Processing SystemsMar-22-2026, 19:07:59 GMT

Regularized Adaptive Momentum Dual Averaging with an Efficient Inexact Subproblem Solver for Training Structured Neural Network

We propose a Regularized Adaptive Momentum Dual Averaging (RAMDA) algorithm for training structured neural networks. Similar to existing regularized adaptive methods, the subproblem for computing the update direction of RAMDA involves a nonsmooth regularizer and a diagonal preconditioner, and therefore does not possess a closed-form solution in general. We thus also carefully devise an implementable inexactness condition that retains convergence guarantees similar to the exact versions, and propose a companion efficient solver for the subproblems of both RAMDA and existing methods to make them practically feasible. We leverage the theory of manifold identification in variational analysis to show that, even in the presence of such inexactness, the iterates of RAMDA attain the ideal structure induced by the regularizer at the stationary point of asymptotic convergence. This structure is locally optimal near the point of convergence, so RAMDA is guaranteed to obtain the best structure possible among all methods converging to the same point, making it the first regularized adaptive method outputting models that possess outstanding predictive performance while being (locally) optimally structured. Extensive numerical experiments in large-scale modern computer vision, language modeling, and speech tasks show that the proposed RAMDA is efficient and consistently outperforms state of the art for training structured neural network.

artificial intelligence, machine learning, proceedings, (9 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.53)

Neural Information Processing SystemsFeb-18-2026, 13:04:13 GMT

e83b86156555ab9692743f9f8f67adf1-Paper-Conference.pdf

artificial intelligence, machine learning, natural language, (18 more...)

Country:

North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.04)
Asia > Taiwan > Taiwan Province > Taipei (0.04)
Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.04)

Genre: Research Report > New Finding (0.93)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)

arXiv.org Machine LearningFeb-17-2026

Testing For Distribution Shifts with Conditional Conformal Test Martingales

Shaer, Shalev, Bar, Yarin, Prinster, Drew, Romano, Yaniv

We propose a sequential test for detecting arbitrary distribution shifts that allows conformal test martingales (CTMs) to work under a fixed, reference-conditional setting. Existing CTM detectors construct test martingales by continually growing a reference set with each incoming sample, using it to assess how atypical the new sample is relative to past observations. While this design yields anytime-valid type-I error control, it suffers from test-time contamination: after a change, post-shift observations enter the reference set and dilute the evidence for distribution shift, increasing detection delay and reducing power. In contrast, our method avoids contamination by design by comparing each new sample to a fixed null reference dataset. Our main technical contribution is a robust martingale construction that remains valid conditional on the null reference data, achieved by explicitly accounting for the estimation error in the reference distribution induced by the finite reference set. This yields anytime-valid type-I error control together with guarantees of asymptotic power one and bounded expected detection delay. Empirically, our method detects shifts faster than standard CTMs, providing a powerful and reliable distribution-shift detector.

artificial intelligence, distribution shift, machine learning, (18 more...)

2602.13848

Country:

Asia > Middle East > Israel (0.04)
North America > United States > Maryland > Baltimore (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.84)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)

arXiv.org Machine LearningDec-24-2025

Optimal Anytime-Valid Tests for Composite Nulls

Shekhar, Shubhanshu

We consider the problem of designing optimal level-$α$ power-one tests for composite nulls. Given a parameter $α\in (0,1)$ and a stream of $\mathcal{X}$-valued observations $\{X_n: n \geq 1\} \overset{i.i.d.}{\sim} P$, the goal is to design a level-$α$ power-one test $τ_α$ for the null $H_0: P \in \mathcal{P}_0 \subset \mathcal{P}(\mathcal{X})$. Prior works have shown that any such $τ_α$ must satisfy $\mathbb{E}_P[τ_α] \geq \tfrac{\log(1/α)}{γ^*(P, \mathcal{P}_0)}$, where $γ^*(P, \mathcal{P}_0)$ is the so-called $\mathrm{KL}_{\inf}$ or minimum divergence of $P$ to the null class. In this paper, our objective is to develop and analyze constructive schemes that match this lower bound as $α\downarrow 0$. We first consider the finite-alphabet case~($|\mathcal{X}| = m < \infty$), and show that a test based on \emph{universal} $e$-process~(formed by the ratio of a universal predictor and the running null MLE) is optimal in the above sense. The proof relies on a Donsker-Varadhan~(DV) based saddle-point representation of $\mathrm{KL}_{\inf}$, and an application of Sion's minimax theorem. This characterization motivates a general method for arbitrary $\mathcal{X}$: construct an $e$-process based on the empirical solutions to the saddle-point representation over a sufficiently rich class of test functions. We give sufficient conditions for the optimality of this test for compact convex nulls, and verify them for Hölder smooth density models. We end the paper with a discussion on the computational aspects of implementing our proposed tests in some practical settings.

assumption 1, denote, relative entropy, (13 more...)

2512.20039

Country:

North America > United States > Michigan > Washtenaw County > Ann Arbor (0.04)
North America > United States > California (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Martinez-Taboada, Diego, Gonzalez, Tomas, Ramdas, Aaditya

Vector-valued self-normalized concentration inequalities beyond sub-Gaussianity

arXiv.org Machine LearningNov-6-2025

The study of self-normalized processes plays a crucial role in a wide range of applications, from sequential decision-making to econometrics. While the behavior of self-normalized concentration has been widely investigated for scalar-valued processes, vector-valued processes remain comparatively underexplored, especially outside of the sub-Gaussian framework. In this contribution, we provide concentration bounds for self-normalized processes with light tails beyond sub-Gaussianity (such as Bennett or Bernstein bounds). We illustrate the relevance of our results in the context of online linear regression, with applications in (kernelized) linear bandits.

artificial intelligence, inequality, machine learning, (16 more...)

2511.03606

Country: North America (0.46)

Genre: Research Report (0.84)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Data Science (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.35)

Neural Information Processing SystemsOct-10-2025, 20:07:43 GMT

Regularized Adaptive Momentum Dual Averaging with an Efficient Inexact Subproblem Solver for Training Structured Neural Network

This work aims to fill this gap to propose a practical regularized adaptive method with guarantees for both convergence and structure identification.

algorithm, experiment, preconditioner, (14 more...)

Country:

North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.04)
Asia > Taiwan > Taiwan Province > Taipei (0.04)
Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.04)

Genre: Research Report > New Finding (0.93)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Neural Information Processing SystemsMay-27-2025, 20:12:25 GMT

Regularized Adaptive Momentum Dual Averaging with an Efficient Inexact Subproblem Solver for Training Structured Neural Network

efficient inexact subproblem solver, regularized adaptive momentum dual averaging, training structured neural network, (3 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)

Kilian, Valentin, Cortinovis, Stefano, Caron, François

Anytime-valid, Bayes-assisted,Prediction-Powered Inference

arXiv.org Machine LearningMay-26-2025

Given a large pool of unlabelled data and a smaller amount of labels, prediction-powered inference (PPI) leverages machine learning predictions to increase the statistical efficiency of standard confidence interval procedures based solely on labelled data, while preserving their fixed-time validity. In this paper, we extend the PPI framework to the sequential setting, where labelled and unlabelled datasets grow over time. Exploiting Ville's inequality and the method of mixtures, we propose prediction-powered confidence sequence procedures that are valid uniformly over time and naturally accommodate prior knowledge on the quality of the predictions to further boost efficiency. We carefully illustrate the design choices behind our method and demonstrate its effectiveness in real and synthetic examples.

artificial intelligence, machine learning, sequence, (16 more...)

2505.18

Country:

Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Csillag, Daniel, Struchiner, Claudio José, Goedert, Guilherme Tegoni

Prediction-Powered E-Values

arXiv.org Machine LearningFeb-6-2025

Quality statistical inference requires a sufficient amount of data, which can be missing or hard to obtain. To this end, prediction-powered inference has risen as a promising methodology, but existing approaches are largely limited to Z-estimation problems such as inference of means and quantiles. In this paper, we apply ideas of prediction-powered inference to e-values. By doing so, we inherit all the usual benefits of e-values -- such as anytime-validity, post-hoc validity and versatile sequential inference -- as well as greatly expand the set of inferences achievable in a prediction-powered manner. In particular, we show that every inference procedure that can be framed in terms of e-values has a prediction-powered counterpart, given by our method. We showcase the effectiveness of our framework across a wide range of inference tasks, from simple hypothesis testing and confidence intervals to more involved procedures for change-point detection and causal discovery, which were out of reach of previous techniques. Our approach is modular and easily integrable into existing algorithms, making it a compelling choice for practical applications.

artificial intelligence, machine learning, ppi, (19 more...)

2502.04294

Country:

North America > United States (0.14)
Asia > Middle East > Jordan (0.04)
South America > Brazil > Rio de Janeiro > Rio de Janeiro (0.04)

Genre: Research Report (1.00)

Industry: Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)