AITopics | update rule

SubTrack Gradient Subspace Tracking for Scalable

Neural Information Processing Systems | Jun-15-2026, 23:15:50 GMT

Training large language models (LLMs) is highly resource-intensive due to their massive number of parameters and the overhead of optimizer states. While recent work has aimed to reduce memory consumption, such efforts often entail trade-offs among memory efficiency, training time, and model performance. Yet true democratization of LLMs requires simultaneous progress across all three dimensions. To this end, we propose SubTrack++, which leverages Grassmannian gradient subspace tracking combined with projection-aware optimizers, enabling Adam's internal statistics to adapt to subspace changes. Additionally, employing recovery scaling, a technique that restores information lost through low-rank projections, further enhances model performance. Our method demonstrates state-of-the-art (SOTA) convergence by exploiting Grassmannian geometry, reducing training wall-time by up to 65% compared to the best-performing baseline, LDAdam, while preserving the reduced memory footprint.
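To make the idea of "information lost through low-rank projections" concrete, here is a minimal sketch of projecting a gradient matrix onto a fixed subspace and measuring what is left over. It is not the SubTrack++ algorithm: the helper names (`project`, `back_project`, `residual`) are hypothetical, the basis is assumed to have orthonormal columns, and no subspace tracking, Adam state, or recovery scaling is implemented.

```python
# Illustrative only: plain low-rank projection of a gradient matrix.
# All function names are hypothetical and not taken from the paper.

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    """Transpose a matrix given as a list of rows."""
    return [list(col) for col in zip(*a)]


def project(basis, grad):
    """Coordinates of grad in the subspace: S^T G (r x n).

    Assumes basis (m x r) has orthonormal columns.
    """
    return matmul(transpose(basis), grad)


def back_project(basis, low_rank):
    """Map low-rank coordinates back to full size: S (S^T G) (m x n)."""
    return matmul(basis, low_rank)


def residual(basis, grad):
    """Part of grad that the low-rank projection discards."""
    approx = back_project(basis, project(basis, grad))
    return [[g - a for g, a in zip(g_row, a_row)] for g_row, a_row in zip(grad, approx)]


# A 3 x 2 gradient and a rank-2 basis spanning the first two coordinates.
basis = [[1, 0], [0, 1], [0, 0]]
grad = [[1, 2], [3, 4], [5, 6]]
low = project(basis, grad)    # optimizer state would live at this smaller size
lost = residual(basis, grad)  # the third row of grad is dropped by the projection
```

In this toy case the optimizer would only ever see the 2 x 2 matrix `low`, while the third row of the gradient is discarded entirely; the abstract describes recovery scaling as a way to restore such lost information.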

large language model, machine learning, natural language, (20 more...)

Neural Information Processing Systems

Country: North America > Mexico (0.28)

Genre: Research Report > Experimental Study (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.90)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)

Last-Iterate Convergence of Smooth Regret Matching + Variants in Learning Nash Equilibria

Neural Information Processing Systems | Jun-15-2026, 09:33:10 GMT

Regret Matching+ (RM+) variants are widely used to build superhuman Poker AIs, yet few studies investigate their last-iterate convergence in learning a Nash equilibrium (NE). Although their last-iterate convergence is established for games satisfying the Minty Variational Inequality (MVI), no studies have demonstrated that these algorithms achieve such convergence in the broader class of games satisfying the weak MVI. A key challenge in proving last-iterate convergence for RM+ variants in games satisfying the weak MVI is that even if the game's loss gradient satisfies the weak MVI, RM+ variants operate on a transformed loss feedback which does not satisfy the weak MVI. To provide last-iterate convergence for RM+ variants, we introduce a concise yet novel proof paradigm that involves: (i) transforming an RM+ variant into an Online Mirror Descent (OMD) instance that updates within the original strategy space of the game to recover the weak MVI, and (ii) showing last-iterate convergence by proving the distance between accumulated regrets converges to zero via the recovered weak MVI of the feedback. Inspired by our proof paradigm, we propose Smooth Optimistic Gradient Based RM+ (SOGRM+) and show that it achieves last-iterate and finite-time best-iterate convergence in learning an NE of games satisfying the weak MVI, the weakest condition among all known RM+ variants. Experiments show that SOGRM+ significantly outperforms other algorithms. Our code is available at https://github.
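For readers unfamiliar with the base algorithm, the following is a minimal sketch of vanilla Regret Matching+ in self-play on rock-paper-scissors. It is not SOGRM+ and makes no last-iterate claim: it only tracks the averaged strategies, which is the standard guarantee for RM+ in two-player zero-sum games. The function names, the starting regret vectors, and the step count are illustrative assumptions.

```python
# Illustrative only: vanilla RM+ self-play, not the SOGRM+ variant from the abstract.

# Row player's payoff for rock-paper-scissors; the column player receives the negative.
PAYOFF = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]


def rm_plus_strategy(q):
    """Play proportionally to the clipped regrets; uniform if they are all zero."""
    total = sum(q)
    n = len(q)
    return [v / total for v in q] if total > 0 else [1.0 / n] * n


def rm_plus_update(q, utilities, strategy):
    """RM+ step: add the instantaneous regret, then clip at zero."""
    expected = sum(p * u for p, u in zip(strategy, utilities))
    return [max(qi + (u - expected), 0.0) for qi, u in zip(q, utilities)]


def self_play(steps=5000):
    """Run simultaneous RM+ updates and return both players' average strategies."""
    n = 3
    q_row, q_col = [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]  # arbitrary non-uniform start
    avg_row, avg_col = [0.0] * n, [0.0] * n
    for _ in range(steps):
        x, y = rm_plus_strategy(q_row), rm_plus_strategy(q_col)
        u_row = [sum(PAYOFF[a][b] * y[b] for b in range(n)) for a in range(n)]
        u_col = [-sum(PAYOFF[a][b] * x[a] for a in range(n)) for b in range(n)]
        q_row = rm_plus_update(q_row, u_row, x)
        q_col = rm_plus_update(q_col, u_col, y)
        avg_row = [s + p / steps for s, p in zip(avg_row, x)]
        avg_col = [s + p / steps for s, p in zip(avg_col, y)]
    return avg_row, avg_col


def exploitability(x, y):
    """Sum of both players' best-response gains against the given profile."""
    n = 3
    best_row = max(sum(PAYOFF[a][b] * y[b] for b in range(n)) for a in range(n))
    best_col = max(-sum(PAYOFF[a][b] * x[a] for a in range(n)) for b in range(n))
    return best_row + best_col
```

Calling `exploitability(*self_play())` gives a small value because the averaged strategies approach the uniform equilibrium; the individual iterates need not settle, which is the gap that last-iterate analyses such as the one above address.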

artificial intelligence, convergence, machine learning, (14 more...)

Neural Information Processing Systems

Country: Asia > China (0.28)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Leisure & Entertainment > Games (0.68)
Information Technology > Security & Privacy (0.45)

Technology:

Information Technology > Game Theory (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.92)

Joint Model and Data Sparsification via the Marginal Likelihood

Timans, Alexander, Möllenhoff, Thomas, Naesseth, Christian A., Khan, Mohammad Emtiyaz, Nalisnick, Eric

arXiv.org Machine Learning | May-29-2026

Sparse recovery in linear systems underpins applications from signal processing to high-dimensional regression. Sparse Bayesian Learning, grounded in the principle of automatic relevance determination (ARD), offers a practical Bayesian mechanism for feature sparsity via marginal likelihood optimization. Yet, its reliance on a homoscedastic noise model renders it sensitive to data contaminations such as outliers or misspecified noise, harming model fit and predictions. Instead, we propose jointly learning individual feature and sample relevancies, enabling simultaneous model and data sparsification via a single Bayesian objective. This symmetric pruning of model and data offers a natural extension that preserves conjugacy, admits closed-form updates for standard optimization procedures, and aligns with perspectives from robust regression and influence functions. Empirical results across diverse regression tasks affirm that a joint ARD approach consistently yields both sparse and robust prediction models.

artificial intelligence, joint model and data sparsification, machine learning, (12 more...)

arXiv.org Machine Learning

2605.29908

Country: Asia > Japan (0.28)

Genre: Research Report (0.64)

Online Conformal Prediction: Enforcing monotonicity via Online Optimization

Rivera, Eduardo Ochoa, Tewari, Ambuj

arXiv.org Machine Learning | May-14-2026

Conformal prediction provides a principled framework for uncertainty quantification with finite-sample coverage guarantees. While recent work has extended conformal prediction to online and sequential settings, existing methods typically focus on a single coverage level and do not ensure consistency across multiple confidence levels. In many real-world applications, such as weather forecasting, macroeconomic prediction, and risk management, different users operate under heterogeneous risk tolerances and require calibrated uncertainty estimates across a range of coverage levels. In such settings, it is desirable to produce prediction sets corresponding to different coverage levels that are nested and valid simultaneously. In this paper, we propose two novel online conformal prediction methods that output nested prediction sets across a range of coverage levels, enabling simultaneous uncertainty quantification across the entire risk spectrum. Beyond interpretability, jointly estimating multiple coverage levels is known to improve statistical efficiency in classical quantile regression by enforcing non-crossing constraints and sharing information across quantiles. Our approaches leverage an online optimization perspective with small regret that translates to quantile estimation error control while enforcing nestedness of prediction sets. Empirical results on synthetic and real-world datasets, including applications in forecasting tasks with heterogeneous risk requirements, demonstrate that our methods achieve stable coverage across all levels, strictly nested prediction sets, and improved efficiency compared to existing online conformal baselines.
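A rough sketch of the general idea, tracking several coverage thresholds online while keeping them non-crossing, might look like the following. This is an assumption-laden stand-in, not either of the paper's two methods: each threshold takes a pinball-loss gradient step, and monotonicity is then restored by sorting, which is a simple rearrangement rather than a Euclidean projection. The names `update_thresholds`, `run_stream`, and `demo`, the step size, and the synthetic score stream are all hypothetical.

```python
# Illustrative only: online quantile tracking with a sorting step for nestedness.
import random


def update_thresholds(thresholds, levels, score, eta):
    """One online step for all coverage levels at once.

    Each threshold q moves by eta * (level - 1[score <= q]), the negative
    pinball-loss subgradient; sorting then keeps thresholds non-decreasing,
    so the sets {s : s <= q_k} are nested across levels.
    """
    stepped = [
        q + eta * (c - (1.0 if score <= q else 0.0))
        for q, c in zip(thresholds, levels)
    ]
    return sorted(stepped)


def run_stream(scores, levels, eta=0.05):
    """Process a stream of nonconformity scores, starting from zero thresholds."""
    thresholds = [0.0] * len(levels)
    for s in scores:
        thresholds = update_thresholds(thresholds, levels, s, eta)
    return thresholds


def demo(seed=0, steps=2000):
    """Synthetic example: scores are absolute values of standard normal draws."""
    rng = random.Random(seed)
    scores = [abs(rng.gauss(0.0, 1.0)) for _ in range(steps)]
    return run_stream(scores, [0.5, 0.8, 0.95])
```

Because the coverage levels are passed in increasing order, the sorted thresholds can be read off directly as the radii of nested prediction sets, one per level.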

data mining, machine learning, prediction, (20 more...)

arXiv.org Machine Learning

2605.12668

Country: North America > United States (1.00)

Genre: Research Report (0.40)

Industry:

Government > Regional Government > North America Government > United States Government (0.68)
Banking & Finance > Economy (0.68)

Technology:

Information Technology > Data Science > Data Mining (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.47)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.34)

e4d3fe32495088805bbbb4f1de63e947-Paper-Conference.pdf

Neural Information Processing Systems | Apr-30-2026, 02:42:11 GMT

artificial intelligence, inequality, machine learning, (18 more...)

Neural Information Processing Systems

Country: North America > United States > California > Los Angeles County > Los Angeles (0.27)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

412604be30f701b1b1e3124c252065e6-Supplemental.pdf

Neural Information Processing Systems | Apr-25-2026, 14:38:26 GMT

artificial intelligence, machine learning, reinforcement learning, (18 more...)

Neural Information Processing Systems

Country: North America > United States (1.00)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Communications > Networks (0.67)

310b60949d2b6096903d7e8a539b20f5-Supplemental.pdf

Neural Information Processing Systems | Apr-25-2026, 09:01:14 GMT

algorithm, artificial intelligence, machine learning, (17 more...)

Neural Information Processing Systems

Country: Europe (0.67)

Industry: Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)

2e622ac74f66df03b686a12e2e0e4424-Paper-Conference.pdf

Neural Information Processing Systems | Apr-25-2026, 07:43:15 GMT

algorithm, artificial intelligence, machine learning, (13 more...)

Neural Information Processing Systems

Country: Asia (0.14)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.69)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.48)

Checklist

Neural Information Processing Systems | Apr-25-2026, 04:39:15 GMT

For all authors... (a) Do the main claims made in the abstract and introduction accurately reflect the paper's contributions and scope? If you ran experiments... (a) Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [No] (b) Did you specify all the training details (e.g., data splits, hyperparameters, how they were chosen)?

artificial intelligence, machine learning, smax, (18 more...)

Neural Information Processing Systems

Genre: Research Report (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

Flexible Option Learning

Neural Information Processing Systems | Apr-25-2026, 03:48:04 GMT

Temporal abstraction in reinforcement learning (RL) offers the promise of improving generalization and knowledge transfer in complex environments by propagating information more efficiently over time. Although option learning was initially formulated in a way that allows updating many options simultaneously, using off-policy, intra-option learning (Sutton, Precup & Singh, 1999), many recent hierarchical reinforcement learning approaches update only a single option at a time: the option currently executing. We revisit and extend intra-option learning in the context of deep reinforcement learning to enable updating all options consistent with current primitive action choices, without introducing any additional estimates. Our method can therefore be naturally adopted in most hierarchical RL frameworks. When we combine our approach with the option-critic algorithm for option discovery, we obtain significant improvements in performance and data efficiency across a wide variety of domains.

artificial intelligence, machine learning, reinforcement learning, (15 more...)

Neural Information Processing Systems

Country: North America > United States > New York (0.28)

Genre: Research Report (0.48)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
