AITopics

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.34)

Neural Information Processing SystemsJun-15-2026, 21:09:05 GMT

Certifying Concavity and Monotonicity in Games via Sum-of-Squares Hierarchies

Concavity and its refinements underpin tractability in multiplayer games, where players independently choose actions to maximize their own payoffs which depend on other players' actions. In concave games, where players' strategy sets are compact and convex, and their payoffs are concave in their own actions, strong guarantees follow: Nash equilibria always exist and decentralized algorithms converge to equilibria. If the game is furthermore monotone, an even stronger guarantee holds: Nash equilibria are unique under strictness assumptions. Unfortunately, we show that certifying concavity or monotonicity is NP-hard, already for games where utilities are multivariate polynomials and compact, convex basic semialgebraic strategy sets--an expressive class that captures extensive-form games with imperfect recall. On the positive side, we develop two hierarchies of sum-of-squares programs that certify concavity and monotonicity of a given game, and each level of the hierarchies can be solved in polynomial time. We show that almost all concave/monotone games are certified at some finite level of the hierarchies. Subsequently, we introduce the classes of SOS-concave/monotone games, which globally approximate concave/monotone games, and show that for any given game we can compute the closest SOS-concave/monotone game in polynomial time. Finally, we apply our techniques to canonical examples of extensiveform games with imperfect recall.

artificial intelligence, machine learning, maxsj, (17 more...)

Industry: Leisure & Entertainment > Games (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.66)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.46)

Neural Information Processing SystemsJun-15-2026, 09:12:42 GMT

Predictability Enables of Nonlinear State Space Models

The rise of parallel computing hardware has made it increasingly important to understand which nonlinear state space models can be efficiently parallelized.

artificial intelligence, information processing system, machine learning, (16 more...)

Country:

North America > United States (0.28)
North America > Canada (0.28)
Europe > United Kingdom > England (0.28)
Europe > Germany (0.28)

Genre: Research Report (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

De Castro, Yohann, Gadat, Sébastien, Marteau, Clément

Fast Spawn\&Prune (FS\&P): Global convergence of stochastic conic particle gradient descent via birth/death process

arXiv.org Machine LearningMay-20-2026

We investigate the global optimization of the objective function arising in continuous sparse regression, specifically the Beurling LASSO (BLASSO), over the space of measures. While Conic Particle Gradient Descent (CPGD) methods are computationally efficient, they may become trapped in local minima due to the non-convexity of the parameterization. To overcome this limitation, we introduce Fast Spawn\&Prune (FS\&P), a stochastic algorithm that extends FastPart introduced in De Castro et al. (2025) and combines CPGD with a birth-death process. The birth mechanism ensures asymptotic global exploration by introducing particles in regions where first-order optimality conditions are violated, while the death process preserves computational efficiency by pruning non-informative particles. We provide the first theoretical guarantee of global convergence for this class of discrete-time stochastic algorithms, without requiring exponentially large initializations. Furthermore, we derive explicit convergence rates for the excess risk, which scale as $\mathcal{O}\big(\left(\log K / K\right)^{\frac{1}{2(2+d)}}\big)$, where $K$ denotes the number of iterations and d the dimension of the domain, thereby quantifying the trade-off between global exploration and local refinement. Moreover, the sample complexity is $\mathcal{O}\big(N^{-\frac{1}{4(2+d)}}\big)$ (up to logarithmic factors). We also propose a horizon-free variant that does not require prior knowledge of the iteration budget.

artificial intelligence, assumption, machine learning, (18 more...)

2605.19784

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.34)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.34)

Grill, Jean-Bastien, Valko, Michal, Munos, Rémi

Black-box optimization of noisy functions with unknown smoothness

arXiv.org Machine LearningMay-5-2026

We study the problem of black-box optimization of a function f of any dimension, given function evaluations perturbed by noise. The function is assumed to be locally smooth around one of its global optima, but this smoothness is unknown. Our contribution is an adaptive optimization algorithm, POO or parallel optimistic optimization, that is able to deal with this setting. POO performs almost as well as the best known algorithms requiring the knowledge of the smoothness. Furthermore, POO works for a larger class of functions than what was previously considered, especially for functions that are difficult to optimize, in a very precise sense. We provide a finite-time analysis of POO's performance, which shows that its error after n evaluations is at most a factor of sqrt(ln n) away from the error of the best known optimization algorithms using the knowledge of the smoothness.

algorithm, artificial intelligence, machine learning, (17 more...)

2605.02462

Country: Europe (0.14)

Genre: Research Report (0.40)

Industry: Transportation > Air (0.60)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.86)

Neural Information Processing SystemsApr-24-2026, 11:49:05 GMT

Revisiting Hilbert Schmidt Information Bottleneck for Adversarial Robustness

We investigate the HSIC (Hilbert-Schmidt independence criterion) bottleneck as a regularizer for learning an adversarially robust deep neural network classifier. In addition to the usual cross-entropy loss, we add regularization terms for every intermediate layer to ensure that the latent representations retain useful information for output prediction while reducing redundant information. We show that the HSIC bottleneck enhances robustness to adversarial attacks both theoretically and experimentally. In particular, we prove that the HSIC bottleneck regularizer reduces the sensitivity of the classifier to adversarial examples. Our experiments on multiple benchmark datasets and architectures demonstrate that incorporating an HSIC bottleneck regularizer attains competitive natural accuracy and improves adversarial robustness, both with and without adversarial examples during training. Our code and adversarially robust models are publicly available.2

artificial intelligence, hsic, machine learning, (14 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.55)

Tuci, Mario, Korkmaz, Caner, Şimşekli, Umut, Birdal, Tolga

Generalization at the Edge of Stability

arXiv.org Machine LearningApr-22-2026

Training modern neural networks often relies on large learning rates, operating at the edge of stability, where the optimization dynamics exhibit oscillatory and chaotic behavior. Empirically, this regime often yields improved generalization performance, yet the underlying mechanism remains poorly understood. In this work, we represent stochastic optimizers as random dynamical systems, which often converge to a fractal attractor set (rather than a point) with a smaller intrinsic dimension. Building on this connection and inspired by Lyapunov dimension theory, we introduce a novel notion of dimension, coined the `sharpness dimension', and prove a generalization bound based on this dimension. Our results show that generalization in the chaotic regime depends on the complete Hessian spectrum and the structure of its partial determinants, highlighting a complexity that cannot be captured by the trace or spectral norm considered in prior work. Experiments across various MLPs and transformers validate our theory while also providing new insights into the recently observed phenomenon of grokking.

artificial intelligence, arxivpreprintarxiv, machine learning, (18 more...)

2604.1974

Country:

Europe > France (0.14)
North America > United States > Illinois > Cook County > Chicago (0.04)
Europe > Italy (0.04)

Genre: Research Report > New Finding (0.86)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.46)

Yiu, Horace, Sánchez-Betancourt, Leandro, Cartea, Álvaro, Duran-Martin, Gerardo

Doubly Outlier-Robust Online Infinite Hidden Markov Model

arXiv.org Machine LearningApr-17-2026

We derive a robust update rule for the online infinite hidden Markov model (iHMM) for when the streaming data contains outliers and the model is misspecified. Leveraging recent advances in generalised Bayesian inference, we define robustness via the posterior influence function (PIF), and provide conditions under which the online iHMM has bounded PIF. Imposing robustness inevitably induces an adaptation lag for regime switching. Our method, which is called Batched Robust iHMM (BR-iHMM), balances adaptivity and robustness with two additional tunable parameters. Across limit order book data, hourly electricity demand, and a synthetic high-dimensional linear system, BR-iHMM reduces one-step-ahead forecasting error by up to 67% relative to competing online Bayesian methods. Together with theoretical guarantees of bounded PIF, our results highlight the practicality of our approach for both forecasting and interpretable online learning.

artificial intelligence, machine learning, st 1, (19 more...)

2604.14322

Country:

Asia > Middle East > Jordan (0.04)
Europe > United Kingdom (0.04)

Genre:

Research Report (1.00)
Instructional Material > Course Syllabus & Notes (0.67)

Industry:

Energy > Power Industry (0.34)
Education > Educational Setting > Online (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Dereziński, Michał, Dong, Xiaoyu

Last-Iterate Convergence of Randomized Kaczmarz and SGD with Greedy Step Size

arXiv.org Machine LearningApr-14-2026

We study last-iterate convergence of SGD with greedy step size over smooth quadratics in the interpolation regime, a setting which captures the classical Randomized Kaczmarz algorithm as well as other popular iterative linear system solvers. For these methods, we show that the $t$-th iterate attains an $O(1/t^{3/4})$ convergence rate, addressing a question posed by Attia, Schliserman, Sherman, and Koren, who gave an $O(1/t^{1/2})$ guarantee for this setting. In the proof, we introduce the family of stochastic contraction processes, whose behavior can be described by the evolution of a certain deterministic eigenvalue equation, which we analyze via a careful discrete-to-continuous reduction.

artificial intelligence, kaczmarz, machine learning, (17 more...)

2604.09909

Country: North America > United States (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)

arXiv.org Machine LearningApr-10-2026

A Direct Approach for Handling Contextual Bandits with Latent State Dynamics

Li, Zhen, Stoltz, Gilles

We revisit the finite-armed linear bandit model by Nelson et al. (2022), where contexts and rewards are governed by a finite hidden Markov chain. Nelson et al. (2022) approach this model by a reduction to linear contextual bandits; but to do so, they actually introduce a simplification in which rewards are linear functions of the posterior probabilities over the hidden states given the observed contexts, rather than functions of the hidden states themselves. Their analysis (but not their algorithm) also does not take into account the estimation of the HMM parameters, and only tackles expected, not high-probability, bounds, which suffer in addition from unnecessary complex dependencies on the model (like reward gaps). We instead study the more natural model incorporating direct dependencies in the hidden states (on top of dependencies on the observed contexts, as is natural for contextual bandits) and also obtain stronger, high-probability, regret bounds for a fully adaptive strategy that estimates HMM parameters online. These bounds do not depend on the reward functions and only depend on the model through the estimation of the HMM parameters.

artificial intelligence, machine learning, nelsonetal, (18 more...)

2604.08149

Country: Europe > France > Île-de-France > Paris > Paris (0.04)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)