AITopics

2605.29152

Country: North America > United States (0.45)

Genre: Research Report > New Finding (0.45)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

Dölz, Jürgen, Multerer, Michael, Palma, Michele

Beyond Lipschitz: Data-Driven Robustness via Discrete Modulus of Continuity

arXiv.org Machine Learning, May-28-2026

Robustness of neural networks is commonly quantified via local or global Lipschitz constants. However, Lipschitz continuity can be overly coarse or overly restrictive as a global robustness measure, failing to capture nuanced, data-dependent behavior. We propose a data-driven, architecture-agnostic framework based on the discrete modulus of continuity (DMOC), a nonlinear generalization of Lipschitz continuity that provides a finer notion of robustness. Unlike many existing approaches, DMOC does not require access to model internals and instead evaluates regularity relative to the data distribution. This shifts the focus from the model to the data, which provide a data-driven baseline of regularity against which the network's robustness is assessed. We establish convergence results for DMOC-induced seminorms with explicit data-driven rates in terms of the separation distance, and introduce a scalable minibatch algorithm that reduces the quadratic cost of exact computation, enabling application to large-scale data sets such as ImageNet. Empirically, DMOC serves as an architecture-independent diagnostic: it distinguishes trained from untrained networks, reveals underfitting and overfitting regimes, and yields, as a special case, tight Lipschitz estimates comparable to state-of-the-art methods such as ECLipsE and ECLipsE-fast.
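
To make the idea concrete, here is a minimal sketch of a pairwise, black-box regularity estimate. It assumes the discrete modulus of continuity is the classical modulus of continuity restricted to pairs of data points, measured with Euclidean distances. The paper's exact definition, its seminorms, and its minibatch algorithm are not given in this abstract, so the function names and the toy model `f` below are hypothetical.

```python
import math
from itertools import combinations


def dist(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))


def discrete_modulus(inputs, outputs, t):
    """Largest output distance over all sample pairs whose inputs lie within t.

    Assumed definition (not taken from the paper). Returns 0.0 when no pair
    is that close. The loop is quadratic in the number of samples.
    """
    best = 0.0
    for i, j in combinations(range(len(inputs)), 2):
        if dist(inputs[i], inputs[j]) <= t:
            best = max(best, dist(outputs[i], outputs[j]))
    return best


def empirical_lipschitz(inputs, outputs):
    """Largest output/input distance ratio over pairs with distinct inputs.

    This is a lower bound on the true Lipschitz constant, since it only
    looks at the sampled pairs.
    """
    best = 0.0
    for i, j in combinations(range(len(inputs)), 2):
        dx = dist(inputs[i], inputs[j])
        if dx > 0.0:
            best = max(best, dist(outputs[i], outputs[j]) / dx)
    return best


def f(x):
    """Toy stand-in for a network, queried only through its outputs."""
    return [x[0] ** 2]


xs = [[0.0], [1.0], [2.0], [3.0]]
ys = [f(x) for x in xs]

print(discrete_modulus(xs, ys, 1.0))  # neighbouring points only
print(discrete_modulus(xs, ys, 3.0))  # all pairs
print(empirical_lipschitz(xs, ys))
```

Only model outputs on the data are used, which matches the abstract's point that no access to model internals is needed. Evaluating the same functions on random subsets of the data would give cheaper lower bounds on the full quantities, though that is not necessarily how the paper's minibatch algorithm works.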

artificial intelligence, dmoc, machine learning, (17 more...)

2605.28729

Country:

North America > Canada (0.69)
Europe (0.68)
North America > United States > California (0.16)

Genre: Research Report (0.70)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

arXiv.org Machine Learning, May-22-2026

Scalable On-Policy Reinforcement Learning via Adaptive Batch Scaling

Park, Jongchan

Conventional wisdom holds that large-batch training is fundamentally incompatible with Reinforcement Learning (RL): beyond a modest threshold, increasing batch sizes typically yields diminishing returns or performance degradation due to the inherent non-stationarity of the data distribution. We challenge this view by observing that non-stationarity is not a fixed property of RL, but evolves throughout training: early stages exhibit rapid behavioral shifts that demand small batches for plasticity, whereas late stages approach a quasi-stationary regime where large batches enable precise convergence. Motivated by this observation, we propose Adaptive Batch Scaling (ABS), which dynamically adjusts the effective batch size according to the stability of the learning policy. Central to ABS is Behavioral Divergence, a novel metric that quantifies policy non-stationarity by measuring action-level shifts between consecutive updates, which we use to scale batch size inversely to policy volatility. Integrated with the Parallelised Q-Network (PQN) algorithm and evaluated on the ALE benchmark, ABS reconciles early-stage plasticity with late-stage stable convergence. Contrary to conventional wisdom, our results reveal that the combination of larger networks and larger batch sizes achieves the best performance, a scaling behavior previously thought to be unattainable in RL and now unlocked through adaptive batch control.
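
The inverse relationship between policy volatility and batch size can be sketched as follows. This is an illustration under stated assumptions, not the paper's formulation: Behavioral Divergence is approximated here as the fraction of probe states whose selected action changed between two consecutive updates, and the scaling rule, its constants (`scale`, `min_batch`, `max_batch`), and both function names are hypothetical.

```python
def behavioral_divergence(prev_actions, curr_actions):
    """Fraction of probe states whose selected action changed between updates.

    Assumed proxy for the paper's metric, which the abstract does not define.
    """
    if len(prev_actions) != len(curr_actions) or not prev_actions:
        raise ValueError("action lists must be non-empty and of equal length")
    changed = sum(p != c for p, c in zip(prev_actions, curr_actions))
    return changed / len(prev_actions)


def adaptive_batch_size(divergence, min_batch=32, max_batch=1024,
                        scale=16.0, eps=1e-8):
    """Batch size inversely proportional to divergence, clipped to a range."""
    raw = scale / max(divergence, eps)  # eps guards against division by zero
    clipped = min(max_batch, max(min_batch, raw))
    return int(round(clipped))


# Early training: the policy changes its action on half of the probe states.
early = behavioral_divergence([0, 1, 2, 3], [0, 1, 0, 0])
# Late training: the policy no longer changes on the probe states.
late = behavioral_divergence([0, 1, 2, 3], [0, 1, 2, 3])

print(early, adaptive_batch_size(early))
print(late, adaptive_batch_size(late))
```

With these hypothetical constants, a volatile policy is assigned the minimum batch size and a stable policy the maximum, which mirrors the early-plasticity and late-convergence regimes described in the abstract.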

artificial intelligence, machine learning, natural language, (11 more...)

2605.21557

Genre: Research Report > New Finding (0.88)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Liu, Jiachang, Lodi, Andrea

From Sequential Nodes to GPU Batches: Parallel Branch and Bound for Optimal $k$-Sparse GLMs

arXiv.org Machine Learning, May-22-2026

GPUs have significantly accelerated first-order methods for large-scale optimization, especially in continuous optimization. However, this success has not transferred cleanly to problems with discrete variables, combinatorial structure, and nonlinear objectives, such as certifying optimal solutions for cardinality-constrained generalized linear models. Major challenges include the sequential processing of heterogeneous nodes in branch and bound (BnB) and frequent data movement between the CPU and GPU. We propose a simple, generic, and modular CPU-GPU framework that processes multiple BnB nodes in batches on GPUs. The framework is built around a small set of GPU-efficient routines and uses padding together with lightweight custom kernels to handle irregular node data structures. Experiments show one to two orders of magnitude speedups and zero optimality gap on challenging instances. The framework can also be extended to collect the entire Rashomon set, enabling downstream statistical analysis such as variable-importance analysis and model selection under secondary user-specific measures (e.g., AUC in classification).

artificial intelligence, machine learning, regression, (15 more...)

2605.22188

Country: North America > United States (0.28)

Genre: Research Report (0.53)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Liu, Andy Zeyi, Paquette, Elliot, Sous, John

Spectral Lens: Activation and Gradient Spectra as Diagnostics of LLM Optimization

arXiv.org Machine Learning, May-8-2026

Training loss and throughput can hide distinct internal representations in language-model training. To examine these hidden mechanics, we use spectral measurements as practical and operational diagnostics. Using a controlled family of decoder-only models adapted from the modded NanoGPT codebase, we introduce an empirical protocol based on activation covariance and per-sample gradient SVD spectra. This dual view reveals three empirical findings and one mechanistic explanation. First, batch size acts as a latent determinant of representation geometry: runs that reach equal loss settle into systematically distinct activation spectra. Second, the activation covariance tail measured early in training reliably forecasts downstream token efficiency. Third, movement of the activation spectrum head (leading modes), together with gradient spectra, characterizes underlying learning-dynamics changes, separating learning-side architectural improvements from primarily execution-side gains. These predictive and diagnostic signals persist across the 12-, 36-, and 48-layer model tiers. Finally, a mechanistic model proves the main observations and explains how activation covariance spectra correlate with task-aligned feature learning.
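
As a rough illustration of the activation-covariance view, the sketch below computes the eigenvalue spectrum of a sample covariance matrix and splits its variance into a head (leading modes) and a tail. It assumes NumPy is available and uses synthetic data in place of real layer activations. The head/tail summary and both function names are hypothetical; the paper's actual protocol, including the per-sample gradient SVD, is not reproduced here.

```python
import numpy as np


def activation_covariance_spectrum(activations):
    """Eigenvalues of the sample covariance of activations, largest first.

    `activations` has shape (num_samples, num_features).
    """
    acts = np.asarray(activations, dtype=np.float64)
    centered = acts - acts.mean(axis=0, keepdims=True)
    cov = centered.T @ centered / (acts.shape[0] - 1)
    eigvals = np.linalg.eigvalsh(cov)  # ascending order for symmetric input
    # Reverse to descending order and clip tiny negative round-off values.
    return np.clip(eigvals[::-1], 0.0, None)


def head_tail_mass(spectrum, head_size):
    """Share of total variance in the leading `head_size` modes and the rest."""
    total = spectrum.sum()
    head = spectrum[:head_size].sum() / total
    return head, 1.0 - head


rng = np.random.default_rng(0)
# Synthetic stand-in for a layer's activations: one dominant direction plus noise.
feature_scales = np.array([10.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0])
acts = rng.normal(size=(512, 8)) * feature_scales

spectrum = activation_covariance_spectrum(acts)
head, tail = head_tail_mass(spectrum, head_size=1)
print(spectrum)
print(head, tail)
```

Tracking how `head` and `tail` change across checkpoints or batch sizes is one simple way to compare runs that reach the same loss, in the spirit of the diagnostics described above.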

large language model, machine learning, natural language, (19 more...)

2605.05683

Country: North America > United States > Connecticut > New Haven County (0.28)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Tarun Kathuria, Amit Deshpande, Pushmeet Kohli

Batched Gaussian Process Bandit Optimization via Determinantal Point Processes

Neural Information Processing Systems, May-1-2026, 06:16:17 GMT

Gaussian Process bandit optimization has emerged as a powerful tool for optimizing noisy black box functions. One example in machine learning is hyper-parameter optimization where each evaluation of the target function may require training a model which may involve days or even weeks of computation. Most methods for this so-called "Bayesian optimization" only allow sequential exploration of the parameter space. However, it is often desirable to propose batches or sets of parameter values to explore simultaneously, especially when there are large parallel processing facilities at our disposal. Batch methods require modeling the interaction between the different evaluations in the batch, which can be expensive in complex scenarios. In this paper, we propose a new approach for parallelizing Bayesian optimization by modeling the diversity of a batch via Determinantal point processes (DPPs) whose kernels are learned automatically. This allows us to generalize a previous result as well as prove better regret bounds based on DPP sampling. Our experiments on a variety of synthetic and real-world robotics and hyper-parameter optimization tasks indicate that our DPP-based methods, especially those based on DPP sampling, outperform state-of-the-art methods.

algorithm, artificial intelligence, machine learning, (16 more...)

Genre: Research Report (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.46)

Neural Information Processing Systems, May-1-2026, 04:47:27 GMT

Supplementary Document for HA-ViD: A Human Assembly Video Dataset for Comprehensive Assembly Knowledge Understanding

Unlike general assembly datasets, we treat assemblable features, such as holes, studs, and USB female, as objects, to enable finer-grained assembly knowledge understanding.

artificial intelligence, dataset, machine learning, (13 more...)

Genre: Research Report (0.68)

Industry:

Law (0.93)
Government (0.87)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Robots (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

Neural Information Processing Systems, Apr-30-2026, 09:16:09 GMT

Alternating Gradient Descent and Mixture-of-Experts for Integrated Multimodal Perception

IMP makes use of a novel design that combines Alternating Gradient Descent (AGD) and Mixture-of-Experts (MoE) for efficient model & task scaling. We conduct extensive empirical studies and reveal the following key insights: 1) performing gradient descent updates by alternating on diverse modalities, loss functions, and tasks, with varying input resolutions, efficiently improves the model.

artificial intelligence, machine learning, objective, (15 more...)

Genre: Research Report (0.46)

Industry: Health & Medicine > Therapeutic Area > Neurology (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.81)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)

Neural Information Processing Systems, Apr-30-2026, 02:28:50 GMT

A Limitations and Societal

Limitations: One limitation of our model is its potential for data bias. KOSMOS-1 is trained on a web-scale multimodal corpus, which means that it is likely to be biased towards the data that it was trained on. This could lead to the model generating text that is biased towards certain demographics or viewpoints. Another limitation of KOSMOS-1 is its relatively small size compared to other large language models. This means that the model may not be able to learn as complex relationships between different modalities. This could lead to the model making mistakes when it is asked to perform tasks that require a deep understanding of multiple modalities. Finally, KOSMOS-1 only supports vision modality.

large language model, machine learning, natural language, (17 more...)

Country: North America > United States > Minnesota (0.28)

Industry:

Leisure & Entertainment (1.00)
Media > Film (0.69)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.70)

Neural Information Processing Systems, Apr-30-2026, 02:27:37 GMT

Supplementary Material

All code can be downloaded from https://github.com/Shanka123/OCRA. Figure S1: Abstract Reasoning Tasks (ART). Same/different: Two objects are presented, and the task is to say whether they are the same or different. Relational match-to-sample: A source pair of objects is presented that either instantiates a 'same' or 'different' relation, and the task is to select the pair of target objects (out of two pairs) that instantiates the same relation. Identity rules: An abstract pattern is instantiated in the first row (ABA, ABB, or AAA), and the ... instantiated in the second row. The task is to select the missing object from a set of four choices. Problems were presented in a 2 × 3 array format, with one of the answer choices inserted into the bottom right cell (separate images for each answer choice, see Figure S8). ... in a 2 × 2 array format, with the source pair presented in the top ...

artificial intelligence, epoch, machine learning, (18 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.70)