AITopics | Bern

We study the role of batch size in stochastic conditional gradient methods under a $μ$-Kurdyka-Łojasiewicz ($μ$-KL) condition. Focusing on momentum-based stochastic conditional gradient algorithms (e.g., Scion), we derive a new analysis that explicitly captures the interaction between stepsize, batch size, and stochastic noise. Our study reveals a regime-dependent behavior: increasing the batch size initially improves optimization accuracy but, beyond a critical threshold, the benefits saturate and can eventually degrade performance under a fixed token budget. Notably, the theory predicts the magnitude of the optimal stepsize and aligns well with empirical practices observed in large-scale training. Leveraging these insights, we derive principled guidelines for selecting the batch size and stepsize, and propose an adaptive strategy that increases batch size and sequence length during training while preserving convergence guarantees. Experiments on NanoGPT are consistent with the theoretical predictions and illustrate the emergence of the predicted scaling regimes. Overall, our results provide a theoretical framework for understanding batch size scaling in stochastic conditional gradient methods and offer guidance for designing efficient training schedules in large-scale optimization.

artificial intelligence, machine learning, natural language, (16 more...)

arXiv.org Machine Learning

2603.21191

Country:

North America > United States (0.14)
Asia > Middle East > Jordan (0.04)
Asia > Middle East > UAE (0.04)
(6 more...)

Genre: Research Report > New Finding (0.66)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

a1e0d6fa0c30b7d4f75dd9c7ed6189f2-Paper-Conference.pdf

Neural Information Processing Systems Feb-17-2026, 02:21:55 GMT

large language model, machine learning, natural language, (19 more...)

Neural Information Processing Systems

Country:

Europe > Ukraine > Kyiv Oblast > Kyiv (0.14)
Europe > Austria > Vienna (0.14)
Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
(96 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Education > Health & Safety > School Nutrition (0.93)
Health & Medicine > Consumer Health (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.73)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.52)

Recurrent Registration Neural Networks for Deformable Image Registration

Robin Sandkühler, Simon Andermatt, Grzegorz Bauman, Sylvia Nyilas, Christoph Jud, Philippe C. Cattin

Neural Information Processing Systems Feb-14-2026, 15:03:45 GMT

Neural Information Processing Systems http://nips.cc/

image registration, registration, transformation, (15 more...)

Country:

Europe > Switzerland > Basel-City > Basel (0.05)
Europe > Switzerland > Bern > Bern (0.04)
North America > Canada (0.04)
Europe > Slovenia > Drava > Municipality of Benedikt > Benedikt (0.04)

Industry:

Health & Medicine > Diagnostic Medicine > Imaging (0.96)
Health & Medicine > Therapeutic Area (0.69)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Dendritic cortical microcircuits approximate the backpropagation algorithm

João Sacramento, Rui Ponte Costa, Yoshua Bengio, Walter Senn

Neural Information Processing Systems Feb-12-2026, 09:21:11 GMT

Neural Information Processing Systems http://nips.cc/

interneuron, neuron, plasticity, (15 more...)

Country:

Europe > Switzerland > Zürich > Zürich (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
Europe > Switzerland > Bern > Bern (0.04)
(4 more...)

Industry: Health & Medicine > Therapeutic Area > Neurology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Backpropagation (0.51)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Estimating the intrinsic dimensionality using Normalizing Flows

Neural Information Processing Systems Feb-8-2026, 21:44:57 GMT

Therefore, representation learning is a very active area of research [33], with applications ranging from neuroscience [27], molecular biology [28], and bioinformatics [12] to image analysis [21].

artificial intelligence, machine learning, manifold, (16 more...)

Neural Information Processing Systems

Country: Europe > Switzerland > Bern > Bern (0.04)

Industry: Health & Medicine (0.54)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)
