AITopics | cbs

Critical Batch Size Revisited: A Simple Empirical Approach to Large-Batch Language Model Training

Neural Information Processing Systems | Jun-21-2026, 12:41:35 GMT

The right batch size is important when training language models at scale: a large batch size is necessary for fast training, but a batch size that is too large will harm token efficiency. To navigate this tradeoff, McCandlish et al. (2018) suggest that a critical batch size (CBS), below which training will not substantially degrade loss, can be estimated based on the gradient noise scale during training. While their method has been adopted in practice, e.g., when training GPT-3, strong assumptions are required to justify gradient noise as a proxy for the CBS, which makes it unclear whether their approach should be trusted in practice, limiting its applicability. In this paper, we introduce a simple, empirical approach to directly measure the CBS and show how the CBS evolves over training. Applying our approach to the OLMo models, we find that CBS is near 0 at initialization, increases rapidly at first, and then plateaus as training progresses. Furthermore, we find that this trend holds across different model sizes (1B and 7B), suggesting CBS from small training runs can inform larger-scale training runs. Our findings about how the CBS changes over training motivate batch size warmup as a natural way to reliably train language models at large batch size: start the batch size small and increase it as the CBS grows. To validate this claim, we use batch size warmup to train OLMo 1B to slightly better loss than the original training run with 43% fewer gradient steps. This shows how our framework can be applied to reliably train language models at larger batch sizes, increasing data parallelism without compromising performance.
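For reference, the gradient noise scale that McCandlish et al. (2018) propose as a CBS proxy (the approach this abstract questions, not the paper's own method) is usually written as B_simple = tr(Σ) / |G|², where G is the true gradient and Σ is the per-example gradient covariance. Because E[|G_B|²] = |G|² + tr(Σ)/B for a gradient averaged over B examples, measuring |G_B|² at two batch sizes yields an estimate of both terms. The following is a minimal Python sketch under that assumption; the function name and its inputs are hypothetical.

```python
def simple_noise_scale(sq_norm_small: float, b_small: int,
                       sq_norm_big: float, b_big: int) -> float:
    """Estimate B_simple = tr(Sigma) / |G|^2 from two batch sizes.

    sq_norm_small and sq_norm_big are squared L2 norms of the averaged
    gradient measured at batch sizes b_small < b_big. The estimator relies
    on E[|G_B|^2] = |G|^2 + tr(Sigma) / B.
    """
    if not 0 < b_small < b_big:
        raise ValueError("need 0 < b_small < b_big")
    # Unbiased estimate of the true gradient's squared norm |G|^2.
    g_sq = (b_big * sq_norm_big - b_small * sq_norm_small) / (b_big - b_small)
    # Unbiased estimate of the per-example noise trace tr(Sigma).
    trace_sigma = (sq_norm_small - sq_norm_big) / (1.0 / b_small - 1.0 / b_big)
    # The ratio of the two estimates is the "simple" noise scale.
    return trace_sigma / g_sq
```

The two intermediate estimates are each unbiased, but their ratio is not; in practice one would likely average each term over many steps (for example with a moving average) before dividing. Feeding in the exact expectations for |G|² = 4 and tr(Σ) = 100 (16.5 at B = 8 and 5.5625 at B = 64) returns 25.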

large language model, machine learning, natural language, (20 more...)

Neural Information Processing Systems

Country:

Asia (0.67)
North America > United States > Minnesota (0.28)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.66)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.66)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.48)
Information Technology > Artificial Intelligence > Representation & Reasoning > Commonsense Reasoning (0.46)

Critical Batch Size Revisited: A Simple Empirical Approach to Large-Batch Language Model Training

Neural Information Processing Systems | Jun-13-2026, 18:25:26 GMT

The right batch size is important when training language models at scale: a large batch size is necessary for fast training, but a batch size that is too large will harm token efficiency. To navigate this tradeoff, McCandlish et al. (2018) suggest that a critical batch size (CBS), below which training will not substantially degrade loss, can be estimated based on the gradient noise scale during training. While their method has been adopted in practice, e.g., when training GPT-3, strong assumptions are required to justify gradient noise as a proxy for the CBS, which makes it unclear whether their approach should be trusted in practice, limiting its applicability. In this paper, we introduce a simple, empirical approach to directly measure the CBS and show how the CBS evolves over training. Applying our approach to the OLMo models, we find that CBS is near 0 at initialization, increases rapidly at first, and then plateaus as training progresses. Furthermore, we find that this trend holds across different model sizes (1B and 7B), suggesting CBS from small training runs can inform larger-scale training runs. Our findings about how the CBS changes over training motivate batch size warmup as a natural way to reliably train language models at large batch size: start the batch size small and increase it as the CBS grows. To validate this claim, we use batch size warmup to train OLMo 1B to slightly better loss than the original training run with 43% fewer gradient steps. This shows how our framework can be applied to reliably train language models at larger batch sizes, increasing data parallelism without compromising performance.
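The abstract describes batch size warmup only qualitatively: start small and grow the batch size as the CBS grows. One way to turn that into a schedule, sketched below in Python, is to round a running CBS estimate down to a power-of-two multiple of a minimum batch size, clamped to a maximum. The saturating curve `illustrative_cbs`, its constants, and the doubling rule are all assumptions made for illustration, not values or rules taken from the paper.

```python
import math
from typing import Callable


def illustrative_cbs(tokens_seen: float, cbs_max: float = 1024.0,
                     tau: float = 2e9) -> float:
    """Made-up CBS curve: near 0 at init, rises quickly, then plateaus."""
    return cbs_max * (1.0 - math.exp(-tokens_seen / tau))


def warmup_batch_size(tokens_seen: float, cbs_fn: Callable[[float], float],
                      min_bs: int = 32, max_bs: int = 1024) -> int:
    """Largest min_bs * 2^k not exceeding the CBS estimate.

    The result never drops below min_bs and never exceeds max_bs.
    """
    target = min(cbs_fn(tokens_seen), max_bs)
    bs = min_bs
    # Double while the next size still fits under the (clamped) CBS estimate.
    while bs * 2 <= target:
        bs *= 2
    return bs


if __name__ == "__main__":
    for tokens in (0, 1e8, 1e9, 4e9, 2e10):
        print(f"{tokens:>8.0e} tokens -> batch size "
              f"{warmup_batch_size(tokens, illustrative_cbs)}")
```

Rounding to powers of two keeps the number of distinct batch sizes small, which is convenient when the global batch must divide evenly across data-parallel workers; a real run would replace `illustrative_cbs` with measured CBS values.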

artificial intelligence, batch size, natural language, (7 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Natural Language (1.00)

f6a673f09493afcd8b129a0bcf1cd5bc-AuthorFeedback.pdf

Neural Information Processing Systems | Feb-11-2026, 03:50:34 GMT

experiment, kernel, learning, (14 more...)

Neural Information Processing Systems

Genre: Summary/Review (0.41)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

445e1050156c6ae8c082a8422bb7dfc0-Paper.pdf

Neural Information Processing Systems | Feb-8-2026, 06:06:53 GMT

electrode, neuron, selection, (16 more...)

Neural Information Processing Systems

Country:

North America > United States > Virginia (0.04)
North America > United States > California > San Diego County > San Diego (0.04)
North America > Canada (0.04)
(2 more...)

Genre:

Research Report > New Finding (0.48)
Research Report > Experimental Study (0.30)

Industry: Health & Medicine > Therapeutic Area > Neurology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Information Technology > Artificial Intelligence > Cognitive Science (0.68)

NBC anchor Savannah Guthrie's mother has been abducted, sheriff suspects

BBC News | Feb-3-2026, 01:18:47 GMT

The mother of US news anchor Savannah Guthrie has been abducted and didn't go willingly from her home, Arizona law enforcement officials suspect. Nancy Guthrie, the 84-year-old mother of the NBC News host, was last seen in her house outside Tucson, Arizona, on Saturday evening. Her family reported her missing a day later. When authorities arrived, the scene of Nancy Guthrie's property caused grave concern, Pima County Sheriff Chris Nanos said. He did not provide a possible motive and, while there was no initial indication Nancy Guthrie could have been targeted because of her name, the sheriff said "we can't dismiss that". "I believe she was abducted, yes," Sheriff Nanos told CBS, the BBC's US partner.

artificial intelligence, guthrie, savannah guthrie, (10 more...)

BBC News

Country: North America > United States > Arizona > Pima County > Tucson (0.25)

Industry:

Media (1.00)
Leisure & Entertainment (1.00)
Law Enforcement & Public Safety > Crime Prevention & Enforcement (1.00)
Government > Regional Government > North America Government > United States Government (0.36)

Technology: Information Technology > Artificial Intelligence (0.31)

10 media moments and controversies that defined 2025

FOX News | Dec-31-2025, 14:00:22 GMT

Trace Gallagher: This year's resolution is for the 'naughty nightly news'
Chicago mayor endorses 'Abolish ICE' snowplow name
NYT writer downplays MN fraud scandal investigation from 'politicized' DOJ
CBS News correspondent claims Supreme Court corruption narrative is 'patently false'
Sanders rails against AI, says 'science-fiction fear' of it running the world not an outrageous idea
Pelosi says she didn't intend to tear up Trump's 2020 State of the Union speech
MS NOW guest praises Trump's 'unconventional' approach to foreign policy
LA Mayor Karen Bass says it's 'sad' to see Latinos joining the Border Patrol
Santa is 'PACKING HEAT' during a traffic stop
Joe Rogan roasts 'crazy' White House plaques installed by Trump
Jimmy Kimmel criticized for 'ridiculous' Christmas message
Jimmy Kimmel jabs at Trump on Christmas: 'Tyranny is booming'
CBS News defends pulling '60 Minutes' story
Kimmel says 'tyranny is booming' under Trump in UK Christmas message
Sunday Morning Futures anchor Maria Bartiromo looks back at her 2025 interviews with President Donald Trump as he laid out his agenda on the border, the economy, energy and foreign policy heading into 2026.

kimmel, kirk, trump, (14 more...)

FOX News

Country:

North America > United States > Illinois > Cook County > Chicago (0.24)
North America > Mexico (0.14)
Atlantic Ocean > Gulf of Mexico > United States Gulf of Mexico (0.04)
(4 more...)

Industry:

Media > Television (1.00)
Media > News (1.00)
Government > Regional Government > North America Government > United States Government (1.00)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence (0.87)

An Analysis of Constraint-Based Multi-Agent Pathfinding Algorithms

Lee, Hannah, Motes, James D., Morales, Marco, Amato, Nancy M.

arXiv.org Artificial Intelligence | Nov-25-2025

This study informs the design of future multi-agent pathfinding (MAPF) and multi-robot motion planning (MRMP) algorithms by guiding choices based on constraint classification for constraint-based search algorithms. We categorize constraints as conservative or aggressive and provide insights into their search behavior, focusing specifically on vanilla Conflict-Based Search (CBS) and Conflict-Based Search with Priorities (CBSw/P). Under a hybrid grid-roadmap representation with varying resolution, we observe that aggressive (priority constraint) formulations tend to solve more instances as agent count or resolution increases, whereas conservative (motion constraint) formulations yield stronger solution quality when both succeed. Findings are synthesized in a decision flowchart, aiding users in selecting suitable constraints. Recommendations extend to Multi-Robot Motion Planning (MRMP), emphasizing the importance of considering topological features alongside problem, solution, and representation features. A comprehensive exploration of the study, including raw data and map performance, is available in our public GitHub Repository: https://GitHub.com/hannahjmlee/constraint-mapf-analysis
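To make the conservative/aggressive distinction concrete, the Python sketch below detects a vertex or swap conflict between grid paths and shows the two ways a CBS node could branch on it: a motion constraint that forbids one agent from the conflicting cell or edge at that timestep, or a priority constraint that orders the two agents so the lower-priority one replans around the other's entire path. It covers only conflict detection and branching, not the low-level planner or the high-level search, and the class and function names are hypothetical rather than taken from the authors' repository.

```python
from dataclasses import dataclass
from typing import Optional

Cell = tuple[int, int]


def position(path: list[Cell], t: int) -> Cell:
    """Agent location at timestep t; agents wait at their last cell afterwards."""
    return path[min(t, len(path) - 1)]


@dataclass(frozen=True)
class Conflict:
    a: int        # index of the first agent
    b: int        # index of the second agent
    t: int        # timestep at which the conflict occurs
    cells: tuple  # (v,) for a vertex conflict, (u, v) = agent a's move for a swap


@dataclass(frozen=True)
class MotionConstraint:
    """Conservative: `agent` may not occupy/traverse `cells` at time `t`."""
    agent: int
    cells: tuple
    t: int


@dataclass(frozen=True)
class PriorityConstraint:
    """Aggressive: agent `low` must plan around agent `high`'s whole path."""
    high: int
    low: int


def find_conflict(paths: list[list[Cell]]) -> Optional[Conflict]:
    """Return a vertex or swap (edge) conflict, or None if the paths are compatible."""
    horizon = max(len(p) for p in paths)
    for t in range(horizon):
        for i in range(len(paths)):
            for j in range(i + 1, len(paths)):
                if position(paths[i], t) == position(paths[j], t):
                    return Conflict(i, j, t, (position(paths[i], t),))
                if t + 1 < horizon:
                    u, v = position(paths[i], t), position(paths[i], t + 1)
                    # Swap conflict: the two agents exchange cells between t and t+1.
                    if u != v and position(paths[j], t) == v and position(paths[j], t + 1) == u:
                        return Conflict(i, j, t + 1, (u, v))
    return None


def motion_children(c: Conflict) -> list[MotionConstraint]:
    # One child per agent, each forbidding that agent's side of the conflict.
    b_cells = c.cells if len(c.cells) == 1 else (c.cells[1], c.cells[0])
    return [MotionConstraint(c.a, c.cells, c.t), MotionConstraint(c.b, b_cells, c.t)]


def priority_children(c: Conflict) -> list[PriorityConstraint]:
    # One child per ordering of the conflicting pair.
    return [PriorityConstraint(c.a, c.b), PriorityConstraint(c.b, c.a)]
```

A motion constraint removes only one cell or edge at one timestep, so the replanned path stays close to optimal but more branching may be needed; a priority constraint prunes far more of the search space at once, which fits the abstract's observation that priority formulations solve more instances while motion formulations give better solution quality when both succeed.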

artificial intelligence, constraint, constraint-based reasoning, (17 more...)

arXiv.org Artificial Intelligence

2511.18604

Country: North America > United States > Illinois (0.28)

Genre:

Research Report > New Finding (1.00)
Overview (0.93)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots > Robot Planning & Action (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Constraint-Based Reasoning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)

Critical Batch Size Revisited: A Simple Empirical Approach to Large-Batch Language Model Training

Merrill, William, Arora, Shane, Groeneveld, Dirk, Hajishirzi, Hannaneh

arXiv.org Artificial Intelligence | Nov-7-2025

The right batch size is important when training language models at scale: a large batch size is necessary for fast training, but a batch size that is too large will harm token efficiency. To navigate this tradeoff, McCandlish et al. (2018) suggest that a critical batch size (CBS), below which training will not substantially degrade loss, can be estimated based on the gradient noise scale during training. While their method has been adopted in practice, e.g., when training GPT-3, strong assumptions are required to justify gradient noise as a proxy for the CBS, which makes it unclear whether their approach should be trusted in practice, limiting its applicability. In this paper, we introduce a simple, empirical approach to directly measure the CBS and show how the CBS evolves over training. Applying our approach to the OLMo models, we find that CBS is near 0 at initialization, increases rapidly at first, and then plateaus as training progresses. Furthermore, we find that this trend holds across different model sizes (1B and 7B), suggesting CBS from small training runs can inform larger-scale training runs. Our findings about how the CBS changes over training motivate batch size warmup as a natural way to reliably train language models at large batch size: start the batch size small and increase it as the CBS grows. To validate this claim, we use batch size warmup to train OLMo 1B to slightly better loss than the original training run with 43% fewer gradient steps. This shows how our framework can be applied to reliably train language models at larger batch sizes, increasing data parallelism without compromising performance.
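The abstract does not spell out the measurement procedure. A plausible minimal version, assumed here rather than taken from the paper, is to branch several short runs from the same checkpoint at different batch sizes, train each for the same number of tokens, and report the largest batch size whose loss stays within a tolerance of the best run. The Python sketch below implements only that final selection step; `empirical_cbs` and `tolerance` are hypothetical names.

```python
from typing import Optional


def empirical_cbs(loss_by_batch_size: dict[int, float],
                  tolerance: float = 0.01) -> Optional[int]:
    """Largest tested batch size whose loss stays within `tolerance` of the best.

    loss_by_batch_size maps batch size -> loss after an equal token budget,
    with every run branched from the same checkpoint (an assumed setup).
    Scanning in increasing order and stopping at the first failure keeps the
    answer a contiguous threshold. Returns None if even the smallest tested
    batch size misses the tolerance.
    """
    if not loss_by_batch_size:
        raise ValueError("need at least one run")
    best = min(loss_by_batch_size.values())
    cbs = None
    for bs in sorted(loss_by_batch_size):
        if loss_by_batch_size[bs] > best + tolerance:
            break  # loss has degraded substantially: larger sizes exceed the CBS
        cbs = bs
    return cbs
```

For example, with made-up losses {64: 3.10, 128: 3.10, 256: 3.105, 512: 3.13, 1024: 3.20} and tolerance 0.01, the function returns 256. Repeating the measurement at checkpoints throughout training would trace out how the CBS evolves.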

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2505.23971

Country:

Asia (0.67)
North America > United States > Minnesota (0.28)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.66)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.66)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.48)
Information Technology > Artificial Intelligence > Representation & Reasoning > Commonsense Reasoning (0.46)

445e1050156c6ae8c082a8422bb7dfc0-Paper.pdf

Neural Information Processing Systems | Oct-2-2025, 19:13:21 GMT

artificial intelligence, machine learning, neuron, (18 more...)

Neural Information Processing Systems

Country: North America > United States (0.68)

Genre:

Research Report > New Finding (0.48)
Research Report > Experimental Study (0.30)

Industry: Health & Medicine > Therapeutic Area > Neurology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Information Technology > Artificial Intelligence > Cognitive Science (0.68)

Repulsive Trajectory Modification and Conflict Resolution for Efficient Multi-Manipulator Motion Planning

Hong, Junhwa, Lee, Beomjoon, Lee, Woojin, Nam, Changjoo

arXiv.org Artificial Intelligence | Sep-18-2025

We propose a motion planning method designed to efficiently find collision-free trajectories for multiple manipulators. While multi-manipulator systems offer significant advantages, coordinating their motions is computationally challenging owing to the high dimensionality of their composite configuration space. Conflict-Based Search (CBS) addresses this by decoupling motion planning, but resolving existing conflicts often introduces new ones, causing the CBS constraint tree to grow exponentially. Our proposed method is based on repulsive trajectory modification within the two-level structure of CBS. Unlike conventional CBS variants, the low-level planner applies a gradient descent approach using an Artificial Potential Field. This field generates repulsive forces that guide the trajectory of the conflicting manipulator away from those of other robots. As a result, subsequent conflicts are less likely to occur. Additionally, we develop a strategy that, under a specific condition, directly attempts to find a conflict-free solution in a single step without growing the constraint tree. Through extensive tests including physical robot experiments, we demonstrate that our method consistently reduces the number of expanded nodes in the constraint tree, achieves a higher success rate, and finds a solution faster compared to Enhanced CBS and other state-of-the-art algorithms.
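As a rough illustration of the repulsive-field idea, the Python sketch below runs gradient descent on the standard repulsive potential U(q) = 0.5 * eta * (1/d - 1/d0)^2 (zero when d >= d0) to move the interior waypoints of one trajectory away from another robot's time-aligned waypoints. It assumes 2D point robots rather than manipulator configurations, and the step clipping, the parameter values, and the function names are assumptions for illustration, not the authors' low-level planner.

```python
import math

Point = tuple[float, float]


def repulsive_force(q: Point, obstacle: Point, eta: float, d0: float) -> Point:
    """-grad U for U = 0.5*eta*(1/d - 1/d0)^2 when d < d0, else zero."""
    dx, dy = q[0] - obstacle[0], q[1] - obstacle[1]
    d = math.hypot(dx, dy)
    if d >= d0 or d == 0.0:  # outside the influence radius, or direction undefined
        return (0.0, 0.0)
    mag = eta * (1.0 / d - 1.0 / d0) / (d * d)
    return (mag * dx / d, mag * dy / d)


def repel_trajectory(traj: list[Point], other: list[Point], eta: float = 1.0,
                     d0: float = 1.0, step: float = 0.05, max_move: float = 0.1,
                     iters: int = 100) -> list[Point]:
    """Push interior waypoints away from the other robot's waypoint at the same
    time index; the start and goal waypoints stay fixed."""
    traj = list(traj)
    if len(traj) < 3:
        return traj  # no interior waypoints to move
    for _ in range(iters):
        new = [traj[0]]
        for k in range(1, len(traj) - 1):
            obstacle = other[min(k, len(other) - 1)]
            fx, fy = repulsive_force(traj[k], obstacle, eta, d0)
            mx, my = step * fx, step * fy
            norm = math.hypot(mx, my)
            if norm > max_move:  # clip so one step cannot overshoot wildly
                mx, my = mx * max_move / norm, my * max_move / norm
            new.append((traj[k][0] + mx, traj[k][1] + my))
        new.append(traj[-1])
        traj = new
    return traj
```

Holding the start and goal fixed keeps the trajectory's endpoints valid, and clipping each step avoids the very large jumps the 1/d² term produces when waypoints start close together.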

artificial intelligence, conflict, trajectory, (17 more...)

arXiv.org Artificial Intelligence

2509.13882

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Robots > Robot Planning & Action (0.91)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.68)
