cooldown
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- Europe > Italy > Tuscany > Florence (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- (2 more...)
The Surprising Agreement Between Convex Optimization Theory and Learning-Rate Scheduling for Large Model Training
Schaipp, Fabian, Hägele, Alexander, Taylor, Adrien, Simsekli, Umut, Bach, Francis
We show that learning-rate schedules for large model training behave surprisingly similarly to a performance bound from non-smooth convex optimization theory. We provide a bound for the constant schedule with linear cooldown; in particular, the practical benefit of cooldown is reflected in the bound due to the absence of logarithmic terms. Further, we show that this surprisingly close match between optimization theory and practice can be exploited for learning-rate tuning: we achieve noticeable improvements for training 124M and 210M Llama-type models by (i) extending the schedule for continued training with the optimal learning rate, and (ii) transferring the optimal learning rate across schedules.
- North America > United States > New York (0.04)
- North America > United States > New Jersey > Mercer County > Princeton (0.04)
- Europe > Switzerland > Vaud > Lausanne (0.04)
- (2 more...)
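A minimal sketch of the constant-then-linear-cooldown learning-rate schedule that the bound above analyzes; the function name, step counts, and cooldown fraction are illustrative assumptions, not the authors' released code.

```python
# Illustrative sketch (assumed names and fractions) of a constant learning-rate
# schedule followed by a linear cooldown to zero.

def constant_with_linear_cooldown(step, total_steps, base_lr, cooldown_frac=0.2):
    """Return the learning rate at `step`: constant, then linear decay to 0."""
    cooldown_start = int(total_steps * (1.0 - cooldown_frac))
    if step < cooldown_start:
        return base_lr
    # Linear decay from base_lr to 0 over the cooldown window.
    remaining = total_steps - step
    window = total_steps - cooldown_start
    return base_lr * max(remaining / window, 0.0)

if __name__ == "__main__":
    lrs = [constant_with_linear_cooldown(s, 1000, 3e-4) for s in range(1000)]
    print(lrs[0], lrs[799], lrs[900], lrs[999])  # constant, then decaying values
```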
Metadata Conditioning Accelerates Language Model Pre-training
Gao, Tianyu, Wettig, Alexander, He, Luxi, Dong, Yihe, Malladi, Sadhika, Chen, Danqi
The vast diversity of styles, domains, and quality levels present in language model pre-training corpora is essential in developing general model capabilities, but efficiently learning and deploying the correct behaviors exemplified in each of these heterogeneous data sources is challenging. To address this, we propose a new method, termed Metadata Conditioning then Cooldown (MeCo), to incorporate additional learning cues during pre-training. MeCo first provides metadata (e.g., URLs like en.wikipedia.org) alongside the text during training and later uses a cooldown phase with only the standard text, thereby enabling the model to function normally even without metadata. MeCo significantly accelerates pre-training across different model scales (600M to 8B parameters) and training sources (C4, RefinedWeb, and DCLM). For instance, a 1.6B language model trained with MeCo matches the downstream task performance of standard pre-training while using 33% less data. Additionally, MeCo enables us to steer language models by conditioning the inference prompt on either real or fabricated metadata that encodes the desired properties of the output: for example, prepending wikipedia.org to reduce harmful generations or factquizmaster.com (fabricated) to improve common knowledge task performance. We also demonstrate that MeCo is compatible with different types of metadata, such as model-generated topics. MeCo is remarkably simple, adds no computational overhead, and demonstrates promise in producing more capable and steerable language models.
- Asia > Middle East > Jordan (0.04)
- North America > United States > New York > New York County > New York City (0.04)
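A hedged sketch of the data-formatting idea described in the MeCo abstract above: prepend source metadata (e.g., the URL) to each document during the main phase, then drop it during the final cooldown phase so the model also works on plain text. The function name and example strings are assumptions for illustration, not the authors' implementation.

```python
# Sketch of MeCo-style metadata conditioning (assumed function and field names).

def format_example(doc_text: str, url: str, in_cooldown: bool) -> str:
    if in_cooldown:
        return doc_text                # cooldown phase: standard text only
    return f"{url}\n\n{doc_text}"      # main phase: metadata-conditioned text

# The same document formatted for the two training phases.
print(format_example("Large language models are ...", "en.wikipedia.org", False))
print(format_example("Large language models are ...", "en.wikipedia.org", True))
```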
To Code, or Not To Code? Exploring Impact of Code in Pre-training
Aryabumi, Viraat, Su, Yixuan, Ma, Raymond, Morisot, Adrien, Zhang, Ivan, Locatelli, Acyr, Fadaee, Marzieh, Üstün, Ahmet, Hooker, Sara
Including code in the pre-training data mixture, even for models not specifically designed for code, has become a common practice in LLM pre-training. While there has been anecdotal consensus among practitioners that code data plays a vital role in general LLMs' performance, there is only limited work analyzing the precise impact of code on non-code tasks. In this work, we systematically investigate the impact of code data on general performance. We ask "what is the impact of code data used in pre-training on a large variety of downstream tasks beyond code generation?" We conduct extensive ablations and evaluate across a broad range of natural language reasoning tasks, world knowledge tasks, code benchmarks, and LLM-as-a-judge win-rates for models with sizes ranging from 470M to 2.8B parameters. Across settings, we find consistent results that code is a critical building block for generalization far beyond coding tasks and that improvements to code quality have an outsized impact across all tasks. In particular, compared to text-only pre-training, the addition of code results in up to a relative increase of 8.2% in natural language (NL) reasoning, 4.2% in world knowledge, a 6.6% improvement in generative win-rates, and a 12x boost in code performance, respectively. Our work suggests that investments in code quality and preserving code during pre-training have positive impacts.
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- Asia > Middle East > Jordan (0.04)
- North America > Canada > Ontario > Toronto (0.04)
- (12 more...)
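An illustrative sketch of the kind of mixture knob the ablations above vary: sampling pre-training documents from a text pool and a code pool at a fixed code fraction. The function, proportion, and placeholder documents are assumptions for illustration, not the paper's exact mixtures.

```python
# Sampling a pre-training mixture with a fixed share of code (illustrative only).
import random

def sample_mixture(text_docs, code_docs, code_fraction, n, seed=0):
    """Draw n documents, taking each from the code pool with probability code_fraction."""
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        pool = code_docs if rng.random() < code_fraction else text_docs
        out.append(rng.choice(pool))
    return out

mix = sample_mixture(["web text ..."], ["def f(): ..."], code_fraction=0.25, n=8)
print(mix)
```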
Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations
Hägele, Alexander, Bakouch, Elie, Kosson, Atli, Allal, Loubna Ben, Von Werra, Leandro, Jaggi, Martin
Scale has become a main ingredient in obtaining strong machine learning models. As a result, understanding a model's scaling properties is key to effectively designing both the right training setup and future generations of architectures. In this work, we argue that scale and training research has been needlessly complex due to reliance on the cosine schedule, which prevents training across different lengths for the same model size. We investigate the training behavior of a direct alternative -- constant learning rate and cooldowns -- and find that it scales predictably and reliably, similarly to cosine. Additionally, we show that stochastic weight averaging yields improved performance along the training trajectory, without additional training costs, across different scales. Importantly, with these findings we demonstrate that scaling experiments can be performed with significantly reduced compute and GPU hours by utilizing fewer but reusable training runs.
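A minimal sketch, under assumed names, of stochastic weight averaging along the training trajectory, one of the ingredients discussed in the abstract above; this is an illustration, not the paper's code.

```python
# Running average of parameter snapshots: swa <- swa + (p - swa) / (n + 1).

def update_swa(swa_params, current_params, n_averaged):
    """Fold one parameter snapshot into the running average."""
    if swa_params is None:
        return {k: v for k, v in current_params.items()}, 1
    for k, v in current_params.items():
        swa_params[k] += (v - swa_params[k]) / (n_averaged + 1)
    return swa_params, n_averaged + 1

swa, n = None, 0
for step, params in enumerate([{"w": 1.0}, {"w": 3.0}, {"w": 5.0}]):
    swa, n = update_swa(swa, params, n)
print(swa)  # {'w': 3.0}, the average of the three snapshots
```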
[N] OpenAI bot beat best Dota 2 players in 1v1 at The International 2017 • r/MachineLearning
Ok, I know a bit about dota (been playing it for 8 years now). I will try my best to put this into perspective. What: It beat players that many considered to be the absolute best at dota. The environment: 2 players move along a lane with the goal of destroying the other's defensive structure or killing the other player 2 times for victory. Every 30 seconds, weak NPC minions enter the lane and attack each other and the players.
- Leisure & Entertainment > Games (0.56)
- Media > News (0.40)
- Information Technology > Communications > Social Media (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.40)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.40)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.40)
MazeBase: A Sandbox for Learning from Games
Sukhbaatar, Sainbayar, Szlam, Arthur, Synnaeve, Gabriel, Chintala, Soumith, Fergus, Rob
This paper introduces MazeBase: an environment for simple 2D games, designed as a sandbox for machine learning approaches to reasoning and planning. Within it, we create 10 simple games embodying a range of algorithmic tasks (e.g. if-then statements or set negation). A variety of neural models (fully connected, convolutional network, memory network) are deployed via reinforcement learning on these games, with and without a procedurally generated curriculum. Despite the tasks' simplicity, the performance of the models is far from optimal, suggesting directions for future development. We also demonstrate the versatility of MazeBase by using it to emulate small combat scenarios from StarCraft. Models trained on the MazeBase version can be directly applied to StarCraft, where they consistently beat the in-game AI.
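A minimal sketch of a MazeBase-style 2D grid game with a random-policy episode loop; the environment, rewards, and API below are illustrative assumptions, not the released MazeBase code.

```python
# Tiny grid-world game loop with a random agent (illustrative only).
import random

class TinyGridGame:
    """Agent starts at (0, 0) and must reach the goal corner of a small grid."""
    def __init__(self, size=5, max_steps=20):
        self.size, self.max_steps = size, max_steps
        self.reset()

    def reset(self):
        self.pos, self.goal, self.t = (0, 0), (self.size - 1, self.size - 1), 0
        return self.pos

    def step(self, action):
        dx, dy = [(0, 1), (0, -1), (1, 0), (-1, 0)][action]
        x = min(max(self.pos[0] + dx, 0), self.size - 1)
        y = min(max(self.pos[1] + dy, 0), self.size - 1)
        self.pos, self.t = (x, y), self.t + 1
        done = self.pos == self.goal or self.t >= self.max_steps
        reward = 1.0 if self.pos == self.goal else -0.01  # small step penalty
        return self.pos, reward, done

env = TinyGridGame()
obs, done, total = env.reset(), False, 0.0
while not done:
    obs, reward, done = env.step(random.randrange(4))  # random policy baseline
    total += reward
print("episode return:", total)
```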