AITopics

Scaling Data-Constrained Language Models

Neural Information Processing Systems, Mar-27-2025, 15:29:07 GMT

The current trend of scaling language models involves increasing both parameter count and training dataset size. Extrapolating this trend suggests that training dataset size may soon be limited by the amount of text data available on the internet. Motivated by this limit, we investigate scaling language models in data-constrained regimes. Specifically, we run a large set of experiments varying the extent of data repetition and compute budget, ranging up to 900 billion training tokens and 9 billion parameter models. We find that with constrained data for a fixed compute budget, training with up to 4 epochs of repeated data yields negligible changes to loss compared to having unique data. However, with more repetition, the value of adding compute eventually decays to zero. We propose and empirically validate a scaling law for compute optimality that accounts for the decreasing value of repeated tokens and excess parameters. Finally, we experiment with approaches mitigating data scarcity, including augmenting the training dataset with code data or removing commonly used filters.
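The diminishing value of repeated tokens described above can be illustrated with a toy calculation. The sketch below is not the paper's fitted law: it assumes a saturating-exponential form in which each additional epoch contributes less than the one before, and the function name `effective_tokens` and the constant `r_star=15.0` are hypothetical placeholders chosen only for illustration.

```python
import math


def effective_tokens(unique_tokens: float, epochs: float, r_star: float = 15.0) -> float:
    """Toy estimate of how much 'unique-equivalent' data repeated training provides.

    Assumptions (not taken from the abstract):
      - the first pass over the data counts in full;
      - each repetition after that is discounted exponentially;
      - r_star is a hypothetical decay constant, not a fitted value.
    """
    repeats = max(epochs - 1.0, 0.0)  # passes beyond the first epoch
    # The bonus from repetition saturates at r_star * unique_tokens.
    bonus = r_star * (1.0 - math.exp(-repeats / r_star))
    return unique_tokens * (1.0 + bonus)


if __name__ == "__main__":
    unique = 100e9  # hypothetical budget of 100B unique tokens
    for epochs in (1, 4, 16, 64):
        raw = unique * epochs
        eff = effective_tokens(unique, epochs)
        print(f"epochs={epochs:>2}  raw={raw:.3e}  effective={eff:.3e}  ratio={eff / raw:.2f}")
```

Under these assumptions a few epochs stay close to the value of fresh data, while the effective token count flattens out as the number of epochs grows, which mirrors the qualitative behavior the abstract reports.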

arxiv preprint arxiv, large language model, machine learning, (17 more...)

Country: Europe (1.00)

Genre: Research Report > New Finding (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.68)

dattri: A Library for Efficient Data Attribution Junwei Deng 1* Ting-Wei Li

Neural Information Processing Systems, Mar-27-2025, 15:29:00 GMT

Data attribution methods aim to quantify the influence of individual training samples on the prediction of artificial intelligence (AI) models.

artificial intelligence, data attribution method, machine learning, (16 more...)

Country: North America > United States > Illinois (0.14)

Genre: Research Report > New Finding (0.68)

Industry: Leisure & Entertainment (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Sharpness-diversity tradeoff: improving flat ensembles with SharpBalance Haiquan Lu

Neural Information Processing Systems, Mar-27-2025, 15:28:54 GMT

Recent studies on deep ensembles have identified the sharpness of the local minima of individual learners and the diversity of the ensemble members as key factors in improving test-time performance. Building on this, our study investigates the interplay between sharpness and diversity within deep ensembles, illustrating their crucial role in robust generalization to both in-distribution (ID) and out-of-distribution (OOD) data. We discover a trade-off between sharpness and diversity: minimizing the sharpness in the loss landscape tends to diminish the diversity of individual members within the ensemble, adversely affecting the ensemble's improvement. The trade-off is justified through our theoretical analysis and verified empirically through extensive experiments. To address the issue of reduced diversity, we introduce SharpBalance, a novel training approach that balances sharpness and diversity within ensembles. Theoretically, we show that our training strategy achieves a better sharpness-diversity trade-off. Empirically, we conducted comprehensive evaluations in various data sets (CIFAR-10, CIFAR-100, TinyImageNet) and showed that SharpBalance not only effectively improves the sharpness-diversity trade-off, but also significantly improves ensemble performance in ID and OOD scenarios. Our code has been made open-source.

artificial intelligence, ensemble, machine learning, (18 more...)

Country: North America > United States > California (0.14)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry: Education (0.87)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
(2 more...)

Get Rid of Isolation: A Continuous Multi-task Spatio-Temporal Learning Framework Zhongchao Yi

Neural Information Processing Systems, Mar-27-2025, 15:28:43 GMT

Spatiotemporal learning has become a pivotal technique for enabling urban intelligence. Traditional spatiotemporal models mostly focus on a specific task and assume the same distribution for training and testing sets. However, because urban systems are usually dynamic and multi-sourced, with imbalanced data distributions, current task-specific models fail to generalize to new urban conditions and adapt to new domains without explicitly modeling interdependencies across various dimensions and types of urban data. To this end, we argue that it is essential to propose a Continuous Multi-task Spatio-Temporal learning framework (CMuST) to empower collective urban intelligence, which reforms urban spatiotemporal learning from single-domain learning to cooperative multi-dimensional and multi-task learning. Specifically, CMuST proposes a new multi-dimensional spatiotemporal interaction network (MSTI) that exposes cross-interactions between context and main observations as well as self-interactions within spatial and temporal aspects, which is also the core for capturing task-level commonality and personalization. To ensure continuous task learning, a novel Rolling Adaptation training scheme (RoAda) is devised, which not only preserves task uniqueness by constructing data summarization-driven task prompts, but also harnesses correlated patterns among tasks by iterative model behavior modeling. We further establish a benchmark of three cities for multi-task spatiotemporal learning, and empirically demonstrate the superiority of CMuST via extensive evaluations on these datasets. CMuST achieves impressive improvements over existing SOTA methods on both few-shot streaming data and new domain tasks.

artificial intelligence, machine learning, natural language, (19 more...)

Country: Asia > China (0.46)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.67)

Industry: Transportation (0.93)

Technology:

Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

c1f7b1ed763e9c75e4db74b49b76db5f-Paper-Conference.pdf

Neural Information Processing Systems, Mar-27-2025, 15:28:31 GMT

c1e2faff6f588870935f114ebe04a3e5-Supplemental-Conference.pdf

Neural Information Processing Systems, Mar-27-2025, 15:28:25 GMT

machine learning, natural language, technology education, (19 more...)

Industry: Education > Curriculum > Subject-Specific Education (0.94)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.68)

c1d4798259250f2b4fe38614b48f8996-Supplemental-Conference.pdf

Neural Information Processing Systems, Mar-27-2025, 15:28:24 GMT

artificial intelligence, machine learning, obstacle, (12 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.71)
Information Technology > Artificial Intelligence > Robots (0.49)

Training Compute-Optimal Large Language Models

Neural Information Processing Systems, Mar-27-2025, 15:28:21 GMT

We investigate the optimal model size and number of tokens for training a Transformer language model under a given compute budget. We find that current large language models are significantly undertrained, a consequence of the recent focus on scaling language models whilst keeping the amount of training data constant. By training over 400 language models ranging from 70 million to over 16 billion parameters on 5 to 500 billion tokens, we find that for compute-optimal training, the model size and the number of training tokens should be scaled equally: for every doubling of model size the number of training tokens should also be doubled. We test this hypothesis by training a predicted compute-optimal model, Chinchilla, that uses the same compute budget as Gopher but with 70B parameters and 4× more data. Chinchilla uniformly and significantly outperforms Gopher (280B), GPT-3 (175B), Jurassic-1 (178B), and Megatron-Turing NLG (530B) on a large range of downstream evaluation tasks. This also means that Chinchilla uses substantially less compute for fine-tuning and inference, greatly facilitating downstream usage. As a highlight, Chinchilla reaches a state-of-the-art average accuracy of 67.5% on the MMLU benchmark, greater than a 7% improvement over Gopher.
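The equal-scaling rule in the abstract can be made concrete with a small calculation. The sketch below is illustrative only and rests on two assumptions that the abstract does not state: the common approximation that training compute is about 6 × parameters × tokens, and a fixed tokens-per-parameter ratio (the default of 20 is a placeholder). The function name `compute_optimal_allocation` is hypothetical.

```python
import math


def compute_optimal_allocation(flops: float, tokens_per_param: float = 20.0) -> tuple[float, float]:
    """Split a training FLOP budget between model size N and token count D.

    Assumptions (labeled, not from the abstract):
      - compute C is approximated as 6 * N * D;
      - D / N is held at a fixed ratio, so N and D scale equally with C.
    """
    # Solve C = 6 * N * (ratio * N) for N.
    n_params = math.sqrt(flops / (6.0 * tokens_per_param))
    n_tokens = tokens_per_param * n_params
    return n_params, n_tokens


if __name__ == "__main__":
    for budget in (1e21, 4e21, 16e21):
        n, d = compute_optimal_allocation(budget)
        print(f"C={budget:.1e} FLOPs -> N={n:.3e} params, D={d:.3e} tokens")
```

With a fixed ratio, quadrupling the compute budget doubles both the parameter count and the token count, which is the "double the model, double the data" behavior the abstract describes.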

large language model, machine learning, model size, (18 more...)

Genre: Research Report > New Finding (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)

c1d4798259250f2b4fe38614b48f8996-Paper-Conference.pdf

Neural Information Processing Systems, Mar-27-2025, 15:28:14 GMT

artificial intelligence, machine learning, obstacle, (22 more...)

Country:

North America > United States (1.00)
Europe (1.00)
Asia (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.93)
Information Technology > Artificial Intelligence > Robots > Robot Planning & Action (0.74)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

neurips_attack_recsys

haoyang

Neural Information Processing Systems, Mar-27-2025, 15:28:02 GMT

A.1 Additional Motivational Observations. A.1.1 Additional Results on Difficulty-agnostic Analysis. Figure 1 shows additional results of TNA [11] and PAPU [51] on The results also verify the conclusion in Sec. 3.1.1: Also, we do not show the result of TrialAttack [45], since it satisfies difficulty-property 1, which enables attackers to put more effort into easy users (see Sec. 3.1.2). Additional Results on Diversity-agnostic Analysis. Figure 1 shows additional results of TNA [11], PAPU [51], and TrialAttack [45] on ML-100K [14]. We follow the same experimental setting as in Sec. 3.2.1. As shown in Figure 1 (a) and (c), the fake users of TNA and PAPU form a cluster that is distributed in community 3 and community 1, respectively. Consequently, in Figure 1 (b) and (d), TNA and PAPU improve HR@50 on communities 3 and 1, respectively, while keeping similar HR@50 on the other communities. Thus, they suffer from the diversity-deficit issue.

artificial intelligence, machine learning, user behavior, (15 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.47)
