pre-training distribution
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Sensing and Signal Processing > Image Processing (0.93)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)
On the Connection between Pre-training Data Diversity and Fine-tuning Robustness
Ramanujan, Vivek, Nguyen, Thao, Oh, Sewoong, Schmidt, Ludwig, Farhadi, Ali
Pre-training has been widely adopted in deep learning to improve model performance, especially when the training data for a target task is limited. In our work, we seek to understand the implications of this training strategy on the generalization properties of downstream models. More specifically, we ask the following question: how do properties of the pre-training distribution affect the robustness of a fine-tuned model? The properties we explore include the label space, label semantics, image diversity, data domains, and data quantity of the pre-training distribution. We find that the primary factor influencing downstream effective robustness (Taori et al., 2020) is data quantity, while other factors have limited significance. For example, reducing the number of ImageNet pre-training classes by 4x while increasing the number of images per class by 4x (that is, keeping total data quantity fixed) does not impact the robustness of fine-tuned models. We demonstrate our findings on pre-training distributions drawn from various natural and synthetic data sources, primarily using the iWildCam-WILDS distribution shift as a test for robustness.
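The fixed-quantity comparison in the abstract (4x fewer classes, 4x more images per class) can be made concrete with a small subsampling helper. This is a minimal sketch, not the authors' code: it assumes the pre-training set is available as an iterable of (image_path, label) pairs, and all names are illustrative.

```python
import random
from collections import defaultdict

def subsample(dataset, num_classes, images_per_class, seed=0):
    """Draw a pre-training subset with a fixed budget of
    num_classes * images_per_class images (illustrative helper)."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for path, label in dataset:          # dataset: iterable of (path, label) pairs
        by_class[label].append(path)

    subset = []
    for label in rng.sample(sorted(by_class), num_classes):
        for path in rng.sample(by_class[label], images_per_class):
            subset.append((path, label))
    return subset

# Same total data quantity, different label-space diversity (hypothetical sizes):
# wide   = subsample(imagenet_index, num_classes=1000, images_per_class=250)
# narrow = subsample(imagenet_index, num_classes=250,  images_per_class=1000)
```

Under the abstract's finding, models fine-tuned after pre-training on the `wide` and `narrow` subsets should show similar effective robustness on iWildCam-WILDS, since the total image count is the same in both.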
Analyzing Generalization in Pre-Trained Symbolic Regression
Voigt, Henrik, Kahlmeyer, Paul, Lawonn, Kai, Habeck, Michael, Giesen, Joachim
Symbolic regression algorithms search a space of mathematical expressions for formulas that explain given data. Transformer-based models have emerged as a promising, scalable approach, shifting the expensive combinatorial search to a large-scale pre-training phase. However, the success of these models is critically dependent on their pre-training data. Their ability to generalize to problems outside of this pre-training distribution remains largely unexplored. In this work, we conduct a systematic empirical study to evaluate the generalization capabilities of pre-trained, transformer-based symbolic regression models. We rigorously test performance both within the pre-training distribution and on a series of out-of-distribution challenges for several state-of-the-art approaches. Our findings reveal a significant dichotomy: while pre-trained models perform well in-distribution, their performance consistently degrades in out-of-distribution scenarios. We conclude that this generalization gap is a critical barrier for practitioners, as it severely limits the practical use of pre-trained approaches for real-world applications.
- South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
- Europe > Germany (0.04)
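One way to read the in- vs. out-of-distribution comparison in the abstract above is as a simple evaluation harness: the same fit metric is computed on problems drawn from the pre-training function family and on problems that fall outside it, for example inputs sampled far beyond the pre-training range. The sketch below assumes a hypothetical `model.fit_predict(X, y)` interface for a pre-trained symbolic regressor that returns the recovered expression as a callable; it is not the interface of any specific system studied in the paper.

```python
import numpy as np

def r2_score(y_true, y_pred):
    """Coefficient of determination, used here as the fit metric."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - ss_res / ss_tot

def evaluate(model, problems, n_points=200, seed=0):
    """Median R^2 of recovered expressions over (low, high, target_fn) problems."""
    rng = np.random.default_rng(seed)
    scores = []
    for low, high, f in problems:
        X = rng.uniform(low, high, size=(n_points, 1))
        y = f(X[:, 0])
        expr = model.fit_predict(X, y)   # hypothetical pre-trained SR interface
        scores.append(r2_score(y, expr(X[:, 0])))
    return float(np.median(scores))

# In-distribution: input range matching the pre-training data.
in_dist  = [(-1.0, 1.0, lambda x: 2.0 * x + np.sin(x))]
# Out-of-distribution: same formula, inputs far outside the pre-training range.
out_dist = [(10.0, 20.0, lambda x: 2.0 * x + np.sin(x))]
# print(evaluate(model, in_dist), evaluate(model, out_dist))
```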
Active Fine-Tuning of Generalist Policies
Bagatella, Marco, Hübotter, Jonas, Martius, Georg, Krause, Andreas
Pre-trained generalist policies are rapidly gaining relevance in robot learning due to their promise of fast adaptation to novel, in-domain tasks. This adaptation often relies on collecting new demonstrations for a specific task of interest and applying imitation learning algorithms, such as behavioral cloning. However, as soon as several tasks need to be learned, we must decide which tasks should be demonstrated, and how often. We study this multi-task problem and explore an interactive framework in which the agent adaptively selects the tasks to be demonstrated. We propose AMF (Active Multi-task Fine-tuning), an algorithm that maximizes multi-task policy performance under a limited demonstration budget by collecting the demonstrations that yield the largest information gain on the expert policy. We derive performance guarantees for AMF under regularity assumptions and demonstrate its empirical effectiveness in efficiently fine-tuning neural policies in complex and high-dimensional environments.
- Europe > Switzerland > Zürich > Zürich (0.14)
- North America > Canada > Alberta (0.14)
- Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.04)
- (3 more...)
- Education (0.67)
- Government > Regional Government (0.45)
- Information Technology > Artificial Intelligence > Robots (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.68)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.67)
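Read procedurally, the AMF loop described in the abstract above alternates between choosing the next task to demonstrate and fine-tuning on everything collected so far. The sketch below is a loose stand-in, not the paper's algorithm: it uses the variance of sampled actions as a crude proxy for the information gain on the expert policy, and the `policy`, `tasks`, and `request_demo` interfaces are assumptions for illustration.

```python
import numpy as np

def active_finetune(policy, tasks, request_demo, budget, num_samples=32):
    """Greedy active multi-task fine-tuning loop (illustrative proxy for AMF).

    Assumed, hypothetical interfaces:
      policy.sample_actions(obs, n) -> array of shape (n, action_dim)
      policy.finetune(demos)        -> behavioral cloning on the demos so far
      tasks[t].reset()              -> an initial observation for task t
      request_demo(t)               -> one expert demonstration for task t
    """
    demos = []
    for _ in range(budget):
        # Score each task by the spread of the policy's sampled actions,
        # a simple uncertainty proxy for the paper's information-gain criterion.
        scores = []
        for t in range(len(tasks)):
            obs = tasks[t].reset()
            actions = policy.sample_actions(obs, num_samples)
            scores.append(float(np.mean(np.var(actions, axis=0))))
        t_next = int(np.argmax(scores))      # most uncertain task under the proxy
        demos.append(request_demo(t_next))   # spend one demonstration on it
        policy.finetune(demos)               # imitation-learning update
    return policy
```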