AITopics | warm-starting neural network training

On Warm-Starting Neural Network Training

Neural Information Processing Systems, Dec-23-2025, 21:31:33 GMT

In many real-world deployments of machine learning systems, data arrive piecemeal. These learning scenarios may be passive, where data arrive incrementally due to structural properties of the problem (e.g., daily financial data) or active, where samples are selected according to a measure of their quality (e.g., experimental design). In both of these cases, we are building a sequence of models that incorporate an increasing amount of data. We would like each of these models in the sequence to be performant and take advantage of all the data that are available to that point. Conventional intuition suggests that when solving a sequence of related optimization problems of this form, it should be possible to initialize using the solution of the previous iterate (to "warm start" the optimization rather than initialize from scratch) and see reductions in wall-clock time. However, in practice this warm-starting seems to yield poorer generalization performance than models that have fresh random initializations, even though the final training losses are similar.
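The warm-start versus fresh-initialization comparison described in this abstract can be sketched as follows. This is only an illustration of the protocol, not the paper's experiment: the logistic-regression model, the synthetic data, and the names `train`, `loss`, `w_warm`, and `w_cold` are all assumptions made for the example. Because this toy model is convex, it is not expected to reproduce the generalization gap reported for neural networks.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train(w, X, y, steps=200, lr=0.5):
    """Run gradient descent on the mean logistic loss, starting from w."""
    w = w.copy()
    for _ in range(steps):
        p = sigmoid(X @ w)                    # predicted probabilities
        w -= lr * X.T @ (p - y) / len(y)      # gradient of the mean logistic loss
    return w

def loss(w, X, y):
    """Mean logistic loss of weights w on (X, y)."""
    p = sigmoid(X @ w)
    eps = 1e-12
    return float(-np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps)))

# Hypothetical synthetic data standing in for data that arrive in two batches.
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 5))
true_w = rng.normal(size=5)
y = (X @ true_w + 0.5 * rng.normal(size=400) > 0).astype(float)
X_first, y_first = X[:200], y[:200]           # the data available at the first stage

# Stage 1: train on the first batch only.
w_first = train(rng.normal(scale=0.01, size=5), X_first, y_first)

# Stage 2a (warm start): continue from the stage-1 solution on all data.
w_warm = train(w_first, X, y)

# Stage 2b (fresh start): new random initialization, trained on all data.
w_cold = train(rng.normal(scale=0.01, size=5), X, y)

warm_loss = loss(w_warm, X, y)
cold_loss = loss(w_cold, X, y)
print(f"warm-start training loss: {warm_loss:.4f}")
print(f"fresh-start training loss: {cold_loss:.4f}")
```

To probe the effect the abstract describes, one would replace the toy model with a neural network and compare held-out accuracy, not only training loss, between the two stage-2 runs.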

name change, proceedings, warm-starting neural network training, (5 more...)

Country: Asia > Middle East > Jordan (0.07)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.61)

DASH: Warm-Starting Neural Network Training in Stationary Settings without Loss of Plasticity

Neural Information Processing Systems, May-27-2025, 00:39:23 GMT

Warm-starting neural network training by initializing networks with previously learned weights is appealing, as practical neural networks are often deployed under a continuous influx of new data. However, it often leads to loss of plasticity, where the network loses its ability to learn new information, resulting in worse generalization than training from scratch. This occurs even under stationary data distributions, and its underlying mechanism is poorly understood. We develop a framework emulating real-world neural network training and identify noise memorization as the primary cause of plasticity loss when warm-starting on stationary data. Motivated by this, we propose Direction-Aware SHrinking (DASH), a method aiming to mitigate plasticity loss by selectively forgetting memorized noise while preserving learned features.
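The abstract does not spell out the shrinking rule, so the following is a hypothetical sketch of what "direction-aware" shrinking could look like, not the DASH algorithm itself. It assumes that each neuron's incoming weight vector is scaled by a factor based on its cosine similarity with the negative gradient of the loss on the current data, with a floor `lam`. The function name `direction_aware_shrink`, the parameter `lam`, and the clipping rule are all assumptions for illustration.

```python
import numpy as np

def direction_aware_shrink(W, grad, lam=0.3):
    """Scale each row of W by a factor in [lam, 1].

    Assumed rule (illustrative only): the factor is the cosine similarity
    between the row and the negative gradient for that row, clipped to
    [lam, 1]. Rows aligned with the descent direction are kept; rows that
    are orthogonal or opposed to it are shrunk toward lam times their value.
    """
    neg_grad = -grad
    dot = np.sum(W * neg_grad, axis=1)
    norms = np.linalg.norm(W, axis=1) * np.linalg.norm(neg_grad, axis=1)
    cos = dot / (norms + 1e-12)               # small constant guards against zero norms
    factor = np.clip(cos, lam, 1.0)
    return W * factor[:, None]

# Two hypothetical neurons: the first is aligned with the descent direction,
# the second points directly against it.
W = np.array([[1.0, 0.0],
              [0.0, 2.0]])
grad = np.array([[-1.0, 0.0],
                 [0.0, 1.0]])
W_new = direction_aware_shrink(W, grad)
print(W_new)
```

In this toy case the aligned row is left essentially unchanged while the opposed row is scaled by `lam`, which is one way to read "selectively forgetting" some weights while preserving others.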

artificial intelligence, machine learning, warm-starting neural network training, (4 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Review for NeurIPS paper: On Warm-Starting Neural Network Training

Neural Information Processing Systems, Jan-22-2025, 20:48:46 GMT

Weaknesses: The paper evaluates only on CIFAR and SVHN, and I worry that this phenomenon may not extend to other methods and tasks. Warm-starting, in the context of the authors' problem setup, seems to be basically the same thing as fine-tuning with more data. The phenomenon does not seem to occur on more sophisticated computer-vision tasks, where fine-tuning from datasets like ImageNet leads to similar or better performance with much faster convergence. Although the label space is different in many fine-tuning setups, one can imagine extending the existing setup to cover common and more realistic problems. The paper is written to motivate re-using weights in a continual/online learning setting, but splitting the dataset into two sets (training on one, then fine-tuning on both) seems to me a somewhat toyish and unconventional continual-learning setting. In online/continual learning there is a distribution shift as new data enter, but here the dataset seems to be split randomly, meaning that in expectation the distributions of the two sets should be the same.

dataset, neurips paper, warm-starting neural network training, (4 more...)

Technology:

Information Technology > Artificial Intelligence > Vision (0.97)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.85)

Review for NeurIPS paper: On Warm-Starting Neural Network Training

Neural Information Processing Systems, Jan-22-2025, 20:48:39 GMT

The paper reports an interesting phenomenon: sometimes fine-tuning a pre-trained network does worse than training from scratch, even when pre-training and fine-tuning are performed on the same dataset. The authors propose a method to remedy this problem. The reviewers are on the fence about the paper, but acknowledge that it is an understudied area. Their main concerns are the lack of any theoretical insight and the method being a "trick". I believe that the findings of this paper are going to be of interest to the community.

fine-tuning, neurips paper, warm-starting neural network training

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

On Warm-Starting Neural Network Training

Neural Information Processing Systems, Oct-9-2024, 19:45:35 GMT

In many real-world deployments of machine learning systems, data arrive piecemeal. These learning scenarios may be passive, where data arrive incrementally due to structural properties of the problem (e.g., daily financial data) or active, where samples are selected according to a measure of their quality (e.g., experimental design). In both of these cases, we are building a sequence of models that incorporate an increasing amount of data. We would like each of these models in the sequence to be performant and take advantage of all the data that are available to that point. Conventional intuition suggests that when solving a sequence of related optimization problems of this form, it should be possible to initialize using the solution of the previous iterate (to "warm start" the optimization rather than initialize from scratch) and see reductions in wall-clock time. However, in practice this warm-starting seems to yield poorer generalization performance than models that have fresh random initializations, even though the final training losses are similar.

sequence, warm start, warm-starting neural network training, (1 more...)

Country: Asia > Middle East > Jordan (0.07)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
