AITopics | imp subnetwork

imp subnetwork

b6af2c9703f203a2794be03d443af2e3-Paper.pdf

Neural Information Processing Systems, Feb-9-2026, 23:15:10 GMT

In this work, we combine these observations to assess whether such trainable, transferrable subnetworks exist in pre-trained BERT models. For a range of downstream tasks, we indeed find matching subnetworks at 40% to 90% sparsity.

machine learning, natural language, subnetwork, (20 more...)

Neural Information Processing Systems

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
Europe > Italy > Tuscany > Florence (0.04)

Industry: Leisure & Entertainment (0.31)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.69)

A Further Results on the Existence of Matching in BERT

Neural Information Processing Systems, Aug-16-2025, 00:03:34 GMT

In Table 2 in Section 3, we show the highest sparsities for which IMP subnetwork performance is within one standard deviation of the unpruned BERT model on each task. As broader context for the relationship between sparsity and accuracy, Figure 11 shows the performance of IMP subnetworks across all sparsities on each task. [...] BERT model is within one standard deviation of the subnetwork's performance. In Table 6, we report both common evaluation metrics for the MNLI, QQP, STS-B, and MRPC datasets. Besides STS-B (50% Pearson vs. 40% Spearman), winning ticket sparsities are the same on these [...] In Figure 9, we study IMP on networks trained with a multi-task objective.
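The selection rule described in this excerpt (the highest sparsity whose performance stays within one standard deviation of the unpruned model) can be sketched as follows. This is only an illustration under stated assumptions: the function name, the use of the dense model's standard deviation across runs as the tolerance, and all scores are hypothetical and are not taken from the paper's tables.

```python
from statistics import mean, stdev


def highest_matching_sparsity(dense_scores, sparse_scores_by_sparsity):
    """Return the highest sparsity whose mean score is within one
    standard deviation of the dense (unpruned) mean, or None.

    Assumption: "within one standard deviation" is read as
    mean(sparse) >= mean(dense) - std(dense).
    """
    dense_mean = mean(dense_scores)
    dense_std = stdev(dense_scores)
    matching = [
        sparsity
        for sparsity, scores in sparse_scores_by_sparsity.items()
        if mean(scores) >= dense_mean - dense_std
    ]
    return max(matching) if matching else None


# Hypothetical accuracies over three runs each (not real results).
dense_runs = [0.84, 0.85, 0.86]
subnetwork_runs = {
    0.5: [0.850, 0.848, 0.852],
    0.6: [0.845, 0.843, 0.841],
    0.7: [0.830, 0.820, 0.825],
}
best = highest_matching_sparsity(dense_runs, subnetwork_runs)
```

With these made-up numbers the 70% subnetwork falls below the tolerance band, so the helper reports 60% as the highest matching sparsity.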

imp subnetwork, sparsity, subnetwork, (12 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.79)

b6af2c9703f203a2794be03d443af2e3-Paper.pdf

Neural Information Processing Systems, Aug-16-2025, 00:03:27 GMT

sparsity, subnetwork, ticket, (11 more...)

Neural Information Processing Systems

Country:

North America > United States > Texas > Travis County > Austin (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
Europe > Italy > Tuscany > Florence (0.04)

Industry: Leisure & Entertainment (0.37)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Linear Mode Connectivity in Sparse Neural Networks

McDermott, Luke, Cummings, Daniel

arXiv.org Artificial Intelligence, Oct-28-2023

With the rise of interest in sparse neural networks, we study how neural network pruning with synthetic data leads to sparse networks with unique training properties. We find that distilled data, a synthetic summarization of the real data, paired with Iterative Magnitude Pruning (IMP) unveils a new class of sparse networks that are more stable to SGD noise on the real data than either the dense model or subnetworks found with real data in IMP. That is, synthetically chosen subnetworks often train to the same minima, or exhibit linear mode connectivity. We study this through linear interpolation, loss landscape visualizations, and measuring the diagonal of the Hessian. While dataset distillation as a field is still young, we find that these properties lead to synthetic subnetworks matching the performance of traditional IMP with up to 150x fewer training points in settings where distilled data applies.
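The abstract mentions the diagonal of the Hessian as a curvature measurement but does not say how it is estimated. As a rough illustration only, the sketch below uses central finite differences on a toy loss; `hessian_diagonal`, `toy_loss`, and the step size `h` are hypothetical choices, and real networks would typically need an automatic-differentiation based estimator instead.

```python
def hessian_diagonal(loss_fn, weights, h=1e-3):
    """Approximate the diagonal of the Hessian of loss_fn at weights.

    Uses the central second difference along each coordinate:
    H_ii ~ (f(w + h e_i) - 2 f(w) + f(w - h e_i)) / h^2.
    """
    base = loss_fn(weights)
    diagonal = []
    for i in range(len(weights)):
        plus = list(weights)
        minus = list(weights)
        plus[i] += h
        minus[i] -= h
        diagonal.append((loss_fn(plus) - 2 * base + loss_fn(minus)) / (h * h))
    return diagonal


def toy_loss(w):
    # Toy quadratic with known curvature 6 along w[0] and 1 along w[1].
    return 3 * w[0] ** 2 + 0.5 * w[1] ** 2


curvature = hessian_diagonal(toy_loss, [1.0, -2.0])
```

Smaller diagonal entries indicate a flatter direction of the loss around the evaluated point.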

information, initialization, subnetwork, (15 more...)

arXiv.org Artificial Intelligence

2310.18769

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

The Lottery Ticket Hypothesis for Pre-trained BERT Networks

Chen, Tianlong, Frankle, Jonathan, Chang, Shiyu, Liu, Sijia, Zhang, Yang, Wang, Zhangyang, Carbin, Michael

arXiv.org Machine Learning, Oct-18-2020

In natural language processing (NLP), enormous pre-trained models like BERT have become the standard starting point for training on a range of downstream tasks, and similar trends are emerging in other areas of deep learning. In parallel, work on the lottery ticket hypothesis has shown that models for NLP and computer vision contain smaller matching subnetworks capable of training in isolation to full accuracy and transferring to other tasks. In this work, we combine these observations to assess whether such trainable, transferrable subnetworks exist in pre-trained BERT models. For a range of downstream tasks, we indeed find matching subnetworks at 40% to 90% sparsity. We find these subnetworks at (pre-trained) initialization, a deviation from prior NLP research where they emerge only after some amount of training. Subnetworks found on the masked language modeling task (the same task used to pre-train the model) transfer universally; those found on other tasks transfer in a limited fashion if at all. As large-scale pre-training becomes an increasingly central paradigm in deep learning, our results demonstrate that the main lottery ticket observations remain relevant in this context.
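For readers unfamiliar with how such subnetworks are found, here is a minimal sketch of iterative magnitude pruning with rewinding to the initial weights. It is an illustration under assumptions rather than the authors' implementation: weights are a flat Python list, `train_fn` is a hypothetical stand-in for real fine-tuning, and the per-round pruning fraction is a free parameter.

```python
def magnitude_prune(weights, mask, fraction):
    """Zero out `fraction` of the surviving weights with smallest magnitude."""
    alive = [i for i, keep in enumerate(mask) if keep]
    n_prune = int(len(alive) * fraction)
    to_prune = sorted(alive, key=lambda i: abs(weights[i]))[:n_prune]
    new_mask = list(mask)
    for i in to_prune:
        new_mask[i] = 0
    return new_mask


def iterative_magnitude_pruning(init_weights, train_fn, rounds, fraction):
    """Alternate training and pruning, rewinding to init_weights each round."""
    mask = [1] * len(init_weights)
    for _ in range(rounds):
        # Rewind: every round restarts from the (pre-trained) initialization.
        start = [w * m for w, m in zip(init_weights, mask)]
        trained = train_fn(start, mask)
        mask = magnitude_prune(trained, mask, fraction)
    return mask


def sparsity(mask):
    """Fraction of weights removed by the mask."""
    return 1 - sum(mask) / len(mask)


def identity_train(weights, mask):
    # Hypothetical placeholder: a real train_fn would fine-tune the weights.
    return list(weights)


init = [0.5, -0.1, 0.3, -0.9, 0.05, 0.7, -0.2, 0.4, 0.6, -0.8]
final_mask = iterative_magnitude_pruning(init, identity_train, rounds=2, fraction=0.2)
```

The resulting mask, applied to the initial weights, is the candidate subnetwork; whether it is "matching" is then decided by training it and comparing against the unpruned model.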

machine learning, natural language, subnetwork, (16 more...)

arXiv.org Machine Learning

2007.12223

Country:

North America > United States > Texas > Travis County > Austin (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
Europe > Italy > Tuscany > Florence (0.04)

Genre: Research Report > New Finding (1.00)

Industry: Leisure & Entertainment > Gambling (0.83)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Linear Mode Connectivity and the Lottery Ticket Hypothesis

Frankle, Jonathan, Dziugaite, Gintare Karolina, Roy, Daniel M., Carbin, Michael

arXiv.org Machine Learning, Dec-11-2019

We introduce "instability analysis," a framework for assessing whether the outcome of optimizing a neural network is robust to SGD noise. It entails training two copies of a network on different random data orders. If error does not increase along the linear path between the trained parameters, we say the network is "stable." Instability analysis reveals new properties of neural networks. For example, standard vision models are initially unstable but become stable early in training; from then on, the outcome of optimization is determined up to linear interpolation. We leverage instability analysis to examine iterative magnitude pruning (IMP), the procedure underlying the lottery ticket hypothesis. On small vision tasks, IMP finds sparse "matching subnetworks" that can train in isolation from initialization to full accuracy, but it fails to do so in more challenging settings. We find that IMP subnetworks are matching only when they are stable. In cases where IMP subnetworks are unstable at initialization, they become stable and matching early in training. We augment IMP to rewind subnetworks to their weights early in training, producing sparse subnetworks of large-scale networks, including ResNet-50 for ImageNet, that train to full accuracy. This submission subsumes 1903.01611 ("Stabilizing the Lottery Ticket Hypothesis" and "The Lottery Ticket Hypothesis at Scale").
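The stability check described above (does error rise along the straight line between two trained copies?) can be sketched on toy one-dimensional "networks". The helper names, the number of interpolation steps, and the choice to measure the rise as the maximum error on the path minus the mean endpoint error are assumptions made for illustration; the toy error functions stand in for evaluating a real model.

```python
def interpolate(w_a, w_b, alpha):
    """Point at fraction alpha on the straight line from w_a to w_b."""
    return [(1 - alpha) * a + alpha * b for a, b in zip(w_a, w_b)]


def error_barrier(w_a, w_b, error_fn, steps=11):
    """Highest error on the linear path minus the mean endpoint error.

    A value near zero suggests the two solutions are linearly connected
    ("stable" in the abstract's terminology).
    """
    alphas = [i / (steps - 1) for i in range(steps)]
    errors = [error_fn(interpolate(w_a, w_b, alpha)) for alpha in alphas]
    endpoint_mean = (errors[0] + errors[-1]) / 2
    return max(errors) - endpoint_mean


def bowl_error(w):
    # Toy convex error surface: no bump can appear between two points.
    return sum(x * x for x in w)


def double_well_error(w):
    # Toy error with two separate minima at w[0] = -1 and w[0] = +1.
    return (w[0] ** 2 - 1) ** 2


stable_barrier = error_barrier([1.0, 0.0], [0.0, 1.0], bowl_error)
unstable_barrier = error_barrier([-1.0], [1.0], double_well_error)
```

In the double-well case the two endpoints sit in different minima, so the path crosses a region of high error, which is the toy analogue of an unstable network.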

imp subnetwork, initialization, subnetwork, (14 more...)

arXiv.org Machine Learning

1912.05671

Country: North America > Canada > Ontario > Toronto (0.14)

Genre:

Contests & Prizes (0.77)
Research Report > New Finding (0.46)

Industry: Leisure & Entertainment > Gambling (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)