The Lottery Ticket Hypothesis for Pre-trained BERT Networks

Oct-11-2024, 04:10:49 GMT–Neural Information Processing Systems

In natural language processing (NLP), enormous pre-trained models like BERT have become the standard starting point for training on a range of downstream tasks, and similar trends are emerging in other areas of deep learning. In parallel, work on the lottery ticket hypothesis has shown that models for NLP and computer vision contain smaller matching subnetworks capable of training in isolation to full accuracy and transferring to other tasks. In this work, we combine these observations to assess whether such trainable, transferrable subnetworks exist in pre-trained BERT models. For a range of downstream tasks, we indeed find matching subnetworks at 40% to 90% sparsity. We find these subnetworks at (pre-trained) initialization, a deviation from prior NLP research where they emerge only after some amount of training.
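To make the sparsity figures concrete, here is a minimal sketch of how a subnetwork at a given sparsity can be carved out of pre-trained weights. It assumes magnitude pruning, a technique commonly used in lottery ticket work; the abstract itself does not say how the subnetworks are found. The function names and the toy weight values are hypothetical, chosen only for illustration.

```python
def magnitude_mask(weights, sparsity):
    """Return a 0/1 mask that prunes the smallest-magnitude fraction of weights.

    Assumption: magnitude pruning stands in for whatever procedure is
    actually used to find the subnetworks; the abstract does not specify it.
    """
    if not 0.0 <= sparsity < 1.0:
        raise ValueError("sparsity must be in [0, 1)")
    n_prune = int(round(sparsity * len(weights)))
    # Indices sorted by ascending absolute value; the first n_prune are pruned.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    mask = [1] * len(weights)
    for i in order[:n_prune]:
        mask[i] = 0
    return mask


def apply_mask(weights, mask):
    """Zero out pruned weights; surviving weights keep their pre-trained values."""
    return [w * m for w, m in zip(weights, mask)]


# Toy stand-in for a flattened pre-trained weight tensor (made-up values).
pretrained = [0.8, -0.05, 0.3, -0.9, 0.01, 0.4, -0.2, 0.6, 0.07, -0.5]

mask = magnitude_mask(pretrained, sparsity=0.6)
subnetwork = apply_mask(pretrained, mask)
achieved = 1 - sum(mask) / len(mask)  # fraction of weights removed
```

In this sketch the subnetwork keeps only the four largest-magnitude weights, all at their pre-trained values. Under the hypothesis, a subnetwork is "matching" if training it in isolation on a downstream task reaches the accuracy of the full model.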

downstream task, lottery ticket hypothesis, pre-trained bert network

Genre:
- Contests & Prizes (0.69)

Industry:
- Leisure & Entertainment > Gambling (0.69)

Technology:
- Information Technology > Artificial Intelligence
  - Machine Learning > Neural Networks
    - Deep Learning (0.33)
  - Natural Language (1.00)