Distill or Annotate? Cost-Efficient Fine-Tuning of Compact Models

Jul-5-2023–arXiv.org Artificial Intelligence

Fine-tuning large models is highly effective; however, inference can be expensive and produces carbon emissions. Knowledge distillation has been shown to be a practical way to reduce inference costs, but the distillation process itself requires significant computational resources. Rather than buying or renting GPUs to fine-tune and then distill a large model, an NLP practitioner might instead allocate the available budget to hiring annotators and manually labeling additional fine-tuning data. In this paper, we investigate how to most efficiently use a fixed budget to build a compact model. Through extensive experiments on six diverse tasks, we show that distilling from T5-XXL (11B) to T5-Small (60M) is almost always more cost-efficient than annotating more data to directly train a compact model (T5-Small). We further investigate how the optimal share of the budget allocated to computation varies across scenarios. We will make our code, datasets, annotation cost estimates, and baseline models available as a benchmark to support further work on cost-efficient training of compact models.
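The fixed-budget trade-off described in the abstract can be illustrated with a minimal sketch. It only shows the bookkeeping of splitting one budget between GPU time (for fine-tuning and distillation) and annotation. The function name `split_budget`, the `BudgetSplit` type, and every price used below are hypothetical placeholders chosen for illustration, not the paper's code or cost estimates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BudgetSplit:
    """What a fixed budget buys under one allocation (illustrative only)."""
    gpu_hours: float    # compute available for fine-tuning and distillation
    extra_labels: int   # additional manually annotated examples


def split_budget(total_budget: float,
                 compute_fraction: float,
                 gpu_hour_price: float,
                 label_price: float) -> BudgetSplit:
    """Split a fixed budget between compute and annotation.

    All prices are caller-supplied assumptions; nothing here comes
    from the paper's own cost estimates.
    """
    if not 0.0 <= compute_fraction <= 1.0:
        raise ValueError("compute_fraction must be in [0, 1]")
    if total_budget < 0 or gpu_hour_price <= 0 or label_price <= 0:
        raise ValueError("budget must be non-negative and prices positive")

    compute_money = total_budget * compute_fraction
    annotation_money = total_budget - compute_money

    return BudgetSplit(
        gpu_hours=compute_money / gpu_hour_price,
        # Only whole labels can be purchased, so round down.
        extra_labels=int(annotation_money // label_price),
    )


if __name__ == "__main__":
    # Hypothetical prices: 2.0 per GPU hour, 0.5 per label.
    for fraction in (0.0, 0.5, 1.0):
        print(fraction, split_budget(100.0, fraction, 2.0, 0.5))
```

Sweeping `compute_fraction` and training a compact model at each setting is one way to look for the best allocation empirically; the sketch covers only the cost side and says nothing about the accuracy each allocation would reach.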

computational linguistic, large language model, machine learning


Country:
- North America
  - Dominican Republic (0.04)
  - United States
    - Washington > King County
      - Seattle (0.14)
    - Minnesota > Hennepin County
      - Minneapolis (0.14)
    - Louisiana > Orleans Parish
      - New Orleans (0.04)
- Europe
  - Spain > Catalonia
    - Barcelona Province > Barcelona (0.04)
  - Ireland > Leinster
    - County Dublin > Dublin (0.04)
- Asia
  - Bangladesh (0.04)
  - Middle East > UAE
    - Abu Dhabi Emirate > Abu Dhabi (0.04)

Genre:
- Research Report (1.00)

Industry:
- Information Technology (0.68)
- Government > Regional Government
  - North America Government > United States Government (0.67)

Technology:
- Information Technology > Artificial Intelligence
  - Machine Learning > Neural Networks (0.70)
  - Natural Language > Large Language Model (0.48)
