The ALCHEmist: Automated Labeling 500x CHEaper Than LLM Data Annotators

Huang, Tzu-Heng, Cao, Catherine, Bhargava, Vaishnavi, Sala, Frederic

Jun-25-2024–arXiv.org Artificial Intelligence

Large pretrained models can be used as annotators, helping replace or augment crowdworkers and enabling distilling generalist models into smaller specialist models. Unfortunately, this comes at a cost: employing top-of-the-line models often requires paying thousands of dollars for API calls, while the resulting datasets are static and challenging to audit. To address these challenges, we propose a simple alternative: rather than directly querying labels from pretrained models, we task models to generate programs that can produce labels. These programs can be stored and applied locally, re-used and extended, and cost orders of magnitude less. Our system, Alchemist, obtains comparable to or better performance than large language model-based annotation in a range of tasks for a fraction of the cost: on average, improvements amount to a 12.9% enhancement while the total labeling costs across all datasets are reduced by a factor of approximately 500x.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

Jun-25-2024

arXiv.org PDF

Add feedback

Country:
- Asia > Middle East
  - UAE (0.14)
- North America > United States
  - Wisconsin (0.14)

Genre:
- Research Report (0.50)

Industry:
- Health & Medicine > Therapeutic Area > Oncology (0.67)

Technology:
- Information Technology > Artificial Intelligence
  - Machine Learning > Neural Networks
    - Deep Learning (0.98)
  - Natural Language > Large Language Model (1.00)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found