BBTv2: Towards a Gradient-Free Future with Large Language Models

Sun, Tianxiang, He, Zhengfu, Qian, Hong, Zhou, Yunhua, Huang, Xuanjing, Qiu, Xipeng

Oct-14-2022–arXiv.org Artificial Intelligence

Most downstream adaptation methods tune all or part of the parameters of pre-trained models (PTMs) through gradient descent, where the tuning cost increases linearly with model size. By contrast, gradient-free methods require only the forward computation of the PTM to tune the prompt, retaining the benefits of efficient tuning and deployment. However, past work on gradient-free tuning often relies on gradient descent to find a good initialization of the prompt and lacks versatility across tasks and PTMs. In this paper, we present BBTv2, an improved version of Black-Box Tuning, to drive PTMs for few-shot learning. We prepend continuous prompts to every layer of the PTM and propose a divide-and-conquer gradient-free algorithm to optimize the prompts at different layers alternately. Extensive experiments across various tasks and PTMs show that BBTv2 can achieve performance comparable to full model tuning and state-of-the-art parameter-efficient methods (e.g., Adapter, LoRA, BitFit) under few-shot settings while maintaining far fewer tunable parameters.
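
The idea of alternating over layers with a gradient-free optimizer can be illustrated with a small toy sketch. Everything in it is an assumption made for illustration: `black_box_loss` is a hypothetical stand-in for a forward pass of a frozen model, the per-layer prompts are short lists of floats, and the optimizer is plain random-perturbation hill climbing, which is not necessarily the optimizer used in the paper. Only loss values are ever queried, and each pass updates one layer's prompt while the others stay fixed.

```python
import random


def black_box_loss(prompts):
    """Hypothetical stand-in for a frozen model's forward pass; returns a scalar loss."""
    loss = 0.0
    for layer, prompt in enumerate(prompts):
        target = 0.5 * (layer + 1)  # toy optimum for this layer (assumption)
        loss += sum((p - target) ** 2 for p in prompt)
    return loss


def tune_layer(prompts, layer, loss_fn, rng, steps=50, sigma=0.1):
    """Hill-climb one layer's prompt using loss queries only; other layers are fixed."""
    best = loss_fn(prompts)
    for _ in range(steps):
        candidate = [p + rng.gauss(0.0, sigma) for p in prompts[layer]]
        trial = prompts[:layer] + [candidate] + prompts[layer + 1:]
        value = loss_fn(trial)
        if value < best:  # keep the perturbation only if the loss improves
            best = value
            prompts = trial
    return prompts, best


def alternate_tuning(num_layers=3, dim=4, rounds=5, seed=0):
    """Visit the layers in turn, round after round, and record the loss after each round."""
    rng = random.Random(seed)
    prompts = [[0.0] * dim for _ in range(num_layers)]
    history = [black_box_loss(prompts)]
    for _ in range(rounds):
        for layer in range(num_layers):
            prompts, best = tune_layer(prompts, layer, black_box_loss, rng)
        history.append(best)
    return prompts, history


if __name__ == "__main__":
    _, history = alternate_tuning()
    print(history)
```

Because a perturbation is accepted only when it lowers the loss, the recorded loss never increases from one round to the next. No gradient is computed at any point, which is the property the abstract emphasizes.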

large language model, machine learning, natural language, (16 more...)

Country:
- South America > Chile
  - Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
- North America
  - United States
    - Maryland > Baltimore (0.04)
    - Washington > King County
      - Seattle (0.04)
    - New Mexico > Santa Fe County
      - Santa Fe (0.04)
    - Nevada > Clark County
      - Las Vegas (0.04)
    - Minnesota > Hennepin County
      - Minneapolis (0.14)
    - Louisiana > Orleans Parish
      - New Orleans (0.04)
    - California
      - San Diego County > San Diego (0.04)
      - Los Angeles County > Long Beach (0.04)
  - Canada > Quebec
    - Montreal (0.04)
- Europe
  - Austria (0.04)
  - Spain > Catalonia
    - Barcelona Province > Barcelona (0.04)
  - Romania > Sud - Muntenia Development Region
    - Giurgiu County > Giurgiu (0.04)
  - Portugal > Lisbon
    - Lisbon (0.04)
  - Ireland > Leinster
    - County Dublin > Dublin (0.04)
  - France > Hauts-de-France
    - Nord > Lille (0.04)
- Asia
  - South Korea (0.04)
  - China > Shanghai
    - Shanghai (0.04)

Genre:
- Research Report (0.82)

Industry:
- Leisure & Entertainment > Sports (0.46)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language > Large Language Model (1.00)
  - Machine Learning
    - Neural Networks > Deep Learning (0.68)
    - Statistical Learning > Gradient Descent (0.54)
