When Gradient Descent Meets Derivative-Free Optimization: A Match Made in Black-Box Scenario

Han, Chengcheng, Cui, Liqing, Zhu, Renyu, Wang, Jianing, Chen, Nuo, Sun, Qiushi, Li, Xiang, Gao, Ming

May-17-2023–arXiv.org Artificial Intelligence

Large pre-trained language models (PLMs) have garnered significant attention for their versatility and their potential for solving a wide spectrum of natural language processing (NLP) tasks. However, the cost of running these PLMs may be prohibitive. Furthermore, some PLMs, such as GPT-3, may not be open-sourced due to commercial considerations and potential risks of misuse. In this scenario, the parameters and gradients of the PLM are unavailable. To address this issue, black-box tuning has been proposed, which uses derivative-free optimization (DFO) instead of gradient descent to train task-specific continuous prompts. However, these gradient-free methods still exhibit a significant performance gap compared to gradient-based methods. In this paper, we introduce gradient descent into the black-box tuning scenario through knowledge distillation. Furthermore, we propose a novel method, GDFO, which integrates gradient descent and derivative-free optimization to optimize task-specific continuous prompts in a harmonized manner. Experimental results show that GDFO can achieve significant performance gains over previous state-of-the-art methods.
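To make the black-box setting concrete, the sketch below tunes a small continuous vector using only scalar loss values, with no access to gradients. It is an illustration of derivative-free optimization in general, not the GDFO method or the optimizer used in the paper, neither of which the abstract specifies. The function `black_box_loss`, its toy quadratic objective, the function `random_search`, and all hyperparameters are assumptions made for this example; in a real setting the loss would come from querying the PLM's inference API.

```python
import random


def black_box_loss(prompt):
    """Hypothetical stand-in for a PLM API call.

    Returns only a scalar loss; the caller never sees parameters or gradients.
    The quadratic target below is a toy assumption, not a real task loss.
    """
    target = [0.5, -1.0, 2.0, 0.0]  # optimum, unknown to the optimizer
    return sum((p - t) ** 2 for p, t in zip(prompt, target))


def random_search(loss_fn, dim, evaluations=4000, sigma=0.3, seed=0):
    """Greedy random search with Gaussian perturbations.

    A very simple derivative-free optimizer: perturb the best vector found
    so far and keep the perturbation only if the queried loss decreases.
    """
    rng = random.Random(seed)
    best = [0.0] * dim
    best_loss = loss_fn(best)
    for _ in range(evaluations):
        candidate = [x + rng.gauss(0.0, sigma) for x in best]
        candidate_loss = loss_fn(candidate)
        if candidate_loss < best_loss:
            best, best_loss = candidate, candidate_loss
    return best, best_loss


initial_loss = black_box_loss([0.0] * 4)
prompt, final_loss = random_search(black_box_loss, dim=4)
print(f"loss before: {initial_loss:.3f}, loss after: {final_loss:.3f}")
```

Each step costs one query to the black box, which hints at why purely gradient-free tuning can lag behind gradient-based training and why the paper looks for a way to bring gradient descent back in.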

continuous prompt, large language model, machine learning

Country:
- North America > United States
  - Virginia (0.04)
- Europe > Romania
  - Sud - Muntenia Development Region > Giurgiu County > Giurgiu (0.04)
- Asia
  - Singapore (0.04)
  - China
    - Shanghai > Shanghai (0.04)
    - Guangxi Province > Nanning (0.04)
- Africa > Senegal
  - Kolda Region > Kolda (0.04)

Genre:
- Research Report > Promising Solution (0.68)

Industry:
- Transportation > Air (0.86)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language > Large Language Model (1.00)
  - Machine Learning
    - Statistical Learning > Gradient Descent (1.00)
    - Neural Networks > Deep Learning (1.00)
