Exploiting the Potential of Seq2Seq Models as Robust Few-Shot Learners
Jihyeon Lee, Dain Kim, Doohae Jung, Boseop Kim, Kyoung-Woon On
arXiv.org Artificial Intelligence
In-context learning, which offers substantial advantages over fine-tuning, is predominantly observed in decoder-only models, while encoder-decoder (i.e., seq2seq) models excel in methods that rely on weight updates. Recently, a few studies have demonstrated the feasibility of few-shot learning with seq2seq models; however, this has been limited to tasks that align well with the seq2seq architecture, such as summarization and translation. Inspired by these initial studies, we present the first extensive comparison of the in-context few-shot learning capabilities of decoder-only and encoder-decoder models across a broad range of tasks. Furthermore, we propose two methods to more effectively elicit the in-context learning ability of seq2seq models: objective-aligned prompting and a fusion-based approach. Remarkably, our approach outperforms a decoder-only model that is six times larger and yields significant performance improvements over conventional seq2seq models across a variety of settings. We posit that, with the right configuration and prompt design, seq2seq models can be highly effective few-shot learners for a wide spectrum of applications.
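To make the setup concrete, a minimal sketch of in-context few-shot prompting as described above: demonstration pairs are concatenated ahead of the query, and the model (a seq2seq model would take this as encoder input) is expected to continue the pattern. The function name, the `Input:`/`Label:` template, and the example data are illustrative assumptions, not the paper's actual prompt format.

```python
# Illustrative sketch only -- the template below is a generic few-shot format,
# not the objective-aligned prompts proposed in the paper.
def build_fewshot_prompt(demonstrations, query, sep="\n\n"):
    """Concatenate (input, label) demonstration pairs ahead of the query.

    `demonstrations` is a list of (text, label) tuples; the model is expected
    to continue the pattern and emit a label for `query` as its output.
    """
    shots = [f"Input: {x}\nLabel: {y}" for x, y in demonstrations]
    shots.append(f"Input: {query}\nLabel:")
    return sep.join(shots)

demos = [("great movie", "positive"), ("boring plot", "negative")]
prompt = build_fewshot_prompt(demos, "what a waste of time")
```

The resulting string would be passed to the encoder of a seq2seq model (or fed directly to a decoder-only model), with the label read off from the generated continuation.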
Jul-27-2023