Exploiting the Potential of Seq2Seq Models as Robust Few-Shot Learners
Jihyeon Lee, Dain Kim, Doohae Jung, Boseop Kim, Kyoung-Woon On
arXiv.org Artificial Intelligence
In-context learning, which offers substantial advantages over fine-tuning, is predominantly observed in decoder-only models, while encoder-decoder (i.e., seq2seq) models excel in methods that rely on weight updates. Recently, a few studies have demonstrated the feasibility of few-shot learning with seq2seq models; however, this has been limited to tasks that align well with the seq2seq architecture, such as summarization and translation. Inspired by these initial studies, we present the first extensive comparison of the in-context few-shot learning capabilities of decoder-only and encoder-decoder models across a broad range of tasks. Furthermore, we propose two methods to more effectively elicit the in-context learning ability of seq2seq models: objective-aligned prompting and a fusion-based approach. Remarkably, our approach outperforms a decoder-only model that is six times larger and yields significant performance improvements over conventional seq2seq models across a variety of settings. We posit that, with the right configuration and prompt design, seq2seq models can be highly effective few-shot learners for a wide spectrum of applications.
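To make the setup concrete, a minimal sketch of in-context few-shot prompting as described above: demonstration pairs are concatenated ahead of the query, and the model (a seq2seq model would take this as encoder input) is expected to continue the pattern. The function name, the `Input:`/`Label:` template, and the example data are illustrative assumptions, not the paper's actual prompt format.

```python
# Illustrative sketch only -- the template below is a generic few-shot format,
# not the objective-aligned prompts proposed in the paper.
def build_fewshot_prompt(demonstrations, query, sep="\n\n"):
    """Concatenate (input, label) demonstration pairs ahead of the query.

    `demonstrations` is a list of (text, label) tuples; the model is expected
    to continue the pattern and emit a label for `query` as its output.
    """
    shots = [f"Input: {x}\nLabel: {y}" for x, y in demonstrations]
    shots.append(f"Input: {query}\nLabel:")
    return sep.join(shots)

demos = [("great movie", "positive"), ("boring plot", "negative")]
prompt = build_fewshot_prompt(demos, "what a waste of time")
```

The resulting string would be passed to the encoder of a seq2seq model (or fed directly to a decoder-only model), with the label read off from the generated continuation.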
Jul-27-2023