FIAT: Fusing learning paradigms with Instruction-Accelerated Tuning

Xinyi Wang, John Wieting, Jonathan H. Clark

arXiv.org (Artificial Intelligence)

Learning paradigms for large language models (LLMs) currently tend to fall within either in-context learning (ICL) or full fine-tuning. Each comes with its own trade-offs based on available data, model size, compute cost, ease of use, and final quality, with neither solution performing well across the board. In this article, we first describe the ICL and fine-tuning paradigms in a way that highlights their natural connections.

Some of the most exciting capabilities of LLMs, such as producing logical reasoning to solve a problem, have been found to emerge only once model size exceeds a certain threshold, often hundreds of billions of parameters (Wei et al., 2022b;a). The impressive ability of these models to produce high-quality responses without any task-specific tuning, together with the very high cost of further tuning such models, has led much recent work to focus on the paradigm of in-context learning: placing a few task-specific examples and instructions into the model's input (Brown et al., 2020; Chowdhery et al., 2022; Google et al., 2023; OpenAI, 2023). Although prior work has found that fine-tuning a model on task data often yields superior downstream performance compared to ICL (Scao & Rush, 2021; Schick & Schütze, 2020a;b; Asai et al., 2023), there are significantly fewer recent efforts on fine-tuning models for tasks with limited data, perhaps because the time and compute costs associated with tuning a very large model drive practitioners toward smaller models, abandoning the ability to take advantage of emergent model capabilities.
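To make the ICL setup described above concrete, here is a minimal sketch of how a few-shot prompt is assembled from an instruction, demonstrations, and a query, with no gradient updates to the model. The `build_icl_prompt` helper and the commented-out `model.generate` call are illustrative placeholders, not part of the paper's code or any specific library.

```python
def build_icl_prompt(instruction: str,
                     examples: list[tuple[str, str]],
                     query: str) -> str:
    """Assemble a few-shot ICL prompt: instruction, demonstrations, then the query."""
    lines = [instruction, ""]
    # Each demonstration is an (input, output) pair placed directly in the prompt;
    # the model is never fine-tuned on these examples.
    for x, y in examples:
        lines.append(f"Input: {x}")
        lines.append(f"Output: {y}")
        lines.append("")
    lines.append(f"Input: {query}")
    lines.append("Output:")
    return "\n".join(lines)

prompt = build_icl_prompt(
    instruction="Classify the sentiment of each review as positive or negative.",
    examples=[
        ("The food was wonderful.", "positive"),
        ("I waited an hour and left hungry.", "negative"),
    ],
    query="Great service and fair prices.",
)
# completion = model.generate(prompt)  # hypothetical completion call; no parameters are updated
```

Fine-tuning, by contrast, would consume the same (input, output) pairs as training data and update the model's weights, which is where the compute-cost trade-off discussed above arises.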
