SwitchCIT: Switching for Continual Instruction Tuning of Large Language Models
Xinbo Wu, Max Hartman, Vidhata Arjun Jayaraman, Lav R. Varshney
Large language models (LLMs) have demonstrated remarkable capabilities across numerous domains, as highlighted by OpenAI (2023) and Bubeck et al. (2023). However, while LLMs pre-trained on extensive language data excel at general language understanding, they may not be optimized for every specific task prompted by instructions. There is therefore a need for continual instruction learning to adapt LLMs to evolving tasks and domains. Indeed, continual instruction learning is essential for LLMs such as GPT (Radford et al., 2019) to maintain their effectiveness and relevance across a wide range of tasks and domains. Such models are trained on vast amounts of text data and fine-tuned for specific applications, often by learning tasks sequentially (Luo et al., 2023), i.e., learning on the dataset for one task all at once before moving on to the next task. The challenge lies in their ability to continually learn and adapt as they encounter new tasks and information.
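The sequential, task-by-task fine-tuning regime described above can be illustrated with a minimal sketch. The code below assumes a generic PyTorch-style language model whose forward pass accepts `input_ids` and `labels` and returns an object with a `.loss` attribute; the model, datasets, and hyperparameters are hypothetical placeholders for illustration, not the paper's actual training setup.

```python
import torch
from torch.utils.data import DataLoader

def finetune_on_task(model, task_dataset, epochs=1, lr=2e-5, device="cpu"):
    """Fine-tune the model on a single task's instruction data, all at once."""
    model.to(device).train()
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
    loader = DataLoader(task_dataset, batch_size=8, shuffle=True)
    for _ in range(epochs):
        for batch in loader:
            input_ids = batch["input_ids"].to(device)
            labels = batch["labels"].to(device)
            loss = model(input_ids=input_ids, labels=labels).loss
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return model

def continual_instruction_tuning(model, task_datasets):
    """Sequential regime: each task is learned fully before the next begins,
    which is where forgetting of earlier tasks can arise."""
    for task_dataset in task_datasets:
        model = finetune_on_task(model, task_dataset)
    return model
```

Because each call to `finetune_on_task` optimizes only the current task's objective, nothing in this loop preserves performance on earlier tasks; that gap is the motivation for continual instruction tuning methods.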
arXiv.org Artificial Intelligence
Jul-16-2024