$\text{Transformer}^2$: Self-adaptive LLMs
Qi Sun, Edoardo Cetin, Yujin Tang
arXiv.org Artificial Intelligence
This dynamic adjustment has parallels to concepts like fast-weight memories, which enable networks to update weights in response to task demands (Schmidhuber, 1992; Gomez & Schmidhuber, 2005), and to the view of neural network weights as dynamic programs (Schmidhuber, 2015). Recently, Panigrahi et al. (2023) introduced an approach in which a smaller auxiliary transformer is updated dynamically within a larger model, aligning with the principles of self-adaptive behavior. This adaptation can be explored from two perspectives: a macroview, where multiple LLMs collaborate and/or compete, and a microview, where internal adaptations allow a single LLM to specialize in different tasks.

Macroview: From this perspective, the system directs queries to LLMs with domain-specific expertise and prioritizes outputs from expert models, thereby achieving higher accuracy and task-specific optimization. Such task-specific ensembles can be realized through various mechanisms: multiple LLMs playing distinct roles and coordinating toward a shared goal (Zhuge et al., 2023), engaging in mutual listening and debate (Du et al., 2023), or using meticulously crafted prompt constructions (Zhang et al., 2024) to integrate a knowledge library and skill planning.
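To make the macroview routing idea concrete, here is a minimal sketch in Python of a dispatcher that sends each query to a domain-expert LLM and falls back to a generalist otherwise. The expert names, the `classify_domain` keyword heuristic, and the `call_llm` stub are illustrative assumptions, not part of the paper; real systems would use a learned router and actual model endpoints.

```python
# Sketch of macroview routing: classify a query's domain, then
# dispatch it to a domain-expert LLM, preferring expert outputs.
# All model names and helpers here are hypothetical placeholders.
from typing import Dict


def call_llm(model: str, prompt: str) -> str:
    """Stub for an LLM API call; replace with a real client."""
    return f"[{model}] response to: {prompt}"


# Hypothetical registry mapping domains to expert models.
EXPERTS: Dict[str, str] = {
    "math": "expert-math-llm",
    "code": "expert-code-llm",
}
GENERALIST = "general-llm"


def classify_domain(query: str) -> str:
    """Toy keyword classifier standing in for a learned router."""
    lowered = query.lower()
    if any(k in lowered for k in ("integral", "prove", "equation")):
        return "math"
    if any(k in lowered for k in ("python", "bug", "compile")):
        return "code"
    return "general"


def route(query: str) -> str:
    """Send the query to a domain expert if one exists, else the generalist."""
    domain = classify_domain(query)
    model = EXPERTS.get(domain, GENERALIST)
    return call_llm(model, query)


if __name__ == "__main__":
    print(route("Fix the bug in this Python function."))
```

The design choice this illustrates is that the ensemble's task-specific gains come from the routing policy, not from any single model; swapping the keyword heuristic for a learned classifier or a debate protocol (Du et al., 2023) changes only `classify_domain`.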
Jan-13-2025