Transformers are Deep Optimizers: Provable In-Context Learning for Deep Model Training
Wu, Weimin, Su, Maojiang, Hu, Jerry Yao-Chieh, Song, Zhao, Liu, Han
–arXiv.org Artificial Intelligence
We study the ability of transformers to simulate the training process of deep models. This analysis is not only practical but also timely. On one hand, transformers and deep models [Brown, 2020, Radford et al., 2019] are so powerful and popular that they form a new machine learning paradigm: foundation models. These large-scale machine learning models, trained on vast data, provide a general-purpose foundation for various tasks with minimal supervision [Team et al., 2023, Touvron et al., 2023, Zhang et al., 2022]. On the other hand, the high cost of pretraining these models often makes them prohibitive outside certain industrial labs [Jiang et al., 2024, Bi et al., 2024, Achiam et al., 2023]. In this work, we aim to push forward this "one-for-all" modeling philosophy of the foundation model paradigm [Bommasani et al., 2021] by considering the following research problem: Question 1. Is it possible to train one deep model with the in-context learning (ICL) of another foundation model? The implication of Question 1 is profound: if true, one foundation model could lead to many others without resource-intensive pretraining, making foundation models more accessible to general users. In this work, we provide an affirmative example for Question 1. Specifically, we show that transformer models are capable of simulating the training of a deep ReLU-based feed-forward neural network, with provable guarantees, through ICL.
Nov-25-2024
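To make the claim concrete, the following is a minimal sketch (not the paper's construction) of the reference computation that the transformer's ICL is asserted to approximate: explicit gradient descent on a deep ReLU feed-forward network, fit only to the prompt's in-context examples and then evaluated at a query point. The function name `icl_reference_target`, the network sizes, the squared loss, and the plain SGD optimizer are illustrative assumptions, not details from the paper.

```python
# Sketch of the target process an ICL-capable transformer would emulate:
# given a prompt of (x_i, y_i) pairs plus a query x_query, produce (in one
# forward pass) roughly the prediction of a deep ReLU MLP trained by T
# gradient-descent steps on those pairs. All hyperparameters are assumptions.
import torch
import torch.nn as nn


def icl_reference_target(x_ctx, y_ctx, x_query, depth=3, width=64,
                         steps=20, lr=0.1):
    """Train a deep ReLU MLP on the in-context examples, then predict at x_query."""
    d_in, d_out = x_ctx.shape[1], y_ctx.shape[1]
    dims = [d_in] + [width] * (depth - 1) + [d_out]
    layers = []
    for i in range(depth):
        layers.append(nn.Linear(dims[i], dims[i + 1]))
        if i < depth - 1:
            layers.append(nn.ReLU())
    mlp = nn.Sequential(*layers)

    opt = torch.optim.SGD(mlp.parameters(), lr=lr)
    for _ in range(steps):                      # T explicit gradient-descent steps
        opt.zero_grad()
        loss = ((mlp(x_ctx) - y_ctx) ** 2).mean()
        loss.backward()
        opt.step()

    with torch.no_grad():
        return mlp(x_query)                     # prediction the ICL output should approximate


# Usage: 32 in-context examples mapping R^8 -> R^1, plus one query point.
x_ctx, y_ctx = torch.randn(32, 8), torch.randn(32, 1)
x_query = torch.randn(1, 8)
print(icl_reference_target(x_ctx, y_ctx, x_query))
```

In this framing, "training one deep model with the ICL of another foundation model" means the transformer, prompted with `(x_ctx, y_ctx, x_query)`, outputs an approximation of this trained-MLP prediction without any weight updates of its own; the paper's contribution is the provable guarantee on that approximation, which this sketch does not attempt to reproduce.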