Transformers are Deep Optimizers: Provable In-Context Learning for Deep Model Training
Wu, Weimin, Su, Maojiang, Hu, Jerry Yao-Chieh, Song, Zhao, Liu, Han
–arXiv.org Artificial Intelligence
We study the ability of transformers to simulate the training process of deep models. This analysis is not only practical but also timely. On one hand, transformers and deep models [Brown, 2020, Radford et al., 2019] are so powerful and popular that they form a new machine learning paradigm: foundation models. These large-scale machine learning models, trained on vast data, provide a general-purpose foundation for various tasks with minimal supervision [Team et al., 2023, Touvron et al., 2023, Zhang et al., 2022]. On the other hand, the high cost of pretraining these models often makes them prohibitive outside certain industrial labs [Jiang et al., 2024, Bi et al., 2024, Achiam et al., 2023]. In this work, we aim to push forward this "one-for-all" modeling philosophy of the foundation model paradigm [Bommasani et al., 2021] by considering the following research problem: Question 1. Is it possible to train one deep model with the in-context learning (ICL) of another foundation model? The implication of Question 1 is profound: if true, one foundation model could lead to many others without resource-intensive pretraining, making foundation models more accessible to general users. In this work, we provide an affirmative example for Question 1. Specifically, we show that transformer models are capable of simulating the training of a deep ReLU-based feed-forward neural network, with provable guarantees, through ICL.
Nov-25-2024
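To make the claim concrete, the following is a minimal sketch (not the paper's construction) of the reference computation that the transformer's ICL is asserted to approximate: explicit gradient descent on a deep ReLU feed-forward network, fit only to the prompt's in-context examples and then evaluated at a query point. The function name `icl_reference_target`, the network sizes, the squared loss, and the plain SGD optimizer are illustrative assumptions, not details from the paper.

```python
# Sketch of the target process an ICL-capable transformer would emulate:
# given a prompt of (x_i, y_i) pairs plus a query x_query, produce (in one
# forward pass) roughly the prediction of a deep ReLU MLP trained by T
# gradient-descent steps on those pairs. All hyperparameters are assumptions.
import torch
import torch.nn as nn


def icl_reference_target(x_ctx, y_ctx, x_query, depth=3, width=64,
                         steps=20, lr=0.1):
    """Train a deep ReLU MLP on the in-context examples, then predict at x_query."""
    d_in, d_out = x_ctx.shape[1], y_ctx.shape[1]
    dims = [d_in] + [width] * (depth - 1) + [d_out]
    layers = []
    for i in range(depth):
        layers.append(nn.Linear(dims[i], dims[i + 1]))
        if i < depth - 1:
            layers.append(nn.ReLU())
    mlp = nn.Sequential(*layers)

    opt = torch.optim.SGD(mlp.parameters(), lr=lr)
    for _ in range(steps):                      # T explicit gradient-descent steps
        opt.zero_grad()
        loss = ((mlp(x_ctx) - y_ctx) ** 2).mean()
        loss.backward()
        opt.step()

    with torch.no_grad():
        return mlp(x_query)                     # prediction the ICL output should approximate


# Usage: 32 in-context examples mapping R^8 -> R^1, plus one query point.
x_ctx, y_ctx = torch.randn(32, 8), torch.randn(32, 1)
x_query = torch.randn(1, 8)
print(icl_reference_target(x_ctx, y_ctx, x_query))
```

In this framing, "training one deep model with the ICL of another foundation model" means the transformer, prompted with `(x_ctx, y_ctx, x_query)`, outputs an approximation of this trained-MLP prediction without any weight updates of its own; the paper's contribution is the provable guarantee on that approximation, which this sketch does not attempt to reproduce.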