How Many Pretraining Tasks Are Needed for In-Context Learning of Linear Regression?
Wu, Jingfeng, Zou, Difan, Chen, Zixiang, Braverman, Vladimir, Gu, Quanquan, Bartlett, Peter L.
Transformer-based large language models (Vaswani et al., 2017) pretrained with diverse tasks have demonstrated a strong ability for in-context learning (ICL): the pretrained models can answer new queries based on a few in-context demonstrations (see, e.g., Brown et al. (2020) and subsequent work). ICL is one of the key abilities behind the success of large language models, because it allows pretrained models to solve multiple downstream tasks without updating their model parameters. However, the statistical foundation of ICL is still in its infancy. A recent line of research aims to quantify ICL by studying transformers pretrained on the linear regression task with a Gaussian prior (Garg et al., 2022; Akyürek et al., 2022; Li et al., 2023b; Raventós et al., 2023). Specifically, Garg et al. (2022), Akyürek et al. (2022), and Li et al. (2023b) study the setting where transformers are pretrained in an online manner using independent linear regression tasks drawn from the same Gaussian prior. They find that such a pretrained transformer can perform ICL on fresh linear regression tasks.
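To make the in-context linear regression setting above concrete, the following is a minimal sketch, not the method of any of the cited papers. It assumes an isotropic standard Gaussian prior on the task vector, standard Gaussian covariates, and additive Gaussian label noise; the names `sample_task`, `ridge_predict`, and `average_icl_error` are hypothetical. In place of a pretrained transformer, it uses ridge regression on the in-context demonstrations as a reference predictor, with the regularization set to the noise variance (the posterior-mean estimator under these assumptions).

```python
import numpy as np


def sample_task(rng, dim, n_context, noise_std):
    """Draw one linear regression task and its in-context data.

    Assumption: task vector w ~ N(0, I), covariates x ~ N(0, I),
    labels y = <w, x> + Gaussian noise with standard deviation noise_std.
    """
    w = rng.standard_normal(dim)
    X = rng.standard_normal((n_context + 1, dim))
    y = X @ w + noise_std * rng.standard_normal(n_context + 1)
    # The first n_context pairs are demonstrations; the last pair is the query.
    return X[:-1], y[:-1], X[-1], y[-1]


def ridge_predict(X, y, x_query, reg):
    """Predict the query label by ridge regression on the demonstrations."""
    dim = X.shape[1]
    w_hat = np.linalg.solve(X.T @ X + reg * np.eye(dim), X.T @ y)
    return float(x_query @ w_hat)


def average_icl_error(n_tasks, dim, n_context, noise_std, seed=0):
    """Average squared query error over freshly sampled tasks."""
    rng = np.random.default_rng(seed)
    errors = []
    for _ in range(n_tasks):
        X, y, x_query, y_query = sample_task(rng, dim, n_context, noise_std)
        pred = ridge_predict(X, y, x_query, reg=noise_std ** 2)
        errors.append((pred - y_query) ** 2)
    return float(np.mean(errors))


if __name__ == "__main__":
    for n_context in (2, 8, 40):
        err = average_icl_error(n_tasks=200, dim=8, n_context=n_context, noise_std=0.5)
        print(f"context size {n_context:>2}: average squared error {err:.3f}")
```

Under these assumptions, the average query error should shrink as the number of in-context demonstrations grows, which is the behavior a transformer that performs ICL on fresh tasks would be expected to approximate.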
Oct-12-2023
- Country:
- North America > United States > California > Los Angeles County > Los Angeles (0.14)
- North America > United States > California > Alameda County > Berkeley (0.04)
- Asia > China > Hong Kong (0.04)
- Genre:
- Research Report (1.00)
- Technology: