How Many Pretraining Tasks Are Needed for In-Context Learning of Linear Regression?

Wu, Jingfeng, Zou, Difan, Chen, Zixiang, Braverman, Vladimir, Gu, Quanquan, Bartlett, Peter L.

Oct-12-2023–arXiv.org Machine Learning

Transformer-based large language models (Vaswani et al., 2017) pretrained on diverse tasks have demonstrated a strong ability for in-context learning (ICL): the pretrained model can answer a new query based on a few in-context demonstrations (see, e.g., Brown et al. (2020) and follow-up work). ICL is one of the key abilities behind the success of large language models, because it allows a pretrained model to solve many downstream tasks without updating its parameters. However, the statistical foundations of ICL are still in their infancy. A recent line of research aims to quantify ICL by studying transformers pretrained on linear regression tasks with a Gaussian prior (Garg et al., 2022; Akyürek et al., 2022; Li et al., 2023b; Raventós et al., 2023). Specifically, Garg et al. (2022), Akyürek et al. (2022), and Li et al. (2023b) study the setting where transformers are pretrained in an online manner on independent linear regression tasks drawn from the same Gaussian prior. They find that such a pretrained transformer can perform ICL on fresh linear regression tasks.
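To make the setting concrete, the sketch below generates the kind of task stream described above: every call draws a fresh, independent linear regression task from one shared Gaussian prior, together with n in-context demonstrations and a query. It does not implement a transformer. As a reference point it uses the Bayes-optimal predictor for this prior, which for a Gaussian prior and Gaussian noise is ridge regression with regularization lam = sigma^2 / tau^2. All names (`sample_task`, `bayes_predict`, `icl_risk`), the isotropic scalings, and the default dimensions are hypothetical choices for illustration, not taken from the paper.

```python
import numpy as np


def sample_task(rng, d, n, noise_std, prior_std):
    """Draw one linear regression task plus an n-shot prompt and a query.

    Assumed (hypothetical) setup:
      w ~ N(0, prior_std^2 * I_d), x ~ N(0, I_d), y = <w, x> + eps,
      eps ~ N(0, noise_std^2).
    """
    w = rng.normal(0.0, prior_std, size=d)           # task vector from the shared prior
    X = rng.standard_normal((n + 1, d))             # n demonstrations + 1 query input
    y = X @ w + rng.normal(0.0, noise_std, size=n + 1)
    return X[:n], y[:n], X[n], y[n]


def bayes_predict(X, y, x_query, noise_std, prior_std):
    """Posterior-mean prediction: ridge regression with lam = noise_std^2 / prior_std^2."""
    d = X.shape[1]
    lam = noise_std**2 / prior_std**2
    w_post = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)
    return x_query @ w_post


def icl_risk(predict, rng, d=8, n=16, num_tasks=2000, noise_std=0.5, prior_std=1.0):
    """Average squared query error over fresh, independently drawn tasks."""
    errors = []
    for _ in range(num_tasks):
        X, y, xq, yq = sample_task(rng, d, n, noise_std, prior_std)
        errors.append((predict(X, y, xq) - yq) ** 2)
    return float(np.mean(errors))


rng = np.random.default_rng(0)
bayes_risk = icl_risk(lambda X, y, xq: bayes_predict(X, y, xq, 0.5, 1.0), rng)
zero_risk = icl_risk(lambda X, y, xq: 0.0, rng)  # ignores the demonstrations
print(f"Bayes-optimal risk: {bayes_risk:.3f}, zero-predictor risk: {zero_risk:.3f}")
```

The zero predictor, which ignores the prompt, has expected risk d·tau^2 + sigma^2 = 8.25 under these defaults, so any predictor that actually uses the demonstrations should land well below it; a pretrained model evaluated on the same stream could be compared against both baselines.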

large language model, machine learning, regression

Country:
- North America > United States
  - California
    - Los Angeles County > Los Angeles (0.14)
    - Alameda County > Berkeley (0.04)
- Asia > China
  - Hong Kong (0.04)

Genre:
- Research Report (1.00)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language > Large Language Model (1.00)
  - Machine Learning
    - Statistical Learning > Regression (1.00)
    - Neural Networks > Deep Learning (0.65)