The Power and Limitation of Pretraining-Finetuning for Linear Regression under Covariate Shift
Jingfeng Wu, Difan Zou, Vladimir Braverman, Quanquan Gu, Sham M. Kakade
–arXiv.org Artificial Intelligence
In transfer learning (Pan and Yang, 2009; Sugiyama and Kawanabe, 2012), an algorithm is provided with abundant data from a source domain and scarce or no data from a target domain, and aims to train a model that generalizes well on the target domain. A simple yet effective approach is to pretrain a model on the rich source data and then finetune it on the available target data via, e.g., stochastic gradient descent (SGD) (see, e.g., Yosinski et al. (2014)). Despite its wide applicability in practice, the power and limitations of the pretraining-finetuning framework for transfer learning are not fully understood in theory. This work studies the issue in a specific transfer learning setup known as covariate shift (Pan and Yang, 2009; Sugiyama and Kawanabe, 2012), where the source and target distributions differ in their marginal distributions over the inputs but coincide in their conditional distribution of the output given the input. On the theory of learning under covariate shift, there exists a rich set of results (Ben-David et al., 2010; Germain et al., 2013; Mansour et al., 2009; Mohri and Muñoz Medina, 2012; Cortes and
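To make the setup concrete, below is a minimal NumPy sketch of pretraining-finetuning for linear regression under covariate shift. It is an illustration of the setting, not the paper's algorithm or analysis; the dimensions, covariances, sample sizes, and step size are all assumptions chosen for the example. Source and target inputs share the same linear labeling function (the same conditional distribution of y given x) but have different input covariances.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 20                        # input dimension (illustrative)
w_star = rng.normal(size=d)   # shared ground-truth weights: P(y|x) is the same in both domains

def sample(n, cov_diag):
    """Draw (X, y) with diagonal-Gaussian inputs; y = <w_star, x> + noise."""
    X = rng.normal(size=(n, d)) * np.sqrt(cov_diag)
    y = X @ w_star + 0.1 * rng.normal(size=n)
    return X, y

# Covariate shift: the domains differ only in the input covariance.
source_cov = np.linspace(1.0, 0.01, d)   # source weights early coordinates heavily
target_cov = source_cov[::-1]            # target weights late coordinates heavily

X_src, y_src = sample(2000, source_cov)  # abundant source data
X_tgt, y_tgt = sample(20, target_cov)    # scarce target data

# Pretrain: one pass of SGD over the source data from zero initialization.
w = np.zeros(d)
lr = 0.01
for x, y in zip(X_src, y_src):
    w -= lr * (x @ w - y) * x            # gradient of 0.5 * (<w, x> - y)^2

# Finetune: a few SGD passes over the scarce target data, starting from w.
for _ in range(5):
    for x, y in zip(X_tgt, y_tgt):
        w -= lr * (x @ w - y) * x

# Evaluate the finetuned model on the target distribution.
X_te, y_te = sample(5000, target_cov)
print("target MSE:", np.mean((X_te @ w - y_te) ** 2))
```

Because the target's well-covered directions are poorly covered by the source, pretraining alone leaves error in those directions; the finetuning pass is what corrects them, which is the tension the paper's theory quantifies.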
Aug-3-2022