The Power and Limitation of Pretraining-Finetuning for Linear Regression under Covariate Shift

Wu, Jingfeng, Zou, Difan, Braverman, Vladimir, Gu, Quanquan, Kakade, Sham M.

arXiv.org Artificial Intelligence 

In transfer learning (Pan and Yang, 2009; Sugiyama and Kawanabe, 2012), an algorithm is provided with abundant data from a source domain and scarce or no data from a target domain, and aims to train a model that generalizes well on the target domain. A simple yet effective approach is to pretrain a model with the rich source data and then finetune the model with the available target data via, e.g., stochastic gradient descent (SGD) (see, e.g., Yosinski et al. (2014)). Despite its wide applicability in practice, the power and limitations of the pretraining-finetuning framework for transfer learning are not fully understood in theory. This work studies the issue in a specific transfer learning setup known as covariate shift (Pan and Yang, 2009; Sugiyama and Kawanabe, 2012), where the source and target distributions differ in their marginal distributions over the input but coincide in their conditional distribution of the output given the input. On the theory of learning under covariate shift, there is a rich body of results (Ben-David et al., 2010; Germain et al., 2013; Mansour et al., 2009; Mohri and Muñoz Medina, 2012; Cortes and
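The pretraining-finetuning pipeline under covariate shift described above can be illustrated with a minimal sketch in linear regression. The covariance matrices, sample sizes, and learning rate below are illustrative assumptions, not the paper's setup: the source and target share the same linear conditional model `y = <x, w*> + noise` but differ in their input covariance; we pretrain by least squares on abundant source data and then finetune with a few passes of SGD on scarce target data.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 10
w_star = rng.normal(size=d)  # shared conditional model: y = <x, w*> + noise

# Covariate shift: source and target differ only in the input distribution.
# These covariance choices are illustrative assumptions.
source_cov = np.diag(np.linspace(1.0, 2.0, d))
target_cov = np.diag(np.linspace(2.0, 0.1, d))

def sample(cov, n):
    X = rng.multivariate_normal(np.zeros(d), cov, size=n)
    y = X @ w_star + 0.1 * rng.normal(size=n)  # same conditional in both domains
    return X, y

X_src, y_src = sample(source_cov, 2000)  # abundant source data
X_tgt, y_tgt = sample(target_cov, 50)    # scarce target data

# Pretrain: fit on the source domain (here via least squares).
w = np.linalg.lstsq(X_src, y_src, rcond=None)[0]

# Finetune: a few passes of SGD on the target data, initialized at the
# pretrained solution (learning rate is an illustrative choice).
lr = 0.01
for _ in range(5):
    for i in rng.permutation(len(y_tgt)):
        grad = (X_tgt[i] @ w - y_tgt[i]) * X_tgt[i]
        w -= lr * grad

# Evaluate the finetuned model's risk on the target distribution.
X_test, y_test = sample(target_cov, 1000)
target_mse = np.mean((X_test @ w - y_test) ** 2)
```

Initializing SGD at the pretrained solution is what distinguishes this pipeline from training on the target data alone; with only 50 target samples, a model trained from scratch would typically carry much higher variance.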
