The Power and Limitation of Pretraining-Finetuning for Linear Regression under Covariate Shift
Jingfeng Wu, Difan Zou, Vladimir Braverman, Quanquan Gu, Sham M. Kakade
–arXiv.org Artificial Intelligence
In transfer learning (Pan and Yang, 2009; Sugiyama and Kawanabe, 2012), an algorithm is provided with abundant data from a source domain and scarce or no data from a target domain, and aims to train a model that generalizes well on the target domain. A simple yet effective approach is to pretrain a model on the rich source data and then finetune it on the available target data via, e.g., stochastic gradient descent (SGD) (see, e.g., Yosinski et al. (2014)). Despite its wide applicability in practice, the power and limitations of the pretraining-finetuning framework for transfer learning are not fully understood in theory. This work studies the issue in a specific transfer learning setup known as covariate shift (Pan and Yang, 2009; Sugiyama and Kawanabe, 2012), where the source and target distributions differ in their marginal distributions over the inputs but coincide in their conditional distribution of the output given the input. On the theory of learning under covariate shift, there exists a rich set of results (Ben-David et al., 2010; Germain et al., 2013; Mansour et al., 2009; Mohri and Muñoz Medina, 2012; Cortes and
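To make the setup concrete, below is a minimal NumPy sketch of pretraining-finetuning for linear regression under covariate shift. It is an illustration of the setting, not the paper's algorithm or analysis; the dimensions, covariances, sample sizes, and step size are all assumptions chosen for the example. Source and target inputs share the same linear labeling function (the same conditional distribution of y given x) but have different input covariances.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 20                        # input dimension (illustrative)
w_star = rng.normal(size=d)   # shared ground-truth weights: P(y|x) is the same in both domains

def sample(n, cov_diag):
    """Draw (X, y) with diagonal-Gaussian inputs; y = <w_star, x> + noise."""
    X = rng.normal(size=(n, d)) * np.sqrt(cov_diag)
    y = X @ w_star + 0.1 * rng.normal(size=n)
    return X, y

# Covariate shift: the domains differ only in the input covariance.
source_cov = np.linspace(1.0, 0.01, d)   # source weights early coordinates heavily
target_cov = source_cov[::-1]            # target weights late coordinates heavily

X_src, y_src = sample(2000, source_cov)  # abundant source data
X_tgt, y_tgt = sample(20, target_cov)    # scarce target data

# Pretrain: one pass of SGD over the source data from zero initialization.
w = np.zeros(d)
lr = 0.01
for x, y in zip(X_src, y_src):
    w -= lr * (x @ w - y) * x            # gradient of 0.5 * (<w, x> - y)^2

# Finetune: a few SGD passes over the scarce target data, starting from w.
for _ in range(5):
    for x, y in zip(X_tgt, y_tgt):
        w -= lr * (x @ w - y) * x

# Evaluate the finetuned model on the target distribution.
X_te, y_te = sample(5000, target_cov)
print("target MSE:", np.mean((X_te @ w - y_te) ** 2))
```

Because the target's well-covered directions are poorly covered by the source, pretraining alone leaves error in those directions; the finetuning pass is what corrects them, which is the tension the paper's theory quantifies.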
Aug-3-2022