The Power and Limitation of Pretraining-Finetuning for Linear Regression under Covariate Shift

Jingfeng Wu

Neural Information Processing Systems 

In addition, we show that finetuning, even with only a small amount of target data, could drastically reduce the amount of source data required by pretraining.
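The trade-off between source and target data can be explored numerically with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the use of one-pass constant-stepsize SGD, the dimension, the diagonal source and target covariances (the target reverses the source spectrum), the noise level, the stepsize, and the hypothetical helper names `sgd`, `sample`, `target_excess_risk`, and `run`. The script pretrains on source samples, optionally finetunes on target samples, and reports the excess risk measured under the target covariance.

```python
import numpy as np


def sgd(X, y, w0, lr):
    """One pass of single-sample SGD on the squared loss, starting from w0."""
    w = np.array(w0, dtype=float)  # copy so the caller's w0 is not modified
    for x_i, y_i in zip(X, y):
        w -= lr * (x_i @ w - y_i) * x_i
    return w


def sample(n, cov_diag, w_star, noise, rng):
    """Draw n Gaussian inputs with diagonal covariance and noisy linear labels."""
    X = rng.normal(size=(n, len(cov_diag))) * np.sqrt(cov_diag)
    y = X @ w_star + noise * rng.normal(size=n)
    return X, y


def target_excess_risk(w, w_star, target_cov_diag):
    """Excess risk (w - w*)^T H (w - w*) for a diagonal target covariance H."""
    d = w - w_star
    return float(np.sum(target_cov_diag * d * d))


def run(n_source, n_target, seed=0, dim=20, noise=0.1, lr=0.02):
    """Pretrain on source data, then finetune on target data (all settings assumed)."""
    rng = np.random.default_rng(seed)
    w_star = rng.normal(size=dim) / np.sqrt(dim)
    source_cov = np.linspace(1.0, 0.05, dim)  # assumed source spectrum
    target_cov = source_cov[::-1].copy()      # assumed shift: reversed spectrum
    Xs, ys = sample(n_source, source_cov, w_star, noise, rng)
    w = sgd(Xs, ys, np.zeros(dim), lr)        # pretraining on the source
    if n_target > 0:
        Xt, yt = sample(n_target, target_cov, w_star, noise, rng)
        w = sgd(Xt, yt, w, lr)                # finetuning on the target
    return target_excess_risk(w, w_star, target_cov)


if __name__ == "__main__":
    for n_source, n_target in [(2000, 0), (500, 0), (500, 100)]:
        print(n_source, n_target, run(n_source, n_target))
```

Comparing the printed risks across `(n_source, n_target)` pairs gives a rough feel for how a few target samples can substitute for many source samples in this toy setting. The numbers depend on the assumed spectra, stepsize, and seed, so they illustrate the claim rather than verify it.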