Efficiency for Free: Ideal Data Are Transportable Representations

Peng Sun 1,2, Yi Jiang 1, Tao Lin
Zhejiang University 2

Neural Information Processing Systems 

Data, the seminal opportunity and challenge in modern machine learning, currently constrains the scalability of representation learning and impedes the pace of model evolution. In this work, we investigate the efficiency properties of data from both optimization and generalization perspectives. Our theoretical and empirical analysis reveals an unexpected finding: for a given task, utilizing a publicly available, task- and architecture-agnostic model (referred to as the 'prior model' in this paper) can effectively produce efficient data.