On the Theory of Transfer Learning: The Importance of Task Diversity
Neural Information Processing Systems
We provide new statistical guarantees for transfer learning via representation learning, in which transfer is achieved by learning a feature representation shared across different tasks. This enables learning on new tasks using far less data than is required to learn them in isolation. Formally, we consider t + 1 tasks parameterized by functions of the form f_j \circ h in a general function class F \circ H, where each f_j is a task-specific function in F and h is the shared representation in H. Letting C(\cdot) denote the complexity measure of the function class, we show that for diverse training tasks (1) the sample complexity needed to learn the shared representation across the first t training tasks scales as C(H) + t C(F), despite no explicit access to a signal from the feature representation, and (2) with an accurate estimate of the representation, the sample complexity needed to learn a new task scales only with C(F). Our results depend upon a new general notion of task diversity, applicable to models with general tasks, features, and losses, as well as a novel chain rule for Gaussian complexities.
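The two-phase setup described above can be illustrated with a small numerical sketch. This is not the paper's algorithm or analysis. It assumes the simplest special case, where the shared representation is linear, h(x) = B^T x, and each task-specific function is a linear map on top of it. It also swaps in a plain heuristic for the representation estimate: the top singular vectors of the per-task least-squares solutions. All names (`B`, `B_hat`, `sample_task`) and all dimensions, sample sizes, and the noise level are hypothetical choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 50, 3       # ambient dimension, representation dimension (assumed)
t, n = 40, 200     # number of training tasks, samples per training task
m = 20             # samples for the new task; note m < d
noise = 0.1        # label noise standard deviation

# Shared linear representation h(x) = B^T x, with orthonormal columns.
B, _ = np.linalg.qr(rng.standard_normal((d, r)))

def sample_task(alpha, size):
    """Draw (X, y) for the task f(z) = alpha^T z composed with h."""
    X = rng.standard_normal((size, d))
    y = X @ (B @ alpha) + noise * rng.standard_normal(size)
    return X, y

# Phase 1: estimate the shared subspace from t diverse training tasks.
# Heuristic stand-in: per-task least squares, then a truncated SVD.
thetas = []
for _ in range(t):
    alpha = rng.standard_normal(r)   # random directions give diverse tasks
    X, y = sample_task(alpha, n)
    theta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    thetas.append(theta_hat)
U, _, _ = np.linalg.svd(np.column_stack(thetas), full_matrices=False)
B_hat = U[:, :r]

# Spectral norm of the part of B that lies outside span(B_hat).
subspace_err = np.linalg.norm(B - B_hat @ (B_hat.T @ B), 2)

# Phase 2: learn a new task from only m samples.
alpha_new = rng.standard_normal(r)
X_new, y_new = sample_task(alpha_new, m)
theta_true = B @ alpha_new

# (a) Transfer: regress on the r learned features only.
alpha_hat, *_ = np.linalg.lstsq(X_new @ B_hat, y_new, rcond=None)
theta_transfer = B_hat @ alpha_hat

# (b) Isolation: regress in all d dimensions (minimum-norm solution, m < d).
theta_isolated, *_ = np.linalg.lstsq(X_new, y_new, rcond=None)

err_transfer = np.linalg.norm(theta_transfer - theta_true)
err_isolated = np.linalg.norm(theta_isolated - theta_true)
print(f"subspace error:         {subspace_err:.3f}")
print(f"new-task error (transfer):  {err_transfer:.3f}")
print(f"new-task error (isolation): {err_isolated:.3f}")
```

In this toy setting the new task is fit in r dimensions rather than d, which mirrors the claim that the new-task sample complexity depends only on C(F) once the representation is accurate. Drawing the training-task parameters in random directions is a crude way to make the tasks diverse; if they all shared a single direction, the SVD step could not recover the full r-dimensional subspace.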
May-27-2025, 00:09:16 GMT