Functional Regularization for Representation Learning: A Unified Theoretical Perspective

Neural Information Processing Systems

Unsupervised and self-supervised learning approaches have become crucial tools for learning representations for downstream prediction tasks. While these approaches are widely used in practice and achieve impressive empirical gains, theoretical understanding of them largely lags behind. Towards bridging this gap, we present a unifying perspective where several such approaches can be viewed as imposing a regularization on the representation via a learnable function using unlabeled data. We propose a discriminative theoretical framework for analyzing the sample complexity of these approaches, which generalizes the framework of Balcan and Blum (2010) to allow learnable regularization functions. Our sample complexity bounds show that, with hypothesis classes carefully chosen to exploit the structure in the data, these learnable regularization functions can prune the hypothesis space and help reduce the amount of labeled data needed. We then provide two concrete examples of functional regularization, one using auto-encoders and the other using masked self-supervision, and apply our framework to quantify the reduction in the sample complexity bound of labeled data. We also provide complementary empirical results to support our analysis.
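The idea of regularizing a representation through a learnable function can be written down in symbols. The block below is only a sketch under assumed notation: the symbols S, U, F, G, L_c, L_r, and tau are introduced here for illustration and are not taken from the abstract, and the exact formulation in the paper may differ.

```latex
% Assumed notation (illustrative): labeled sample S, unlabeled sample U,
% predictor class F, representation class H, regularization-function class G,
% prediction loss L_c, regularization loss L_r, threshold tau >= 0.
\begin{align*}
  &\min_{f \in \mathcal{F},\; h \in \mathcal{H}} \; \widehat{L}_c(f, h; S)
   \quad \text{subject to} \quad
   \min_{g \in \mathcal{G}} \widehat{L}_r(h, g; U) \le \tau, \\
  % Pruned hypothesis space: only representations that some g can regularize well.
  &\mathcal{H}_\tau = \Bigl\{\, h \in \mathcal{H} :
   \min_{g \in \mathcal{G}} \widehat{L}_r(h, g; U) \le \tau \Bigr\}
   \subseteq \mathcal{H}, \\
  % Auto-encoder instance: g plays the role of a decoder.
  &L_r(h, g; x) = \lVert x - g(h(x)) \rVert_2^2 .
\end{align*}
```

Read this way, the unlabeled data shrinks the search from H to the subset H_tau, and the labeled data only has to select a predictor and a representation within that smaller set.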


Review for NeurIPS paper: Functional Regularization for Representation Learning: A Unified Theoretical Perspective


Summary and Contributions:

Post-rebuttal comments: Thank you for the response. I am happy with the explanations and will increase my score, thus recommending the paper for acceptance.

The paper provides a theoretical foundation for learning tasks that combine two steps: (i) representation learning (e.g., via auto-encoders or self-supervised learning) and (ii) supervised learning with instances represented by the features learned in step (i). The assumption is that, in addition to labelled examples, the algorithm has access to unlabelled instances. The first step learns a representation function h(x), where h belongs to some hypothesis space H.
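The two-step setup described in this summary can be illustrated with a toy example. The following is a minimal sketch, not the paper's method: it assumes synthetic data lying near a one-dimensional subspace, uses a 1-D linear auto-encoder (solved by power iteration) as the representation learner, and fits a one-parameter sign predictor on five labels. All function names and the data generator are hypothetical.

```python
import math
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]

def learn_representation(unlabeled, iters=50, seed=0):
    """Step (i): fit a 1-D linear auto-encoder h(x) = w . (x - mean) on unlabeled data.

    With tied weights and squared reconstruction loss, the best w is the top
    principal direction, found here by power iteration on the covariance.
    """
    n, d = len(unlabeled), len(unlabeled[0])
    mean = [sum(x[j] for x in unlabeled) / n for j in range(d)]
    centered = [[x[j] - mean[j] for j in range(d)] for x in unlabeled]
    cov = [[sum(c[i] * c[j] for c in centered) / n for j in range(d)]
           for i in range(d)]
    rng = random.Random(seed)
    w = normalize([rng.gauss(0, 1) for _ in range(d)])
    for _ in range(iters):
        w = normalize([dot(row, w) for row in cov])
    return lambda x: dot(w, [x[j] - mean[j] for j in range(d)])

def learn_predictor(h, labeled):
    """Step (ii): fit a one-parameter predictor sign(a * h(x)) from labeled pairs."""
    a = 1.0 if sum(y * h(x) for x, y in labeled) >= 0 else -1.0
    return lambda x: 1 if a * h(x) >= 0 else -1

def make_data(n, u, rng, noise=0.1):
    """Hypothetical synthetic data: x = z * u + noise, label = sign(z)."""
    data = []
    for _ in range(n):
        z = rng.gauss(0, 1)
        x = [z * uj + rng.gauss(0, noise) for uj in u]
        data.append((x, 1 if z >= 0 else -1))
    return data

rng = random.Random(0)
u = normalize([rng.gauss(0, 1) for _ in range(10)])
unlabeled = [x for x, _ in make_data(500, u, rng)]  # labels are discarded
labeled = make_data(5, u, rng)                      # only five labeled examples
test_set = make_data(1000, u, rng)

h = learn_representation(unlabeled)
f = learn_predictor(h, labeled)
accuracy = sum(f(x) == y for x, y in test_set) / len(test_set)
print(f"test accuracy with 5 labels: {accuracy:.3f}")
```

In this toy setting the unlabeled data fixes the representation up to a sign, so the labeled data only has to resolve that single sign. That is a very small instance of the reduction in labeled-sample requirements the review refers to.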


Review for NeurIPS paper: Functional Regularization for Representation Learning: A Unified Theoretical Perspective


This paper presents a unified framework for analyzing representation learning approaches that use unlabeled data to perform auxiliary tasks, such as auto-encoders and masked self-supervision. The provided sample complexity bounds show that the auxiliary task acts as a functional regularization that can prune the hypothesis space, significantly reducing the number of labeled examples sufficient for learning. The theory is confirmed experimentally on synthetic data. As I understand it, this work is the first to present a unified and natural framework for analyzing the impact of unsupervised auxiliary tasks on generalization. Consequently, the novelty of the formulation and its applicability to algorithmic approaches of broad interest to practitioners outweighed the fact that some reviewers saw the technical contributions as rather straightforward.

