Capturing the learning curves of generic features maps for realistic data sets with a teacher-student model
Bruno Loureiro, Cédric Gerbelot, Hugo Cui, Sebastian Goldt, Florent Krzakala, Marc Mézard, Lenka Zdeborová
Teacher-student models provide a powerful framework in which the typical-case performance of high-dimensional supervised learning tasks can be studied in closed form. In this setting, labels are assigned to data (often taken to be Gaussian i.i.d.) by a teacher model, and the goal is to characterise the typical performance of the student model in recovering the parameters that generated the labels. In this manuscript we discuss a generalisation of this setting where the teacher and student can act on different spaces, generated with fixed but generic feature maps. This is achieved via the rigorous study of a high-dimensional Gaussian covariate model. Our contribution is two-fold. First, we prove a rigorous formula for the asymptotic training loss and generalisation error achieved by empirical risk minimisation for this model. Second, we present a number of situations where the learning curve of the model captures that of a realistic data set learned with kernel regression and classification, with out-of-the-box feature maps such as random projections or scattering transforms, or with pre-learned ones, such as the features learned by training multi-layer neural networks. We discuss both the power and the limitations of the Gaussian teacher-student framework as a typical-case analysis capturing learning curves as encountered in practice on real data sets.
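To make the setting concrete, the following is a minimal numerical sketch of a Gaussian covariate teacher-student experiment. It is not the authors' code or their asymptotic formula. It simulates one finite-size instance under assumptions chosen only for illustration: a linear teacher, a ridge-regression student, and student features obtained as a noisy linear mixing of the teacher features. The names `sample_gaussian_covariates`, `teacher_labels`, `ridge_fit` and `learning_curve`, as well as all dimensions and the regularisation strength, are hypothetical.

```python
import numpy as np

def sample_gaussian_covariates(n, p, d, mixing, rng):
    """Draw n jointly Gaussian pairs (u, v).

    u lives in the teacher space R^p and v in the student space R^d.
    Assumption: v is a noisy linear mixing of u, which is one simple
    way to correlate the two spaces.
    """
    u = rng.standard_normal((n, p))
    v = u @ mixing + 0.5 * rng.standard_normal((n, d))
    return u, v

def teacher_labels(u, theta0):
    # Assumption: a linear teacher acting on its own feature space.
    return u @ theta0 / np.sqrt(u.shape[1])

def ridge_fit(v, y, lam):
    # Empirical risk minimisation with squared loss and an l2 penalty.
    d = v.shape[1]
    return np.linalg.solve(v.T @ v + lam * np.eye(d), v.T @ y)

def learning_curve(sample_sizes, p=50, d=40, lam=0.1, n_test=2000, seed=0):
    rng = np.random.default_rng(seed)
    mixing = rng.standard_normal((p, d)) / np.sqrt(p)
    theta0 = rng.standard_normal(p)  # teacher weights
    u_test, v_test = sample_gaussian_covariates(n_test, p, d, mixing, rng)
    y_test = teacher_labels(u_test, theta0)
    curve = []
    for n in sample_sizes:
        u, v = sample_gaussian_covariates(n, p, d, mixing, rng)
        y = teacher_labels(u, theta0)
        w = ridge_fit(v, y, lam)  # student only ever sees v, never u
        train_loss = float(np.mean((v @ w - y) ** 2))
        test_error = float(np.mean((v_test @ w - y_test) ** 2))
        curve.append((n, train_loss, test_error))
    return curve

curve = learning_curve([20, 80, 320, 1280])
for n, train_loss, test_error in curve:
    print(f"n={n:5d}  train={train_loss:.3f}  test={test_error:.3f}")
```

Plotting the training loss and test error against the number of samples gives an empirical learning curve for this toy instance. In the paper's setting, such curves would be compared against the asymptotic prediction, with the covariances of the two feature spaces estimated from the feature maps and data of interest rather than fixed by hand as here.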
Feb-16-2021