Grokking phase transitions in learning local rules with gradient descent

Bojan Žunkovič and Enej Ilievski

arXiv.org Artificial Intelligence 

Despite recent progress in understanding the double descent phenomenon [1, 2, 3, 4], we still do not have a complete theory of generalisation in over-parameterised models. Two recent empirical observations, neural collapse [5] and grokking (generalisation beyond over-fitting) [6], can help us understand the training and generalisation properties of over-parameterised models. Neural collapse occurs in the terminal phase of training, i.e. the phase with zero train error. It refers to the collapse of the N-dimensional last-layer features (the input to the last/classification layer) [5] to a (C − 1)-dimensional equiangular tight frame (ETF) structure, where C is the number of classes. The feature vectors converge towards the vertices of the ETF structure such that the features for each class cluster around one vertex.
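As a concrete illustration of the ETF structure referenced above, the following sketch constructs the standard simplex ETF for C classes (embedded in R^C, spanning a (C − 1)-dimensional subspace) and verifies its defining property: all vertices are unit vectors and every pair of distinct vertices has the same inner product, −1/(C − 1). The helper name `simplex_etf` is ours, not from the paper; this is a minimal sketch of the standard construction, not the authors' code.

```python
import numpy as np

def simplex_etf(C):
    """Vertices of the C-class simplex ETF, embedded in R^C.

    Columns of the returned matrix are the C vertices. They span a
    (C - 1)-dimensional subspace, have unit norm, and every pair of
    distinct vertices has cosine similarity -1/(C - 1), the maximal
    equiangular separation achievable by C unit vectors.
    """
    I = np.eye(C)
    mean = np.ones((C, C)) / C          # rank-one projector onto the all-ones direction
    return np.sqrt(C / (C - 1)) * (I - mean)

C = 4
M = simplex_etf(C)
G = M.T @ M                             # Gram matrix: 1 on the diagonal, -1/(C-1) off it
print(np.round(G, 3))
```

Under neural collapse, the class-mean feature vectors (after centring and normalisation) approach the columns of such a matrix, so the off-diagonal entries of their Gram matrix drift towards −1/(C − 1).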
