Grokking phase transitions in learning local rules with gradient descent
Bojan Žunkovič, Enej Ilievski
arXiv.org Artificial Intelligence
Despite recent progress in understanding the double descent phenomenon [1, 2, 3, 4], we still do not have a complete theory of generalisation in over-parameterised models. Two recent empirical observations, neural collapse [5] and grokking (generalisation beyond over-fitting) [6], can help us understand the training and generalisation properties of over-parameterised models. Neural collapse occurs in the terminal phase of training, i.e. the phase with zero train error. It refers to the collapse of the N-dimensional last-layer features (the input to the last/classification layer) [5] to a (C−1)-dimensional equiangular tight frame (ETF) structure, where C is the number of classes. The feature vectors converge towards the vertices of the ETF such that the features of each class cluster around one vertex.
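A minimal sketch of the simplex ETF geometry referred to above, using the standard construction sqrt(C/(C−1))(I − 11ᵀ/C) (an assumed illustration, not code from the paper): the C columns are unit-norm, span a (C−1)-dimensional subspace, and every pair has cosine similarity −1/(C−1), which is the configuration class-mean features approach under neural collapse.

```python
import numpy as np

C = 4  # number of classes (illustrative choice)

# Columns of M form a simplex equiangular tight frame (ETF) in R^C.
M = np.sqrt(C / (C - 1)) * (np.eye(C) - np.ones((C, C)) / C)

gram = M.T @ M  # Gram matrix of the C frame vectors

# Unit norm: each vertex vector has length 1.
assert np.allclose(np.diag(gram), 1.0)

# Equiangular: every pair of distinct vertices has cosine similarity -1/(C-1).
off_diag = gram[~np.eye(C, dtype=bool)]
assert np.allclose(off_diag, -1.0 / (C - 1))

# The C vertices span only a (C-1)-dimensional subspace.
assert np.linalg.matrix_rank(M) == C - 1
```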
Oct-26-2022