How to explain grokking

Kozyrev, S. V.

arXiv.org Artificial Intelligence 

Simple ideas from thermodynamics and kinetic theory allow us to explain the better generalization observed when learning by stochastic gradient optimization; see also [7], which also discusses overfitting control for GAN models. We also explain the grokking (delayed generalization) phenomenon and some properties of grokking observed in [8].
