How to explain grokking
arXiv.org Artificial Intelligence
Simple ideas from thermodynamics and kinetic theory allow us to explain the improved generalization observed when learning with the stochastic gradient optimization procedure; see also [7], which additionally discusses overfitting control for GAN models. We also explain the grokking (delayed generalization) phenomenon and some of the properties of grokking observed in [8].
Jan-1-2025