Using physics-inspired Singular Learning Theory to understand grokking & other phase transitions in modern neural networks
