Parameter Symmetry Breaking and Restoration Determines the Hierarchical Learning in AI Systems

Ziyin, Liu, Xu, Yizhou, Poggio, Tomaso, Chuang, Isaac

arXiv.org Machine Learning 

More and more phenomena that are virtually universal in the learning process have been discovered in contemporary AI systems. These phenomena are shared by models with different architectures, trained on different datasets, and with different training techniques. The existence of these universal phenomena calls for one or a few universal explanations. However, until today, most of the phenomena are instead described by narrow theories tailored to explain each phenomenon separately - often focusing on specific models trained on specific tasks or loss functions and in isolation from other interesting phenomena that are indispensable parts of the deep learning phenomenology. Certainly, it is desirable to have a universal perspective, if not a universal theory, that explains as many phenomena as possible. In the spirit of science, a universal perspective should be independent of system details such as variations in minor architecture definitions, choice of loss functions, training techniques, etc. A universal theory would give the field a simplified paradigm for thinking about and understanding AI systems and a potential design principle for a new generation of more efficient and capable models.