
Phase Diagram of Initial Condensation for Two-layer Neural Networks

Chen, Zhengan, Li, Yuqing, Luo, Tao, Zhou, Zhangchen, Xu, Zhi-Qin John

arXiv.org Artificial Intelligence

The phenomenon of distinct behaviors exhibited by neural networks under varying scales of initialization remains an enigma in deep learning research. In this paper, building on the earlier work by Luo et al.~\cite{luo2021phase}, we present a phase diagram of initial condensation for two-layer neural networks. Condensation is a phenomenon wherein the weight vectors of neural networks concentrate on isolated orientations during training; it is a feature of the non-linear learning process that enables neural networks to achieve better generalization. Our phase diagram provides a comprehensive understanding of the dynamical regimes of neural networks and their dependence on the choice of initialization-related hyperparameters. Furthermore, we demonstrate in detail the underlying mechanisms by which small initialization leads to condensation at the initial stage of training.


On the Existence of the Adversarial Bayes Classifier (Extended Version)

Awasthi, Pranjal, Frank, Natalie S., Mohri, Mehryar

arXiv.org Machine Learning

Adversarial robustness is a critical property in a variety of modern machine learning applications. While it has been the subject of several recent theoretical studies, many important questions related to adversarial robustness remain open. In this work, we study a fundamental question regarding Bayes optimality for adversarial robustness. We provide general sufficient conditions under which the existence of a Bayes optimal classifier can be guaranteed for adversarial robustness. Our results provide a useful tool for subsequent studies of surrogate losses in adversarial robustness and their consistency properties. This manuscript is the extended version of the paper On the Existence of the Adversarial Bayes Classifier published at NeurIPS (Awasthi et al., 2021b). The results of the original paper did not apply to some non-strictly convex norms; here we extend our results to all possible norms.