deep neural net



Generalization in multitask deep neural classifiers: a statistical physics approach

Neural Information Processing Systems

A proper understanding of the striking generalization abilities of deep neural networks presents an enduring puzzle. Recently, there has been a growing body of numerically-grounded theoretical work that has contributed important insights to the theory of learning in deep neural nets. There has also been a recent interest in extending these analyses to understanding how multitask learning can further improve the generalization capacity of deep neural nets. These studies deal almost exclusively with regression tasks which are amenable to existing analytical techniques. We develop an analytic theory of the nonlinear dynamics of generalization of deep neural networks trained to solve classification tasks using softmax outputs and cross-entropy loss, addressing both single task and multitask settings. We do so by adapting techniques from the statistical physics of disordered systems, accounting for both finite size datasets and correlated outputs induced by the training dynamics. We discuss the validity of our theoretical results in comparison to a comprehensive suite of numerical experiments. Our analysis provides theoretical support for the intuition that the performance of multitask learning is determined by the noisiness of the tasks and how well their input features align with each other. Highly related, clean tasks benefit each other, whereas unrelated, clean tasks can be detrimental to individual task performance.
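For concreteness, the training setup studied here (a shared representation feeding task-specific softmax heads, each trained with cross-entropy) can be sketched as follows. This is a generic illustration only, not the authors' analytical setup; the architecture, sizes, and names (MultitaskClassifier, head_a, head_b) are assumptions made for the example.

```python
import torch
import torch.nn as nn

class MultitaskClassifier(nn.Module):
    """Shared trunk with one softmax (cross-entropy) head per task."""
    def __init__(self, in_dim=100, hidden=64, classes_a=10, classes_b=10):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
        self.head_a = nn.Linear(hidden, classes_a)   # task A logits
        self.head_b = nn.Linear(hidden, classes_b)   # task B logits

    def forward(self, x):
        h = self.trunk(x)
        return self.head_a(h), self.head_b(h)

model = MultitaskClassifier()
loss_fn = nn.CrossEntropyLoss()                      # softmax + cross-entropy
opt = torch.optim.SGD(model.parameters(), lr=0.1)

x = torch.randn(32, 100)                             # synthetic inputs
ya = torch.randint(0, 10, (32,))                     # task A labels
yb = torch.randint(0, 10, (32,))                     # task B labels

logits_a, logits_b = model(x)
loss = loss_fn(logits_a, ya) + loss_fn(logits_b, yb) # joint multitask objective
loss.backward()
opt.step()
```

Whether the two heads help or hurt each other then depends, as the abstract argues, on how noisy the tasks are and how well their input features align.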


Deep Neural Nets with Interpolating Function as Output Activation

Neural Information Processing Systems

We replace the output layer of deep neural nets, typically the softmax function, with a novel interpolating function, and we propose end-to-end training and testing algorithms for this new architecture. Compared to classical neural nets with the softmax function as the output activation, the surrogate with an interpolating function as the output activation combines the advantages of both deep learning and manifold learning. The new framework offers the following major advantages: First, it is better suited to settings with insufficient training data. Second, it significantly improves the generalization accuracy on a wide variety of networks. The algorithm is implemented in PyTorch, and the code is available at https://github.com/


9872ed9fc22fc182d371c3e9ed316094-AuthorFeedback.pdf

Neural Information Processing Systems

We thank the reviewers for carefully reading the manuscript and providing us with valuable feedback. This was omitted from the submitted manuscript due to space constraints. We will clarify L220 to make this more precise. However, we will certainly include citations to both Danielyan and Tseng in the manuscript. We will also revise L17 to say that the true prior might be unknown for certain signals, such as natural images.



Reviews: Robustness of classifiers: from adversarial to random noise

Neural Information Processing Systems

This paper offers a thorough analysis of the effect of both worst-case (adversarial) and random noise on machine learning classifiers. It derives bounds that precisely describe the robustness of classifiers as a function of the curvature of the decision boundary. This leads to some surprisingly general (at least to me) conclusions: * For random noise, the robustness of classifiers behaves as sqrt(d) times the distance from the datapoint to the classification boundary (where d denotes the dimension of the data), provided the curvature of the decision boundary is sufficiently small. This corroborates the intuition that random noise is less of an issue for high-dimensional data. On the other hand, how do we know the curvature of decision boundaries for general classifiers?
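The sqrt(d) scaling for random noise is easy to check numerically in the simplest case of a flat (zero-curvature) decision boundary. The snippet below is an illustrative experiment of my own, not taken from the paper: for isotropic random directions, the perturbation norm needed to reach the boundary is roughly a constant multiple of sqrt(d) times the distance, so the printed ratio stays roughly constant as d grows.

```python
import numpy as np

rng = np.random.default_rng(0)
dist = 1.0   # distance from the datapoint to the (flat) decision boundary

for d in [10, 100, 1000, 10000]:
    # Boundary is the hyperplane {x[0] = 0}; the point sits at x[0] = dist.
    v = rng.standard_normal((2000, d))
    v /= np.linalg.norm(v, axis=1, keepdims=True)      # random unit directions
    r = dist / np.abs(v[:, 0])                         # noise norm needed along +/- v
    print(f"d={d:6d}   median r / (sqrt(d) * dist) = {np.median(r) / (np.sqrt(d) * dist):.2f}")
```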


Reviews: Loss Surfaces, Mode Connectivity, and Fast Ensembling of DNNs

Neural Information Processing Systems

Update after author response: Thank you for the response. Additional details on the observation that the curves between the local optima are not unique would also be interesting to see. Summary: This paper first presents a very interesting finding about the loss surfaces of deep neural nets, and then introduces a new ensembling method called Fast Geometric Ensembling (FGE). Given two already well-trained deep neural nets (with no limitations on their architectures, apparently), we have two sets of weight vectors w1 and w2 (in a very high-dimensional space). This paper states a (surprising) fact that for any two such weights w1 and w2, we can (always?) find a simple curve connecting them along which the training loss stays low. Figure 1 demonstrates this: Left is the training accuracy plot on the 2D subspace passing through independent weights w1, w2, w3 of ResNet-164 (from different random starts), whereas Middle and Right show the 2D subspace passing through independent weights w1, w2 and one bend point w3 on the curve (Middle: Bezier, Right: polygonal chain).
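For readers unfamiliar with the curve construction, evaluating a loss along a quadratic Bezier curve with endpoints w1, w2 and a single bend point w3 can be sketched as below. This is a hedged illustration: flat_loss is a hypothetical placeholder for evaluating the training loss at a flattened weight vector, and in the paper the bend point is itself trained so that the loss stays low along the whole curve.

```python
import numpy as np

def bezier(t, w1, w2, w3):
    """Quadratic Bezier curve with endpoints w1, w2 and bend (control) point w3."""
    return (1 - t) ** 2 * w1 + 2 * t * (1 - t) * w3 + t ** 2 * w2

def flat_loss(w):
    # Hypothetical placeholder: in practice this would load the flattened
    # weights w into the network and return the training loss on a batch.
    return float(np.sum(w ** 2))

dim = 1000
rng = np.random.default_rng(0)
w1, w2 = rng.standard_normal(dim), rng.standard_normal(dim)   # two trained solutions
w3 = 0.5 * (w1 + w2) + rng.standard_normal(dim)               # bend point (trained, in the paper)

for t in np.linspace(0.0, 1.0, 11):
    print(f"t = {t:.1f}   loss = {flat_loss(bezier(t, w1, w2, w3)):.3f}")
```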


Reviews: Deep Neural Nets with Interpolating Function as Output Activation

Neural Information Processing Systems

This paper develops a new data-dependent output activation function based on an interpolating function. It is a nonparametric model built from a subset of the training data. The activation function is defined implicitly, by solving a set of linear equations, and therefore cannot be trained directly with backpropagation. Instead, the paper proposes an auxiliary network with a linear output to approximate the gradient.
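As a rough stand-in for the implicitly defined interpolating output, the sketch below uses classical harmonic interpolation on a Gaussian affinity graph: soft labels for an unlabeled batch are obtained by solving a linear system involving the labels of a subset of training points. This is a simplification under my own assumptions (the affinity choice and function names are not from the paper), meant only to illustrate why the output is defined by linear equations rather than by a softmax.

```python
import numpy as np

def interpolate_labels(feat_labeled, y_onehot, feat_unlabeled, sigma=1.0):
    """Harmonic interpolation: solve L_uu F_u = -L_ul Y_l on a Gaussian affinity graph."""
    X = np.vstack([feat_labeled, feat_unlabeled])
    n_l = len(feat_labeled)
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)   # pairwise squared distances
    W = np.exp(-d2 / (2 * sigma ** 2))                    # Gaussian affinities
    L = np.diag(W.sum(axis=1)) - W                        # graph Laplacian
    F_u = np.linalg.solve(L[n_l:, n_l:], -L[n_l:, :n_l] @ y_onehot)
    return F_u                                            # soft class scores for unlabeled points

# Toy usage: 2-D "deep features", 3 classes, 30 labeled and 10 unlabeled points.
rng = np.random.default_rng(0)
feat_l = rng.standard_normal((30, 2))
y_l = np.eye(3)[rng.integers(0, 3, size=30)]
feat_u = rng.standard_normal((10, 2))
print(interpolate_labels(feat_l, y_l, feat_u).argmax(axis=1))
```

Because the scores F_u depend on the labeled subset through a linear solve rather than an explicit parametric map, gradients cannot flow through it in the usual way, which motivates the auxiliary linear-output network mentioned above.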


Interview with Yuan Yang: working at the intersection of AI and cognitive science

AIHub

In this interview series, we're meeting some of the AAAI/SIGAI Doctoral Consortium participants to find out more about their research. The Doctoral Consortium provides an opportunity for a group of PhD students to discuss and explore their research interests and career objectives in an interdisciplinary workshop together with a panel of established researchers. In this latest interview, we hear from Yuan Yang, who completed his PhD in May. This autumn, Yuan will be joining the College of Information, Mechanical and Electrical Engineering, Shanghai Normal University as an associate professor. From August 2018 to May 2024, I did my PhD in the computer science department at Vanderbilt University, which is located in the famous music city – Nashville, Tennessee.


TENG: Time-Evolving Natural Gradient for Solving PDEs With Deep Neural Nets Toward Machine Precision

Chen, Zhuo, McCarran, Jacob, Vizcaino, Esteban, Soljačić, Marin, Luo, Di

arXiv.org Artificial Intelligence

Partial differential equations (PDEs) are instrumental for modeling dynamical systems in science and engineering. The advent of neural networks has initiated a significant shift in tackling these problems, though challenges in accuracy persist, especially for initial value problems. In this paper, we introduce the $\textit{Time-Evolving Natural Gradient (TENG)}$, which generalizes time-dependent variational principles and optimization-based time integration, leveraging natural gradient optimization to obtain high accuracy in neural-network-based PDE solutions. Our comprehensive development includes algorithms such as TENG-Euler and its high-order variants, for example TENG-Heun, tailored for enhanced precision and efficiency. TENG's effectiveness is further validated through its performance, surpassing current leading methods and achieving $\textit{machine precision}$ in step-by-step optimizations across a spectrum of PDEs, including the heat equation, Allen-Cahn equation, and Burgers' equation.
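A minimal way to see the "optimization-based time integration" idea behind a TENG-Euler-style scheme: at each time step, the network representing the solution is refit so that it matches an explicit-Euler update of its previous state. The sketch below uses the 1D heat equation and plain Adam in place of the natural-gradient update the paper develops; the architecture, step sizes, and loop counts are illustrative assumptions, not the authors' settings.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
net = nn.Sequential(nn.Linear(1, 32), nn.Tanh(), nn.Linear(32, 1))   # u_theta(x)
x = torch.linspace(-1.0, 1.0, 128).reshape(-1, 1)                    # spatial grid
dt = 1e-3                                                            # time step
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

def u_and_uxx(model, pts):
    """Evaluate u and its second spatial derivative via autograd."""
    pts = pts.clone().requires_grad_(True)
    u = model(pts)
    ux = torch.autograd.grad(u.sum(), pts, create_graph=True)[0]
    uxx = torch.autograd.grad(ux.sum(), pts, create_graph=True)[0]
    return u, uxx

# (In practice the network would first be fit to the PDE's initial condition.)
for step in range(5):
    u_old, uxx_old = u_and_uxx(net, x)
    target = (u_old + dt * uxx_old).detach()    # explicit-Euler target for u_t = u_xx
    for _ in range(100):                        # refit the network to the target state
        opt.zero_grad()
        loss = ((net(x) - target) ** 2).mean()
        loss.backward()
        opt.step()
    print(f"time step {step}: refit loss = {loss.item():.2e}")
```

The paper's contribution is, in effect, replacing this inner gradient-descent refit with a natural-gradient update and higher-order time integrators, which is what drives the reported accuracy toward machine precision.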