Sundaramoorthi, Ganesh
StEik: Stabilizing the Optimization of Neural Signed Distance Functions and Finer Shape Representation
Yang, Huizong, Sun, Yuxin, Sundaramoorthi, Ganesh, Yezzi, Anthony
We present new insights and a novel paradigm (StEik) for learning implicit neural representations (INR) of shapes. In particular, we shed light on the popular eikonal loss used for imposing a signed distance function constraint in INR. We show analytically that as the representation power of the network increases, the optimization approaches a partial differential equation (PDE) in the continuum limit that is unstable. We show that this instability can manifest in existing network optimization, leading to irregularities in the reconstructed surface and/or convergence to sub-optimal local minima, thus failing to capture fine geometric and topological structure. We show analytically how other terms added to the loss, currently used in the literature for other purposes, can actually eliminate these instabilities. However, such terms can over-regularize the surface, preventing the representation of fine shape detail. Based on a similar PDE theory for the continuum limit, we introduce a new regularization term that still counteracts the eikonal instability but without over-regularizing. Furthermore, since stability is now guaranteed in the continuum limit, this stabilization also allows for considering new network structures that are able to represent finer shape detail. We introduce such a structure based on quadratic layers. Experiments on multiple benchmark datasets show that our new regularization and network are able to capture more precise shape details and more accurate topology than existing state-of-the-art methods.
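The abstract does not spell out the new regularizer, so the following PyTorch-style sketch only illustrates the two ingredients that are named explicitly: the standard eikonal penalty E[(|∇f(x)| − 1)²] used to impose the signed-distance constraint, and the idea of a quadratic layer. The QuadraticLayer parameterization below is a hypothetical illustration, not the paper's exact construction.

```python
import torch
import torch.nn as nn

class QuadraticLayer(nn.Module):
    """Illustrative quadratic layer: a linear map plus a learned term in the
    squared inputs, so a single layer can represent curved level sets.
    (Hypothetical parameterization; the paper's may differ.)"""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)
        self.quad = nn.Linear(in_dim, out_dim, bias=False)

    def forward(self, x):
        return self.linear(x) + self.quad(x * x)

def eikonal_loss(f, x):
    """Standard eikonal penalty E[(|grad f(x)| - 1)^2] enforcing the SDF constraint."""
    x = x.clone().requires_grad_(True)
    y = f(x)
    grad, = torch.autograd.grad(y.sum(), x, create_graph=True)
    return ((grad.norm(dim=-1) - 1.0) ** 2).mean()

# Example usage on random sample points:
net = nn.Sequential(QuadraticLayer(3, 64), nn.Softplus(beta=100), nn.Linear(64, 1))
pts = torch.rand(1024, 3) * 2 - 1
loss = eikonal_loss(net, pts)  # combined in training with data terms and the paper's regularizer
```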
HAct: Out-of-Distribution Detection with Neural Net Activation Histograms
Mondal, Sudeepta, Sundaramoorthi, Ganesh
We propose a simple, efficient, and accurate method for detecting out-of-distribution (OOD) data for trained neural networks. We propose a novel descriptor for OOD detection, HAct (activation histograms): probability distributions, approximated by histograms, of the output values of neural network layers under the influence of incoming data. We formulate an OOD detector based on HAct descriptors. We demonstrate that HAct is significantly more accurate than the state-of-the-art in OOD detection on multiple image classification benchmarks. For instance, our approach achieves a true positive rate (TPR) of 95% with only 0.03% false positives using ResNet-50 on standard OOD benchmarks, outperforming the previous state-of-the-art by 20.67% in false positive rate (at the same TPR of 95%). Its computational efficiency and ease of implementation make HAct suitable for online monitoring of deployed neural networks in practice at scale.
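Since the abstract gives only the high-level recipe (layer-wise histograms of activation values compared against in-distribution references), the sketch below is a minimal illustration under stated assumptions: the bin count, value range, L1 distance, and thresholding rule are all illustrative choices, not the paper's exact detector.

```python
import numpy as np

def activation_histogram(activations, bins=64, value_range=(-10.0, 10.0)):
    """HAct-style descriptor: normalized histogram of one layer's output values.
    Bin count and range are illustrative assumptions."""
    hist, _ = np.histogram(np.ravel(activations), bins=bins, range=value_range)
    return hist / max(hist.sum(), 1)

def fit_reference(in_dist_activations):
    """Reference descriptor: average of per-sample histograms over in-distribution data."""
    return np.mean([activation_histogram(a) for a in in_dist_activations], axis=0)

def ood_score(test_activations, reference):
    """Deviation of a test sample's histogram from the reference (L1 distance assumed)."""
    return np.abs(activation_histogram(test_activations) - reference).sum()

# A sample is flagged OOD when ood_score exceeds a threshold calibrated on held-out
# in-distribution data to a target true positive rate (e.g., 95%).
```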
Surprising Instabilities in Training Deep Networks and a Theoretical Analysis
Sun, Yuxin, Lao, Dong, Sundaramoorthi, Ganesh, Yezzi, Anthony
We discover restrained numerical instabilities in current training practices of deep networks with stochastic gradient descent (SGD). We show that numerical error (on the order of the smallest floating-point bit) induced by floating-point arithmetic in training deep nets can be amplified significantly and result in significant test-accuracy variance, comparable to the test-accuracy variance due to stochasticity in SGD. We show how this is likely traced to instabilities of the optimization dynamics that are restrained, i.e., localized over iterations and regions of the weight tensor space. We do this by presenting a theoretical framework using numerical analysis of partial differential equations (PDE), and analyzing the gradient descent PDE of convolutional neural networks (CNNs). We show that it is stable only under certain conditions on the learning rate and weight decay. We show that rather than blowing up when these conditions are violated, the instability can be restrained. We show this is a consequence of the non-linear PDE associated with the gradient descent of the CNN, whose local linearization changes when over-driving the step size of the discretization, resulting in a stabilizing effect. We link restrained instabilities to the recently discovered Edge of Stability (EoS) phenomenon, in which the stable step size predicted by classical theory is exceeded while the loss continues to be optimized and the training still converges. Because restrained instabilities occur at the EoS, our theory provides new predictions about the EoS, in particular the role of regularization and the dependence on network complexity.
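The claim that error on the order of the least-significant floating-point bit can be amplified into test-accuracy-level differences suggests a simple controlled probe; the sketch below is an assumption about how such a probe might be set up, not the paper's experimental protocol: two copies of the same network are trained identically except that one weight of the second copy is perturbed by a single ulp, and the resulting parameter divergence is measured.

```python
import copy
import torch
import torch.nn.functional as F

def perturb_one_ulp(model):
    """Move a single weight entry to the next representable floating-point value."""
    with torch.no_grad():
        w = next(model.parameters())
        flat = w.view(-1)
        flat[0] = torch.nextafter(flat[0], torch.tensor(float('inf')))

def weight_divergence(model, data, targets, lr=0.1, steps=100):
    """Train an exact copy and a one-ulp-perturbed copy on the same fixed batch;
    return the largest absolute difference between corresponding weights."""
    a, b = copy.deepcopy(model), copy.deepcopy(model)
    perturb_one_ulp(b)
    for m in (a, b):
        opt = torch.optim.SGD(m.parameters(), lr=lr)
        for _ in range(steps):
            opt.zero_grad()
            F.cross_entropy(m(data), targets).backward()
            opt.step()
    return max((p - q).abs().max().item()
               for p, q in zip(a.parameters(), b.parameters()))
```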
Flow-Guided Video Inpainting with Scene Templates
Lao, Dong, Zhu, Peihao, Wonka, Peter, Sundaramoorthi, Ganesh
We consider the problem of filling in missing spatio-temporal regions of a video. We provide a novel flow-based solution by introducing a generative model of images in relation to the scene (without missing regions) and of mappings from the scene to images. We use the model to jointly infer the scene template, a 2D representation of the scene, and the mappings. This ensures that the generated frame-to-frame flows are consistent with the underlying scene, reducing geometric distortions in flow-based inpainting. The template is mapped to the missing regions in the video by a new L2-L1 interpolation scheme, creating crisp inpaintings and reducing common blur and distortion artifacts. We show on two benchmark datasets that our approach outperforms the state-of-the-art both quantitatively and in user studies.
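The abstract describes warping the inferred 2D scene template into each frame's missing region via the estimated scene-to-image mapping; the fragment below is a minimal sketch of only that compositing step, assuming the mapping is available as a dense sampling grid. The template/mapping inference and the L2-L1 interpolation scheme themselves are not reproduced here.

```python
import torch
import torch.nn.functional as F

def composite_from_template(frame, mask, template, grid):
    """Fill the masked region of a frame by warping the scene template into it.

    frame:    (1, 3, H, W)   observed frame with missing pixels
    mask:     (1, 1, H, W)   1 where pixels are missing, 0 where observed
    template: (1, 3, Ht, Wt) 2D scene template shared across frames
    grid:     (1, H, W, 2)   per-frame mapping from frame coordinates into the
                             template, in grid_sample's [-1, 1] convention
    """
    warped = F.grid_sample(template, grid, mode='bilinear', align_corners=True)
    return mask * warped + (1 - mask) * frame
```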
Channel-Directed Gradients for Optimization of Convolutional Neural Networks
Lao, Dong, Zhu, Peihao, Wonka, Peter, Sundaramoorthi, Ganesh
We introduce optimization methods for convolutional neural networks that can be used to improve existing gradient-based optimization in terms of generalization error. The method requires only simple processing of existing stochastic gradients, can be used in conjunction with any optimizer, and has only a linear overhead (in the number of parameters) compared to the computation of the stochastic gradient. We show that defining the gradients along the output-channel direction leads to a performance boost, while other directions can be detrimental. We present the continuum theory of such gradients, its discretization, and its application to deep networks. Experiments on benchmark datasets, several networks, and baseline optimizers show that optimizers can be improved in generalization error by simply computing the stochastic gradient with respect to output-channel directed metrics.

Stochastic gradient descent (SGD) is currently the dominant algorithm for optimizing large-scale convolutional neural networks (CNNs) LeCun et al. (1998); Simonyan & Zisserman (2014); He et al. (2016b). Although there has been substantial activity in optimization methods seeking to improve performance, SGD still dominates large-scale CNN optimization in terms of its generalization ability. Despite SGD's dominance, there is still often a gap between training and real-world test accuracy in applications, which necessitates research in optimization methods to increase generalization accuracy.
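The abstract only states that the gradient is post-processed along the output-channel direction with linear overhead; the sketch below is one plausible instantiation under that description, assuming a simple smoothing (box filter) of each convolutional gradient along its output-channel axis before the optimizer step. The particular kernel is an illustrative stand-in, not the metric derived in the paper.

```python
import torch
import torch.nn.functional as F

def channel_directed_gradient(grad, bandwidth=3):
    """Smooth a conv-weight gradient along its output-channel axis (dim 0).

    grad: (C_out, C_in, kH, kW). A 1D box filter over C_out serves here as a
    stand-in for an output-channel-directed metric (illustrative assumption).
    """
    c_out = grad.shape[0]
    flat = grad.reshape(c_out, -1).t().unsqueeze(1)            # (N, 1, C_out)
    kernel = torch.ones(1, 1, bandwidth,
                        device=grad.device, dtype=grad.dtype) / bandwidth
    smoothed = F.conv1d(flat, kernel, padding=bandwidth // 2)  # (N, 1, C_out)
    return smoothed.squeeze(1).t().reshape_as(grad)

# Applied between backward() and optimizer.step(), e.g.:
#   for p in model.parameters():
#       if p.grad is not None and p.grad.dim() == 4:
#           p.grad.copy_(channel_directed_gradient(p.grad))
```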
Variational PDEs for Acceleration on Manifolds and Application to Diffeomorphisms
Sundaramoorthi, Ganesh, Yezzi, Anthony
We consider the optimization of cost functionals on manifolds and derive a variational approach to accelerated methods on manifolds. We demonstrate the methodology on the infinite-dimensional manifold of diffeomorphisms, motivated by registration problems in computer vision. We build on the variational approach to accelerated optimization by Wibisono, Wilson and Jordan, which applies in finite dimensions, and generalize that approach to infinite-dimensional manifolds. We derive the continuum evolution equations, which are partial differential equations (PDE), and relate them to simple mechanical principles. Our approach can also be viewed as a generalization of the $L^2$ optimal mass transport problem. Our approach evolves an infinite number of particles endowed with mass, represented as a mass density. The density evolves with the optimization variable and endows the particles with dynamics. This is different from current accelerated methods, in which only a single particle moves and hence the dynamics do not depend on the mass. We derive the theory, compute the PDEs for acceleration, and illustrate the behavior of this new accelerated optimization scheme.
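For orientation, the finite-dimensional starting point attributed to Wibisono, Wilson and Jordan can be recalled (up to notation) as the Bregman Lagrangian below; the paper's contribution is the generalization in which the single trajectory X is replaced by a mass density of particles on an infinite-dimensional manifold, whose Euler-Lagrange equations become the acceleration PDEs.

```latex
% Bregman Lagrangian of Wibisono, Wilson & Jordan (finite-dimensional case),
% recalled up to notation: D_h is the Bregman divergence of a convex function h,
% and alpha_t, beta_t, gamma_t are time-dependent scaling functions.
\mathcal{L}(X,\dot{X},t)
  = e^{\alpha_t + \gamma_t}\Big( D_h\big(X + e^{-\alpha_t}\dot{X},\, X\big)
    - e^{\beta_t} f(X) \Big),
\qquad
D_h(y,x) = h(y) - h(x) - \langle \nabla h(x),\, y - x \rangle .
```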