Bernal, Edgar A.
Manifold Regularization for Memory-Efficient Training of Deep Neural Networks
Sartipi, Shadi, Bernal, Edgar A.
One of the prevailing trends in the machine- and deep-learning community is to gravitate towards the use of increasingly larger models in order to keep pushing the state-of-the-art performance envelope. This tendency makes access to the associated technologies more difficult for the average practitioner and runs contrary to the desire to democratize knowledge production in the field. In this paper, we propose a framework for achieving improved memory efficiency in the process of learning traditional neural networks by leveraging inductive-bias-driven network design principles and layer-wise manifold-oriented regularization objectives. Use of the framework results in improved absolute performance and empirical generalization error relative to traditional learning techniques. We provide empirical validation of the framework, including qualitative and quantitative evidence of its effectiveness on two standard image datasets, namely CIFAR-10 and CIFAR-100. The proposed framework can be seamlessly combined with existing network compression methods for further memory savings.
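For intuition only, below is a minimal sketch of a layer-wise manifold-oriented regularizer, assuming PyTorch and a simple pairwise-distance-preservation penalty on one hidden layer; the network, the penalty form, and the weighting are illustrative assumptions rather than the paper's exact formulation.

```python
# Hypothetical sketch of a layer-wise manifold regularizer: it penalizes
# distortion of pairwise distances between the input batch and a hidden
# representation. The paper's actual objective may differ.
import torch
import torch.nn as nn
import torch.nn.functional as F

def manifold_penalty(x, h):
    """Mean squared difference between normalized pairwise-distance
    matrices of the inputs x and the hidden activations h."""
    dx = torch.cdist(x.flatten(1), x.flatten(1))
    dh = torch.cdist(h.flatten(1), h.flatten(1))
    dx = dx / (dx.max() + 1e-8)   # normalize so the two scales are comparable
    dh = dh / (dh.max() + 1e-8)
    return F.mse_loss(dh, dx)

class SmallNet(nn.Module):
    def __init__(self, in_dim=32 * 32 * 3, hidden=128, classes=10):
        super().__init__()
        self.fc1 = nn.Linear(in_dim, hidden)
        self.fc2 = nn.Linear(hidden, classes)

    def forward(self, x):
        h = torch.relu(self.fc1(x.flatten(1)))
        return self.fc2(h), h

# Toy training step: task loss plus a weighted layer-wise manifold term.
net = SmallNet()
opt = torch.optim.SGD(net.parameters(), lr=1e-2)
x = torch.randn(16, 3, 32, 32)          # stand-in for a CIFAR-10 batch
y = torch.randint(0, 10, (16,))
logits, h = net(x)
loss = F.cross_entropy(logits, y) + 0.1 * manifold_penalty(x, h)
loss.backward()
opt.step()
```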
Machine learning the real discriminant locus
Bernal, Edgar A., Hauenstein, Jonathan D., Mehta, Dhagash, Regan, Margaret H., Tang, Tingting
Parameterized systems of polynomial equations arise in many applications in science and engineering, with the real solutions describing, for example, equilibria of a dynamical system, linkages satisfying design constraints, and scene reconstruction in computer vision. Since different parameter values can yield different numbers of real solutions, the parameter space is decomposed into regions whose boundary forms the real discriminant locus. This article views locating the real discriminant locus as a supervised classification problem in machine learning, where the goal is to determine classification boundaries over the parameter space, with the classes being the number of real solutions. For multidimensional parameter spaces, this article presents a novel method for carefully sampling the parameter space. At each sample point, homotopy continuation is used to obtain the number of real solutions to the corresponding polynomial system. Machine learning techniques, including nearest-neighbor and deep-learning classifiers, are used to efficiently approximate the real discriminant locus. One application of having learned the real discriminant locus is to develop a real homotopy method that tracks only the real solution paths, unlike traditional methods, which track all complex solution paths. Examples show that the proposed approach can efficiently approximate complicated solution boundaries such as those arising from the equilibria of the Kuramoto model.
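As a toy illustration of the classification viewpoint, the sketch below replaces homotopy continuation with a univariate root count and approximates the real discriminant locus of the cubic $x^3 + px + q = 0$ with a nearest-neighbor classifier; the polynomial family, sampling ranges, and scikit-learn classifier are assumptions made for demonstration, not the paper's setup.

```python
# Hypothetical illustration: approximate the real discriminant locus of the
# cubic x^3 + p*x + q = 0 over the (p, q) parameter plane. The paper treats
# general parameterized systems via homotopy continuation; here a univariate
# root count stands in for that step.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)

def num_real_roots(p, q, tol=1e-8):
    """Number of real solutions of x^3 + p*x + q = 0."""
    roots = np.roots([1.0, 0.0, p, q])
    return int(np.sum(np.abs(roots.imag) < tol))

# Sample the parameter space and label each point by its real-root count.
params = rng.uniform(-2.0, 2.0, size=(2000, 2))
labels = np.array([num_real_roots(p, q) for p, q in params])

# A nearest-neighbor classifier approximates the classification boundary,
# i.e., the real discriminant locus 4p^3 + 27q^2 = 0.
clf = KNeighborsClassifier(n_neighbors=5).fit(params, labels)

# Query: predicted number of real solutions on a grid of parameter values.
grid = np.stack(np.meshgrid(np.linspace(-2, 2, 5),
                            np.linspace(-2, 2, 5)), axis=-1).reshape(-1, 2)
print(clf.predict(grid).reshape(5, 5))
```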
Towards Robust Deep Neural Networks
Wang, Timothy E., Gu, Jack, Mehta, Dhagash, Zhao, Xiaojun, Bernal, Edgar A.
We examine the relationship between the energy landscape of neural networks and their robustness to adversarial attacks. Combining energy-landscape techniques developed in computational chemistry with tools drawn from formal methods, we produce empirical evidence that networks corresponding to lower-lying minima in the landscape tend to be more robust. The robustness measure used is the inverse of the sensitivity measure, which we define as the volume of an over-approximation of the reachable set of network outputs under all additive $l_{\infty}$-bounded perturbations of the input data. We present a novel loss function which contains a weighted sensitivity component in addition to the traditional task-oriented and regularization terms. In our experiments on standard machine learning and computer vision datasets (e.g., Iris and MNIST), we show that the proposed loss function leads to networks which reliably optimize the robustness measure as well as other related metrics of adversarial robustness, without a significant increase in classification error.
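A rough sketch of such a loss is given below, assuming PyTorch and an interval-bound-propagation over-approximation of the reachable output set in place of the formal-methods reachability tooling used in the paper; the sensitivity term is taken here as the log-volume of that interval box, and the network size, perturbation radius, and weighting are illustrative.

```python
# Hypothetical sketch: interval bound propagation (IBP) gives an axis-aligned
# over-approximation of the reachable output set under l_inf perturbations;
# its log-volume serves as the sensitivity term added to the task loss.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoLayerNet(nn.Module):
    def __init__(self, in_dim=4, hidden=16, classes=3):  # Iris-sized toy net
        super().__init__()
        self.fc1 = nn.Linear(in_dim, hidden)
        self.fc2 = nn.Linear(hidden, classes)

    def forward(self, x):
        return self.fc2(torch.relu(self.fc1(x)))

    def interval_bounds(self, x, eps):
        """Propagate the box [x - eps, x + eps] through the network."""
        lo, hi = x - eps, x + eps
        for layer, act in ((self.fc1, torch.relu), (self.fc2, None)):
            w, b = layer.weight, layer.bias
            mid, rad = (lo + hi) / 2, (hi - lo) / 2
            mid = mid @ w.t() + b
            rad = rad @ w.abs().t()
            lo, hi = mid - rad, mid + rad
            if act is not None:          # monotone activation: apply endpoint-wise
                lo, hi = act(lo), act(hi)
        return lo, hi

def sensitivity(lo, hi):
    """Log-volume of the output over-approximation, averaged over the batch."""
    return torch.log(hi - lo + 1e-8).sum(dim=1).mean()

net = TwoLayerNet()
x = torch.randn(32, 4)
y = torch.randint(0, 3, (32,))
lo, hi = net.interval_bounds(x, eps=0.1)
# Task loss plus a weighted sensitivity component (regularization omitted here).
loss = F.cross_entropy(net(x), y) + 0.01 * sensitivity(lo, hi)
loss.backward()
```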
The Loss Surface of XOR Artificial Neural Networks
Mehta, Dhagash, Zhao, Xiaojun, Bernal, Edgar A., Wales, David J.
Training an artificial neural network involves an optimization process over the landscape defined by the cost (loss) as a function of the network parameters. We explore these landscapes using optimization tools developed for potential energy landscapes in molecular science. The number of local minima and transition states (saddle points of index one), as well as the ratio of transition states to minima, grow rapidly with the number of nodes in the network. There is also a strong dependence on the regularization parameter, with the landscape becoming more convex (fewer minima) as the regularization term increases. We demonstrate that in our formulation, stationary points for networks with $N_h$ hidden nodes, including the minimal network required to fit the XOR data, are also stationary points for networks with $N_{h}+1$ hidden nodes when all the weights involving the additional nodes are zero. Hence, smaller networks optimized to fit the XOR data are embedded in the landscapes of larger networks. Our results clarify certain aspects of the classification and sensitivity (to perturbations in the input data) of minima and saddle points for this system, and may provide insight into dropout and network compression.
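The embedding claim can be checked numerically. The sketch below assumes a one-hidden-layer tanh network with a sum-of-squares loss (an assumption about the formulation, chosen so the appended node outputs zero) and verifies that padding a parameter vector with a zero-weight hidden node leaves the gradient on the original coordinates unchanged and makes the gradient on the new coordinates exactly zero, so stationary points of the smaller network remain stationary in the larger one.

```python
# Hypothetical numeric check of the embedding claim for a tanh XOR network.
import torch

X = torch.tensor([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
Y = torch.tensor([[0.], [1.], [1.], [0.]])          # XOR targets

def xor_loss(W, b, v, c):
    """Sum-of-squares loss for a one-hidden-layer tanh network."""
    hidden = torch.tanh(X @ W + b)                   # (4, N_h)
    out = hidden @ v + c                             # (4, 1)
    return ((out - Y) ** 2).sum()

torch.manual_seed(0)
n_h = 2                                              # minimal XOR network
W = torch.randn(2, n_h, requires_grad=True)
b = torch.randn(n_h, requires_grad=True)
v = torch.randn(n_h, 1, requires_grad=True)
c = torch.randn(1, requires_grad=True)

grads_small = torch.autograd.grad(xor_loss(W, b, v, c), (W, b, v, c))

# Embed: add one hidden node whose incoming and outgoing weights (and bias)
# are all zero.
W2 = torch.cat([W.detach(), torch.zeros(2, 1)], dim=1).requires_grad_()
b2 = torch.cat([b.detach(), torch.zeros(1)]).requires_grad_()
v2 = torch.cat([v.detach(), torch.zeros(1, 1)], dim=0).requires_grad_()
c2 = c.detach().clone().requires_grad_()

grads_big = torch.autograd.grad(xor_loss(W2, b2, v2, c2), (W2, b2, v2, c2))

# Gradients on the original coordinates are unchanged; the new coordinates
# get exactly zero gradient, so stationary points of the smaller network
# are stationary points of the larger one.
print(torch.allclose(grads_big[0][:, :n_h], grads_small[0]))   # True
print(torch.allclose(grads_big[2][:n_h], grads_small[2]))      # True
print(grads_big[0][:, n_h], grads_big[2][n_h])                 # zeros
```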