Supplement to: Embedding Principle of Loss Landscape of Deep Neural Networks

Neural Information Processing Systems

However, this transform does not inform about the degeneracy of critical points/manifolds. Clearly, this transform is also a critical transform. For the 1D fitting experiments (Figs. 1, 3(a), 4), we use tanh as the activation function and the mean squared error loss. Training uses full-batch gradient descent with learning rate 0.005 in some experiments, and the default full-batch Adam optimizer with learning rate 0.02 or 0.00003 in others; the resulting output functions are shown in the figure. Remark that, although Figs. 1 and 5 are case studies each based on a random trial, similar phenomena are observed across trials. The supplement also contains the reproducibility checklist, covering whether the main claims made in the abstract and introduction accurately reflect the paper's contributions, whether the full set of assumptions of all theoretical results is stated, whether the code, data, and instructions needed to reproduce the main experimental results are included (either in the supplemental material or as a URL) [Yes], whether all the training details (e.g., data splits and hyperparameters) are specified, and whether error bars (e.g., with respect to the random seed after running experiments multiple times) are reported.
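A minimal sketch of the 1D fitting setup: only the tanh activation, the mean squared error loss, full-batch gradient descent, and the learning rate 0.005 come from the text above; the target function, network width, and step count below are hypothetical stand-ins, not the paper's exact configuration.

```python
import numpy as np

# One-hidden-layer tanh network trained by full-batch gradient descent
# on the MSE loss with learning rate 0.005 (hypothetical 1D target).
rng = np.random.default_rng(0)
X = np.linspace(-1.0, 1.0, 40).reshape(-1, 1)
y = np.sin(np.pi * X)

W1 = rng.normal(0.0, 1.0, (1, 20)); b1 = np.zeros(20)
W2 = rng.normal(0.0, 1.0, (20, 1)); b2 = np.zeros(1)
lr, losses = 0.005, []
for _ in range(2000):
    h = np.tanh(X @ W1 + b1)            # hidden activations
    out = h @ W2 + b2
    err = out - y
    losses.append(float((err ** 2).mean()))
    # Full-batch gradients of the (halved) MSE loss.
    gW2 = h.T @ err / len(X); gb2 = err.mean(axis=0)
    gh = (err @ W2.T) * (1.0 - h ** 2)  # backprop through tanh
    gW1 = X.T @ gh / len(X); gb1 = gh.mean(axis=0)
    W1 -= lr * gW1; b1 -= lr * gb1
    W2 -= lr * gW2; b2 -= lr * gb2
```

The training loss decreases steadily at this learning rate; swapping the update rule for Adam reproduces the other settings mentioned above.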


Embedding Principle of Loss Landscape of Deep Neural Networks

Neural Information Processing Systems

Understanding the structure of the loss landscape of deep neural networks (DNNs) is fundamentally important. In this work, we prove an embedding principle: the loss landscape of a DNN "contains" all the critical points of all narrower DNNs.


Understanding the role of depth in the neural tangent kernel for overparameterized neural networks

St-Arnaud, William, Carvalho, Margarida, Farnadi, Golnoosh

arXiv.org Machine Learning

Overparameterized fully-connected neural networks have been shown to behave like kernel models when trained with gradient descent, under mild conditions on the width, the learning rate, and the parameter initialization. In the limit of infinitely large width and small learning rate, the kernel that is obtained allows the output of the learned model to be represented by a closed-form solution. This closed-form solution hinges on the invertibility of the limiting kernel, a property that often holds on real-world datasets. In this work, we analyze the sensitivity of large ReLU networks to increasing depth by characterizing the corresponding limiting kernel. Our theoretical results demonstrate that the normalized limiting kernel approaches the matrix of ones. In contrast, they show that the corresponding closed-form solution approaches a fixed limit on the sphere. We empirically evaluate the order of magnitude of network depth required to observe this convergent behavior, and we describe the essential properties that enable the generalization of our results to other kernels.
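The convergence of the normalized kernel toward the matrix of ones can be illustrated numerically with the correlation map of the related ReLU (degree-1 arc-cosine) kernel recursion; this is a standard sketch of depth-wise kernel degeneration, not the paper's exact NTK computation.

```python
import numpy as np

def relu_corr_step(rho):
    # One layer of the normalized ReLU kernel correlation map:
    # rho' = (sin t + (pi - t) * cos t) / pi, with t = arccos(rho).
    t = np.arccos(np.clip(rho, -1.0, 1.0))
    return (np.sin(t) + (np.pi - t) * np.cos(t)) / np.pi

# Start from orthogonal inputs (correlation 0) and iterate over depth.
rho, history = 0.0, [0.0]
for _ in range(64):
    rho = relu_corr_step(rho)
    history.append(rho)
print(rho)  # off-diagonal correlation drifts toward 1 with depth
```

Since every off-diagonal entry of the normalized kernel follows this map, all entries approach 1 as depth grows, which is exactly the "matrix of ones" behavior described above; the slow (polynomial-in-depth) approach is consistent with the empirical depth scales the paper investigates.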


A Proofs for Section 3

Neural Information Processing Systems

The proofs for Section 3 are collected here. The lemma is proven in Section D. For the case of even k, the argument together with (37) completes the proof of (23). Section C.1 gives the proof of Theorem 5. Section D.1 proves Lemma 1 by establishing a more general result; the proof is a simple exercise in linear algebra.



CODES: Benchmarking Coupled ODE Surrogates

Janssen, Robin, Sulzer, Immanuel, Buck, Tobias

arXiv.org Artificial Intelligence

We introduce CODES, a benchmark for comprehensive evaluation of surrogate architectures for coupled ODE systems. Besides standard metrics like mean squared error (MSE) and inference time, CODES provides insights into surrogate behaviour across multiple dimensions like interpolation, extrapolation, sparse data, uncertainty quantification and gradient correlation. The benchmark emphasizes usability through features such as integrated parallel training, a web-based configuration generator, and pre-implemented baseline models and datasets. Extensive documentation ensures sustainability and provides the foundation for collaborative improvement. By offering a fair and multi-faceted comparison, CODES helps researchers select the most suitable surrogate for their specific dataset and application while deepening our understanding of surrogate learning behaviour.


Sparsifying Parametric Models with L0 Regularization

Botteghi, Nicolò, Fasel, Urban

arXiv.org Artificial Intelligence

This document contains an educational introduction to the problem of sparsifying parametric models with L0 regularization. We utilize this approach together with dictionary learning to learn sparse polynomial policies for deep reinforcement learning to control parametric partial differential equations. The code and a tutorial are provided here: https://github.com/nicob15/Sparsifying-Parametric-Models-with-L0.
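A common way to make the non-differentiable L0 penalty trainable is the hard-concrete gate reparameterization of Louizos et al.; the sketch below is a generic Python illustration of that technique under stated default parameters, not necessarily the tutorial's exact code.

```python
import numpy as np

def hard_concrete_gate(log_alpha, beta=2/3, gamma=-0.1, zeta=1.1, rng=None):
    # Sample stretched, clipped "hard concrete" gates z in [0, 1].
    # Gates that clip to exactly 0 switch parameters off, while the
    # reparameterization keeps sampling differentiable in log_alpha.
    rng = np.random.default_rng(0) if rng is None else rng
    u = rng.uniform(1e-6, 1 - 1e-6, size=np.shape(log_alpha))
    s = 1.0 / (1.0 + np.exp(-(np.log(u) - np.log(1 - u) + log_alpha) / beta))
    return np.clip(s * (zeta - gamma) + gamma, 0.0, 1.0)

def expected_l0(log_alpha, beta=2/3, gamma=-0.1, zeta=1.1):
    # Probability that a gate is nonzero: the differentiable surrogate
    # for the L0 penalty that is added to the training loss.
    return 1.0 / (1.0 + np.exp(-(log_alpha - beta * np.log(-gamma / zeta))))

z = hard_concrete_gate(np.array([-4.0, 0.0, 4.0]))
penalty = expected_l0(np.array([-4.0, 0.0, 4.0])).sum()
```

Multiplying each dictionary coefficient by its gate and penalizing `expected_l0` drives unneeded coefficients to exactly zero, which is the sparsification behavior the abstract describes.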


cito: An R package for training neural networks using torch

Amesoeder, Christian, Hartig, Florian, Pichler, Maximilian

arXiv.org Artificial Intelligence

Deep Neural Networks (DNNs) have become a central method in ecology. Most current deep learning (DL) applications rely on one of the major deep learning frameworks, in particular Torch or TensorFlow, to build and train DNNs. Using these frameworks, however, requires substantially more experience and time than typical regression functions in the R environment. Here, we present 'cito', a user-friendly R package for DL that allows DNNs to be specified in the familiar formula syntax used by many R packages. To fit the models, 'cito' uses 'torch', taking advantage of the numerically optimized torch library, including the ability to switch between training models on the CPU or the graphics processing unit (GPU), which makes it possible to train large DNNs efficiently. Moreover, 'cito' includes many user-friendly functions for model plotting and analysis, including optional confidence intervals (CIs) based on bootstraps for predictions, and explainable AI (xAI) metrics for effect sizes and variable importance with CIs and p-values. To showcase a typical analysis pipeline using 'cito', including its built-in xAI features to explore the trained DNN, we build a species distribution model of the African elephant. We hope that by providing a user-friendly R framework to specify, deploy, and interpret DNNs, 'cito' will make this interesting model class more accessible to ecological data analysis. A stable version of 'cito' can be installed from the Comprehensive R Archive Network (CRAN).
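The bootstrap-based prediction CIs mentioned above are a generic resampling technique; the following Python sketch (not cito's R implementation, and using a hypothetical least-squares model as a stand-in for a DNN) shows the idea of refitting on resampled data and taking percentile intervals.

```python
import numpy as np

def bootstrap_ci(fit_predict, X, y, X_new, n_boot=200, alpha=0.05, seed=0):
    # Percentile bootstrap CI for model predictions: refit the model on
    # resampled (X, y) pairs and collect its predictions at X_new.
    rng = np.random.default_rng(seed)
    n = len(y)
    preds = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)   # resample rows with replacement
        preds.append(fit_predict(X[idx], y[idx], X_new))
    preds = np.asarray(preds)
    lo = np.quantile(preds, alpha / 2, axis=0)
    hi = np.quantile(preds, 1 - alpha / 2, axis=0)
    return lo, hi

# Usage with a plain least-squares line as the stand-in model:
def fit_predict(X, y, X_new):
    A = np.c_[np.ones(len(X)), X]
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return np.c_[np.ones(len(X_new)), X_new] @ coef

X = np.linspace(0.0, 1.0, 50)
y = 2.0 * X + 0.1 * np.random.default_rng(1).normal(size=50)
lo, hi = bootstrap_ci(fit_predict, X, y, np.array([0.25, 0.75]))
```

Replacing `fit_predict` with a DNN training routine gives the same kind of per-prediction interval; the cost is simply `n_boot` refits.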