Supplement to: Embedding Principle of Loss Landscape of Deep Neural Networks

Neural Information Processing Systems

However, this transform does not inform about the degeneracy of critical points/manifolds. Clearly, this transform is also a critical transform. For the 1D fitting experiments (Figs. 1, 3(a), 4), we use tanh as the activation function and the mean squared error loss. Training uses full-batch gradient descent with learning rate 0.005 in some experiments, and the default full-batch Adam optimizer with learning rate 0.02 or 0.00003 in others; the resulting output functions are shown in the figure. Remark that, although Figs. 1 and 5 are case studies each based on a random trial, similar phenomena are observed across trials. The supplement also contains the reproducibility checklist, covering whether the main claims made in the abstract and introduction accurately reflect the paper's contributions, whether the full set of assumptions of all theoretical results is stated, whether the code, data, and instructions needed to reproduce the main experimental results are included (either in the supplemental material or as a URL) [Yes], whether all the training details (e.g., data splits and hyperparameters) are specified, and whether error bars (e.g., with respect to the random seed after running experiments multiple times) are reported.
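A minimal sketch of the 1D fitting setup: only the tanh activation, the mean squared error loss, full-batch gradient descent, and the learning rate 0.005 come from the text above; the target function, network width, and step count below are hypothetical stand-ins, not the paper's exact configuration.

```python
import numpy as np

# One-hidden-layer tanh network trained by full-batch gradient descent
# on the MSE loss with learning rate 0.005 (hypothetical 1D target).
rng = np.random.default_rng(0)
X = np.linspace(-1.0, 1.0, 40).reshape(-1, 1)
y = np.sin(np.pi * X)

W1 = rng.normal(0.0, 1.0, (1, 20)); b1 = np.zeros(20)
W2 = rng.normal(0.0, 1.0, (20, 1)); b2 = np.zeros(1)
lr, losses = 0.005, []
for _ in range(2000):
    h = np.tanh(X @ W1 + b1)            # hidden activations
    out = h @ W2 + b2
    err = out - y
    losses.append(float((err ** 2).mean()))
    # Full-batch gradients of the (halved) MSE loss.
    gW2 = h.T @ err / len(X); gb2 = err.mean(axis=0)
    gh = (err @ W2.T) * (1.0 - h ** 2)  # backprop through tanh
    gW1 = X.T @ gh / len(X); gb1 = gh.mean(axis=0)
    W1 -= lr * gW1; b1 -= lr * gb1
    W2 -= lr * gW2; b2 -= lr * gb2
```

The training loss decreases steadily at this learning rate; swapping the update rule for Adam reproduces the other settings mentioned above.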


Embedding Principle of Loss Landscape of Deep Neural Networks

Neural Information Processing Systems

Understanding the structure of the loss landscape of deep neural networks (DNNs) is fundamentally important. In this work, we prove an embedding principle: the loss landscape of a DNN "contains" all the critical points of all narrower DNNs.


Understanding the role of depth in the neural tangent kernel for overparameterized neural networks

St-Arnaud, William, Carvalho, Margarida, Farnadi, Golnoosh

arXiv.org Machine Learning

Overparameterized fully-connected neural networks have been shown to behave like kernel models when trained with gradient descent, under mild conditions on the width, the learning rate, and the parameter initialization. In the limit of infinitely large width and small learning rate, the kernel that is obtained allows the output of the learned model to be represented by a closed-form solution. This closed-form solution hinges on the invertibility of the limiting kernel, a property that often holds on real-world datasets. In this work, we analyze the sensitivity of large ReLU networks to increasing depth by characterizing the corresponding limiting kernel. Our theoretical results demonstrate that the normalized limiting kernel approaches the matrix of ones. In contrast, they show that the corresponding closed-form solution approaches a fixed limit on the sphere. We empirically evaluate the order of magnitude of network depth required to observe this convergent behavior, and we describe the essential properties that enable the generalization of our results to other kernels.
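The convergence of the normalized kernel toward the matrix of ones can be illustrated numerically with the correlation map of the related ReLU (degree-1 arc-cosine) kernel recursion; this is a standard sketch of depth-wise kernel degeneration, not the paper's exact NTK computation.

```python
import numpy as np

def relu_corr_step(rho):
    # One layer of the normalized ReLU kernel correlation map:
    # rho' = (sin t + (pi - t) * cos t) / pi, with t = arccos(rho).
    t = np.arccos(np.clip(rho, -1.0, 1.0))
    return (np.sin(t) + (np.pi - t) * np.cos(t)) / np.pi

# Start from orthogonal inputs (correlation 0) and iterate over depth.
rho, history = 0.0, [0.0]
for _ in range(64):
    rho = relu_corr_step(rho)
    history.append(rho)
print(rho)  # off-diagonal correlation drifts toward 1 with depth
```

Since every off-diagonal entry of the normalized kernel follows this map, all entries approach 1 as depth grows, which is exactly the "matrix of ones" behavior described above; the slow (polynomial-in-depth) approach is consistent with the empirical depth scales the paper investigates.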


A Proofs for Section 3

Neural Information Processing Systems

The proofs for Section 3 are collected here. The lemma is proven in Section D. For the case of even k, the argument together with (37) completes the proof of (23). Section C.1 gives the proof of Theorem 5. Section D.1 proves Lemma 1 by establishing a more general result; the proof is a simple exercise in linear algebra.



CODES: Benchmarking Coupled ODE Surrogates

Janssen, Robin, Sulzer, Immanuel, Buck, Tobias

arXiv.org Artificial Intelligence

We introduce CODES, a benchmark for comprehensive evaluation of surrogate architectures for coupled ODE systems. Besides standard metrics like mean squared error (MSE) and inference time, CODES provides insights into surrogate behaviour across multiple dimensions like interpolation, extrapolation, sparse data, uncertainty quantification and gradient correlation. The benchmark emphasizes usability through features such as integrated parallel training, a web-based configuration generator, and pre-implemented baseline models and datasets. Extensive documentation ensures sustainability and provides the foundation for collaborative improvement. By offering a fair and multi-faceted comparison, CODES helps researchers select the most suitable surrogate for their specific dataset and application while deepening our understanding of surrogate learning behaviour.


Sparsifying Parametric Models with L0 Regularization

Botteghi, Nicolò, Fasel, Urban

arXiv.org Artificial Intelligence

This document contains an educational introduction to the problem of sparsifying parametric models with L0 regularization. We utilize this approach together with dictionary learning to learn sparse polynomial policies for deep reinforcement learning to control parametric partial differential equations. The code and a tutorial are provided here: https://github.com/nicob15/Sparsifying-Parametric-Models-with-L0.
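A common way to make the non-differentiable L0 penalty trainable is the hard-concrete gate reparameterization of Louizos et al.; the sketch below is a generic Python illustration of that technique under stated default parameters, not necessarily the tutorial's exact code.

```python
import numpy as np

def hard_concrete_gate(log_alpha, beta=2/3, gamma=-0.1, zeta=1.1, rng=None):
    # Sample stretched, clipped "hard concrete" gates z in [0, 1].
    # Gates that clip to exactly 0 switch parameters off, while the
    # reparameterization keeps sampling differentiable in log_alpha.
    rng = np.random.default_rng(0) if rng is None else rng
    u = rng.uniform(1e-6, 1 - 1e-6, size=np.shape(log_alpha))
    s = 1.0 / (1.0 + np.exp(-(np.log(u) - np.log(1 - u) + log_alpha) / beta))
    return np.clip(s * (zeta - gamma) + gamma, 0.0, 1.0)

def expected_l0(log_alpha, beta=2/3, gamma=-0.1, zeta=1.1):
    # Probability that a gate is nonzero: the differentiable surrogate
    # for the L0 penalty that is added to the training loss.
    return 1.0 / (1.0 + np.exp(-(log_alpha - beta * np.log(-gamma / zeta))))

z = hard_concrete_gate(np.array([-4.0, 0.0, 4.0]))
penalty = expected_l0(np.array([-4.0, 0.0, 4.0])).sum()
```

Multiplying each dictionary coefficient by its gate and penalizing `expected_l0` drives unneeded coefficients to exactly zero, which is the sparsification behavior the abstract describes.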


cito: An R package for training neural networks using torch

Amesoeder, Christian, Hartig, Florian, Pichler, Maximilian

arXiv.org Artificial Intelligence

Deep Neural Networks (DNNs) have become a central method in ecology. Most current deep learning (DL) applications rely on one of the major deep learning frameworks, in particular Torch or TensorFlow, to build and train DNNs. Using these frameworks, however, requires substantially more experience and time than typical regression functions in the R environment. Here, we present 'cito', a user-friendly R package for DL that allows DNNs to be specified in the familiar formula syntax used by many R packages. To fit the models, 'cito' uses 'torch', taking advantage of the numerically optimized torch library, including the ability to switch between training models on the CPU or the graphics processing unit (GPU), which makes it possible to train large DNNs efficiently. Moreover, 'cito' includes many user-friendly functions for model plotting and analysis, including optional confidence intervals (CIs) based on bootstraps for predictions, and explainable AI (xAI) metrics for effect sizes and variable importance with CIs and p-values. To showcase a typical analysis pipeline using 'cito', including its built-in xAI features to explore the trained DNN, we build a species distribution model of the African elephant. We hope that by providing a user-friendly R framework to specify, deploy, and interpret DNNs, 'cito' will make this interesting model class more accessible to ecological data analysis. A stable version of 'cito' can be installed from the Comprehensive R Archive Network (CRAN).
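The bootstrap-based prediction CIs mentioned above are a generic resampling technique; the following Python sketch (not cito's R implementation, and using a hypothetical least-squares model as a stand-in for a DNN) shows the idea of refitting on resampled data and taking percentile intervals.

```python
import numpy as np

def bootstrap_ci(fit_predict, X, y, X_new, n_boot=200, alpha=0.05, seed=0):
    # Percentile bootstrap CI for model predictions: refit the model on
    # resampled (X, y) pairs and collect its predictions at X_new.
    rng = np.random.default_rng(seed)
    n = len(y)
    preds = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)   # resample rows with replacement
        preds.append(fit_predict(X[idx], y[idx], X_new))
    preds = np.asarray(preds)
    lo = np.quantile(preds, alpha / 2, axis=0)
    hi = np.quantile(preds, 1 - alpha / 2, axis=0)
    return lo, hi

# Usage with a plain least-squares line as the stand-in model:
def fit_predict(X, y, X_new):
    A = np.c_[np.ones(len(X)), X]
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return np.c_[np.ones(len(X_new)), X_new] @ coef

X = np.linspace(0.0, 1.0, 50)
y = 2.0 * X + 0.1 * np.random.default_rng(1).normal(size=50)
lo, hi = bootstrap_ci(fit_predict, X, y, np.array([0.25, 0.75]))
```

Replacing `fit_predict` with a DNN training routine gives the same kind of per-prediction interval; the cost is simply `n_boot` refits.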