Dynamical Low-Rank Compression of Neural Networks with Robustness under Adversarial Attacks
Schotthöfer, Steffen, Yang, H. Lexie, Schnake, Stefan
Deployment of neural networks on resource-constrained devices demands models that are both compact and robust to adversarial inputs. However, compression and adversarial robustness often conflict. In this work, we introduce a dynamical low-rank training scheme enhanced with a novel spectral regularizer that controls the condition number of the low-rank core in each layer. This approach mitigates the sensitivity of compressed models to adversarial perturbations without sacrificing accuracy on clean data. The method is model- and data-agnostic, computationally efficient, and supports rank adaptivity to automatically compress the network at hand. Extensive experiments across standard architectures, datasets, and adversarial attacks show that the regularized networks can achieve over 94% compression while recovering or improving adversarial accuracy relative to uncompressed baselines.
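To make the idea concrete, here is a minimal PyTorch-style sketch of a low-rank layer with a condition-number penalty on its core. The class name, the U S Vᵀ parameterization, and the use of torch.linalg.cond are illustrative assumptions, not the paper's implementation; the penalty would be added to the task loss with a small coefficient.

```python
import torch
import torch.nn as nn

class LowRankLinear(nn.Module):
    """Linear layer factorized as W = U S V^T with a small square core S of size rank x rank."""
    def __init__(self, in_features, out_features, rank):
        super().__init__()
        self.U = nn.Parameter(torch.randn(out_features, rank) / out_features ** 0.5)
        self.S = nn.Parameter(torch.eye(rank))
        self.V = nn.Parameter(torch.randn(in_features, rank) / in_features ** 0.5)

    def forward(self, x):
        # x: (batch, in_features) -> (batch, out_features), applied factor by factor
        return x @ self.V @ self.S.T @ self.U.T

    def core_condition_number(self):
        # 2-norm condition number of the core: ratio of largest to smallest singular value
        return torch.linalg.cond(self.S)

# Illustrative use: sum the per-layer penalties into the task loss with a small weight.
def regularized_loss(task_loss, model, lam=1e-3):
    penalty = sum(m.core_condition_number()
                  for m in model.modules() if isinstance(m, LowRankLinear))
    return task_loss + lam * penalty
```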
Unexpected Benefits of Self-Modeling in Neural Systems
Premakumar, Vickram N., Vaiana, Michael, Pop, Florin, Rosenblatt, Judd, de Lucena, Diogo Schwerz, Ziman, Kirsten, Graziano, Michael S. A.
Self-models have been a topic of great interest for decades in studies of human cognition and more recently in machine learning. Yet what benefits do self-models confer? Here we show that when artificial networks learn to predict their internal states as an auxiliary task, they change in a fundamental way. To better perform the self-model task, the network learns to make itself simpler, more regularized, more parameter-efficient, and therefore more amenable to being predictively modeled. To test the hypothesis of self-regularizing through self-modeling, we used a range of network architectures performing three classification tasks across two modalities. In all cases, adding self-modeling caused a significant reduction in network complexity. The reduction was observed in two ways. First, the distribution of weights was narrower when self-modeling was present. Second, a measure of network complexity, the real log canonical threshold (RLCT), was smaller when self-modeling was present. Not only were measures of complexity reduced, but the reduction became more pronounced as greater training weight was placed on the auxiliary task of self-modeling. These results strongly support the hypothesis that self-modeling is more than simply a network learning to predict itself. The learning has a restructuring effect, reducing complexity and increasing parameter efficiency. This self-regularization may help explain some of the benefits of self-models reported in recent machine learning literature, as well as the adaptive value of self-models to biological systems. In particular, these findings may shed light on the possible interaction between the ability to model oneself and the ability to be more easily modeled by others in a social or cooperative context.
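A minimal sketch of what such an auxiliary self-modeling objective could look like: a classifier with an extra head trained to predict the network's own hidden activations, with the self-prediction loss weighted against the task loss. The names, the choice of which layer to predict, and the detached target are illustrative assumptions rather than the authors' exact setup.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SelfModelingNet(nn.Module):
    """Classifier with an auxiliary head that predicts the network's own hidden activations."""
    def __init__(self, in_dim, hidden_dim, num_classes):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, hidden_dim), nn.ReLU())
        self.classifier = nn.Linear(hidden_dim, num_classes)
        self.self_model = nn.Linear(hidden_dim, hidden_dim)  # predicts the hidden state

    def forward(self, x):
        h = self.encoder(x)
        return self.classifier(h), self.self_model(h), h

def combined_loss(logits, h_pred, h, targets, aux_weight=0.1):
    # Task loss plus weighted self-prediction loss; aux_weight plays the role of the
    # "training weight placed on the auxiliary task" mentioned in the abstract.
    task = F.cross_entropy(logits, targets)
    self_pred = F.mse_loss(h_pred, h.detach())  # detaching the target is one possible design choice
    return task + aux_weight * self_pred
```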
Semiring Activation in Neural Networks
Smets, Bart M. N., Donker, Peter D., Portegies, Jim W., Duits, Remco
We introduce a class of trainable nonlinear operators based on semirings that are suitable for use in neural networks. These operators generalize the traditional alternation of linear operators with activation functions in neural networks. Semirings are algebraic structures that describe a generalised notion of linearity, greatly expanding the range of trainable operators that can be included in neural networks. In fact, max- or min-pooling operations are convolutions in the tropical semiring with a fixed kernel. We perform experiments where we replace the activation functions with trainable semiring-based operators to show that these are viable operations to include in fully connected as well as convolutional neural networks (ConvNeXt). We discuss some of the challenges of replacing traditional activation functions with trainable semiring activations and the trade-offs of doing so.
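A minimal sketch of a tropical (max-plus) operator of the kind described: the usual sum of products is replaced by a maximum of sums, so with a fixed all-zero kernel the operator reduces to a max over its inputs, matching the abstract's remark about max-pooling. The class name and the zero initialization are illustrative assumptions, not the paper's code.

```python
import torch
import torch.nn as nn

class MaxPlusLinear(nn.Module):
    """Tropical (max-plus) analogue of a dense layer:
    out_j = max_i (x_i + w_ji) replaces out_j = sum_i (x_i * w_ji)."""
    def __init__(self, in_features, out_features):
        super().__init__()
        # An all-zero kernel makes each output the plain max over the inputs,
        # i.e. a fixed-kernel tropical convolution behaves like max-pooling.
        self.weight = nn.Parameter(torch.zeros(out_features, in_features))

    def forward(self, x):
        # x: (batch, in_features); broadcast to (batch, out_features, in_features), reduce by max
        return (x.unsqueeze(1) + self.weight.unsqueeze(0)).amax(dim=-1)
```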
Vanishing Variance Problem in Fully Decentralized Neural-Network Systems
Tian, Yongding, Al-Ars, Zaid, Kitsak, Maksim, Hofstee, Peter
Federated learning and gossip learning are emerging methodologies designed to mitigate data privacy concerns by retaining training data on client devices and exclusively sharing locally-trained machine learning (ML) models with others. The primary distinction between the two lies in their approach to model aggregation: federated learning employs a centralized parameter server, whereas gossip learning adopts a fully decentralized mechanism, enabling direct model exchanges among nodes. This decentralized nature often positions gossip learning as less efficient compared to federated learning. Both methodologies involve a critical step: computing a representation of received ML models and integrating this representation into the existing model. Conventionally, this representation is derived by averaging the received models, exemplified by the FedAVG algorithm. Our findings suggest that this averaging approach inherently introduces a potential delay in model convergence. We identify the underlying cause and refer to it as the "vanishing variance" problem, where averaging across uncorrelated ML models undermines the optimal variance established by the Xavier weight initialization. Unlike federated learning where the central server ensures model correlation, and unlike traditional gossip learning which circumvents this problem through model partitioning and sampling, our research introduces a variance-corrected model averaging algorithm. This novel algorithm preserves the optimal variance needed during model averaging, irrespective of network topology or non-IID data distributions. Our extensive simulation results demonstrate that our approach enables gossip learning to achieve convergence efficiency comparable to that of federated learning.
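A minimal sketch of variance-corrected averaging as we read the idea: averaging n uncorrelated models shrinks weight variance by roughly 1/n, so each averaged tensor is rescaled to restore the variance the incoming models had. The function name and the exact rescaling rule are illustrative assumptions; the paper's algorithm may differ in detail.

```python
import torch

def variance_corrected_average(state_dicts):
    """Average peer models, then rescale each averaged tensor so its element-wise
    variance matches the mean variance of the incoming models, counteracting the
    ~1/n variance shrinkage caused by averaging uncorrelated weights."""
    averaged = {}
    for key in state_dicts[0]:
        stacked = torch.stack([sd[key].float() for sd in state_dicts])  # (n_models, ...)
        avg = stacked.mean(dim=0)
        if avg.numel() > 1:
            # Target variance: mean of the per-model element-wise variances.
            target_var = stacked.flatten(1).var(dim=1, unbiased=False).mean()
            current_var = avg.var(unbiased=False)
            if current_var > 0:
                scale = (target_var / current_var).sqrt()
                avg = avg.mean() + (avg - avg.mean()) * scale
        averaged[key] = avg
    return averaged
```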