Jacobian norm



On the Stability of the Jacobian Matrix in Deep Neural Networks

Dadoun, Benjamin, Hayou, Soufiane, Salam, Hanan, Seddik, Mohamed El Amine, Youssef, Pierre

arXiv.org Artificial Intelligence

Deep neural networks are known to suffer from exploding or vanishing gradients as depth increases, a phenomenon closely tied to the spectral behavior of the input-output Jacobian. Prior work has identified critical initialization schemes that ensure Jacobian stability, but these analyses are typically restricted to fully connected networks with i.i.d. weights. In this work, we go significantly beyond these limitations: we establish a general stability theorem for deep neural networks that accommodates sparsity (such as that introduced by pruning) and non-i.i.d., weakly correlated weights (e.g. induced by training). Our results rely on recent advances in random matrix theory, and provide rigorous guarantees for spectral stability in a much broader class of network models. This extends the theoretical foundation for initialization schemes in modern neural networks with structured and dependent randomness.
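The exploding/vanishing behavior this abstract describes can be illustrated with a minimal numpy sketch. This is not the paper's construction (which covers sparsity and weakly correlated weights): it uses the simplest possible case, a deep *linear* network with i.i.d. weights, where the input-output Jacobian is just the product of the weight matrices.

```python
import numpy as np

def jacobian_spectral_norm(depth, width, sigma_w, seed=0):
    """Top singular value of the input-output Jacobian of a random deep
    linear network J = W_L ... W_1 with i.i.d. N(0, sigma_w^2 / width)
    entries. A simplification for illustration; the paper handles
    nonlinearities, sparsity, and correlated weights."""
    rng = np.random.default_rng(seed)
    J = np.eye(width)
    for _ in range(depth):
        W = rng.normal(0.0, sigma_w / np.sqrt(width), size=(width, width))
        J = W @ J
    return np.linalg.norm(J, 2)  # spectral norm

# At the critical scale sigma_w = 1 the norm stays moderate with depth;
# slightly above it, the norm blows up exponentially in depth.
stable = jacobian_spectral_norm(depth=50, width=100, sigma_w=1.0)
unstable = jacobian_spectral_norm(depth=50, width=100, sigma_w=1.5)
```

Each layer rescales singular values by roughly sigma_w, so only the critical choice sigma_w = 1 avoids exponential growth or decay of the Jacobian spectrum with depth.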


(Figure caption fragment: "… for fully connected networks trained on MNIST vs. depth")

Neural Information Processing Systems

We thank the reviewers for their detailed and insightful reviews. We answer most of the questions here and will incorporate the feedback into the final version. (Figure 1, right: log leading terms for the spectral bound vs. our bound on a WideResNet trained on CIFAR10 at different depths.) In Figure 1, we address questions about the empirical evaluation of our bounds. The primary challenge is that Theorem 5.1 requires the augmented indicators on the Jacobian norms to be themselves Lipschitz w.r.t. the hidden layers.


Limitations of Normalization in Attention Mechanism

Mudarisov, Timur, Burtsev, Mikhail, Petrova, Tatiana, State, Radu

arXiv.org Artificial Intelligence

This paper investigates the limitations of normalization in attention mechanisms. We begin with a theoretical framework that enables identification of the model's selective ability and of the geometric separation involved in token selection. Our analysis includes explicit bounds on distances and separation criteria for token vectors under softmax scaling. Through experiments with a pre-trained GPT-2 model, we empirically validate our theoretical results and analyze key behaviors of the attention mechanism. Notably, we demonstrate that as the number of selected tokens increases, the model's ability to distinguish informative tokens declines, often converging toward a uniform selection pattern. We also show that gradient sensitivity under softmax normalization presents challenges during training, especially at low temperature settings. These findings advance the current understanding of softmax-based attention mechanisms and motivate the need for more robust normalization and selection strategies in future attention architectures.
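The two effects named in the abstract, flattening toward uniform selection and gradient saturation at low temperature, can be reproduced with a small numpy sketch of softmax and its Jacobian. This is a generic illustration of the phenomena, not the paper's bounds.

```python
import numpy as np

def softmax(z, temp=1.0):
    z = np.asarray(z, dtype=float) / temp
    z = z - z.max()                      # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def softmax_jacobian(z, temp=1.0):
    """d softmax_i / d z_j = p_i * (delta_ij - p_j); the 1/temp prefactor
    is omitted so only the shape of the Jacobian is compared."""
    p = softmax(z, temp)
    return np.diag(p) - np.outer(p, p)

# Selectivity: as more tokens carry comparable scores, the distribution
# flattens toward uniform and the top token is harder to single out.
few_selected = softmax([3.0, 3.0] + [0.0] * 6)
many_selected = softmax([3.0] * 6 + [0.0] * 2)

# Gradient sensitivity: at low temperature softmax saturates to near
# one-hot, and its Jacobian entries collapse toward zero.
J_low_temp = softmax_jacobian([2.0, 1.0, 0.0], temp=0.05)
J_unit_temp = softmax_jacobian([2.0, 1.0, 0.0], temp=1.0)
```

The Jacobian entries are all products of probabilities, so both extremes hurt: a near-uniform distribution dilutes selectivity, while a near-one-hot one starves upstream gradients.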



gLSTM: Mitigating Over-Squashing by Increasing Storage Capacity

Blayney, Hugh, Arroyo, Álvaro, Dong, Xiaowen, Bronstein, Michael M.

arXiv.org Machine Learning

Graph Neural Networks (GNNs) leverage the graph structure to transmit information between nodes, typically through the message-passing mechanism. While these models have found a wide variety of applications, they are known to suffer from over-squashing, where information from a large receptive field of node representations is collapsed into a single fixed-size vector, resulting in an information bottleneck. In this paper, we re-examine the over-squashing phenomenon through the lens of model storage and retrieval capacity, which we define as the amount of information that can be stored in a node's representation for later use. We study some of the limitations of existing tasks used to measure over-squashing and introduce a new synthetic task to demonstrate that an information bottleneck can saturate this capacity. Furthermore, we adapt ideas from the sequence modeling literature on associative memories, fast weight programmers, and the xLSTM model to develop a novel GNN architecture with improved capacity. We demonstrate strong performance of this architecture both on our synthetic capacity task and on a range of real-world graph benchmarks.
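The associative-memory idea the abstract borrows from the sequence-modeling literature can be sketched in a few lines: a fast-weight store writes key-value pairs into one fixed-size matrix via outer products, retrieval is exact up to the key dimension, and interference appears once that capacity is exceeded. This is a generic linear associative memory for illustration, not the gLSTM architecture itself.

```python
import numpy as np

def store(keys, values):
    """Linear associative memory / fast-weight store: W = sum_i v_i k_i^T."""
    return values.T @ keys               # shape (d_value, d_key)

def retrieve(W, key):
    return W @ key

d = 8                                    # key dimension = storage capacity
keys = np.eye(d)                         # orthonormal keys: no interference
values = np.arange(d * 4, dtype=float).reshape(d, 4)

W = store(keys, values)                  # all pairs squashed into one matrix
recovered = retrieve(W, keys[2])         # exact while #items <= capacity

# Writing one more item whose key overlaps an existing key corrupts
# retrieval: the fixed-size store has run out of orthogonal "slots".
W_over = W + np.outer(np.full(4, 100.0), keys[2])
corrupted = retrieve(W_over, keys[2])
```

Seen this way, over-squashing is a capacity problem: a node's fixed-size representation can only hold so many distinguishable messages before they interfere.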




Towards Unraveling and Improving Generalization in World Models

Fang, Qiaoyi, Du, Weiyu, Wang, Hang, Zhang, Junshan

arXiv.org Artificial Intelligence

World models have recently emerged as a promising approach to reinforcement learning (RL), achieving state-of-the-art performance across a wide range of visual control tasks. This work aims to obtain a deep understanding of the robustness and generalization capabilities of world models. Thus motivated, we develop a stochastic differential equation formulation by treating world model learning as a stochastic dynamical system, and characterize the impact of latent representation errors on robustness and generalization, both for zero-drift and for non-zero-drift representation errors. Our somewhat surprising findings, based on both theoretical and experimental studies, reveal that in the zero-drift case, modest latent representation errors can in fact act as implicit regularization and hence improve robustness. We further propose a Jacobian regularization scheme to mitigate the compounding error-propagation effects of non-zero drift, thereby enhancing training stability and robustness. Our experimental studies corroborate that this regularization approach not only stabilizes training but also accelerates convergence and improves the accuracy of long-horizon prediction.
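The Jacobian regularization idea can be sketched numerically: penalize the Frobenius norm of the transition map's Jacobian so that latent errors are damped rather than amplified across rollout steps. The transition function and constants below are hypothetical stand-ins, not the paper's model; the finite-difference estimator is just one simple way to obtain the penalty without autodiff.

```python
import numpy as np

def dynamics(z, A):
    """Toy latent transition z_{t+1} = tanh(A z_t); a hypothetical
    stand-in for a learned world-model transition."""
    return np.tanh(A @ z)

def jacobian_penalty(z, A, eps=1e-5):
    """Squared Frobenius norm of df/dz at z, via finite differences.
    Added to the training loss with some weight, it discourages
    transitions that amplify latent errors at each rollout step."""
    d = z.size
    f0 = dynamics(z, A)
    J = np.zeros((d, d))
    for i in range(d):
        dz = np.zeros(d)
        dz[i] = eps
        J[:, i] = (dynamics(z + dz, A) - f0) / eps
    return float((J ** 2).sum())

rng = np.random.default_rng(0)
z = rng.normal(size=4)
contractive = jacobian_penalty(z, 0.1 * np.eye(4))   # errors shrink per step
expansive = jacobian_penalty(z, 3.0 * np.eye(4))     # errors compound
```

A transition with Jacobian norm below one contracts perturbations at every step, which is exactly the property that keeps long-horizon rollouts from drifting.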


FedGTST: Boosting Global Transferability of Federated Models via Statistics Tuning

Ma, Evelyn, Pan, Chao, Etesami, Rasoul, Zhao, Han, Milenkovic, Olgica

arXiv.org Artificial Intelligence

The performance of Transfer Learning (TL) heavily relies on effective pretraining, which demands large datasets and substantial computational resources. As a result, executing TL is often challenging for individual model developers. Federated Learning (FL) addresses these issues by facilitating collaborations among clients, expanding the dataset indirectly, distributing computational costs, and preserving privacy. However, key challenges remain unresolved. First, existing FL methods tend to optimize transferability only within local domains, neglecting the global learning domain. Second, most approaches rely on indirect transferability metrics, which do not accurately reflect the final target loss or true degree of transferability. To address these gaps, we propose two enhancements to FL. First, we introduce a client-server exchange protocol that leverages cross-client Jacobian (gradient) norms to boost transferability. Second, we increase the average Jacobian norm across clients at the server, using this as a local regularizer to reduce cross-client Jacobian variance. Our transferable federated algorithm, termed FedGTST (Federated Global Transferability via Statistics Tuning), demonstrates that increasing the average Jacobian and reducing its variance allows for tighter control of the target loss. This leads to an upper bound on the target loss in terms of the source loss and source-target domain discrepancy. Extensive experiments on datasets such as MNIST to MNIST-M and CIFAR10 to SVHN show that FedGTST outperforms relevant baselines, including FedSR. On the second dataset pair, FedGTST improves accuracy by 9.8% over FedSR and 7.6% over FedIIR when LeNet is used as the backbone.
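The exchange protocol described above, clients reporting Jacobian-norm statistics and the server steering their mean and variance, can be reduced to a scalar sketch. Function names and the quadratic penalty form here are hypothetical simplifications of the idea, not FedGTST's actual implementation.

```python
import numpy as np

def server_statistics(client_jacobian_norms):
    """Server side of the exchange: collect each client's scalar Jacobian
    (gradient) norm and return the cross-client mean and variance."""
    norms = np.asarray(client_jacobian_norms, dtype=float)
    return norms.mean(), norms.var()

def local_regularizer(local_norm, broadcast_mean, lam=0.1):
    """Hypothetical local penalty pulling a client's Jacobian norm toward
    the broadcast mean, thereby shrinking cross-client variance."""
    return lam * (local_norm - broadcast_mean) ** 2

# One round: the server aggregates, each client penalizes its deviation.
mean_norm, var_norm = server_statistics([1.2, 0.8, 1.0, 2.0])
```

The client furthest from the mean pays the largest penalty, so successive rounds push the norms together, which is the variance reduction the bound on the target loss relies on.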


On progressive sharpening, flat minima and generalisation

MacDonald, Lachlan Ewen, Valmadre, Jack, Lucey, Simon

arXiv.org Machine Learning

We present a new approach to understanding the relationship between loss curvature and input-output model behaviour in deep learning. Specifically, we use existing empirical analyses of the spectrum of deep network loss Hessians to ground an ansatz tying together the loss Hessian and the input-output Jacobian over training samples during the training of deep neural networks. We then prove a series of theoretical results which quantify the degree to which the input-output Jacobian of a model approximates its Lipschitz norm over a data distribution, and deduce a novel generalisation bound in terms of the empirical Jacobian. We use our ansatz, together with our theoretical results, to give a new account of the recently observed progressive sharpening phenomenon, as well as the generalisation properties of flat minima. Experimental evidence is provided to validate our claims.
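The central quantity here, the empirical input-output Jacobian norm as a data-dependent proxy for the Lipschitz norm, can be computed exactly for a tiny model. The two-layer network and sizes below are illustrative choices, not the paper's experimental setup.

```python
import numpy as np

def input_jacobian_norm(x, W1, W2):
    """Spectral norm of the input-output Jacobian of the two-layer model
    f(x) = W2 tanh(W1 x):  J(x) = W2 diag(1 - tanh(W1 x)^2) W1."""
    s = 1.0 - np.tanh(W1 @ x) ** 2       # derivative of tanh
    J = W2 @ (s[:, None] * W1)
    return np.linalg.norm(J, 2)

rng = np.random.default_rng(0)
W1 = rng.normal(size=(16, 8)) / np.sqrt(8)
W2 = rng.normal(size=(4, 16)) / np.sqrt(16)
xs = rng.normal(size=(32, 8))

# Empirical Jacobian norm over a sample: the data-dependent quantity
# such a generalisation bound can be phrased in terms of ...
empirical = max(input_jacobian_norm(x, W1, W2) for x in xs)
# ... versus the crude global Lipschitz bound ||W2|| * ||W1||
# (valid since |tanh'| <= 1), which the empirical value never exceeds.
global_lipschitz = float(np.linalg.norm(W2, 2) * np.linalg.norm(W1, 2))
```

The gap between the two numbers is the point: bounds stated via the empirical Jacobian over the data distribution can be much tighter than worst-case Lipschitz bounds over all inputs.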