AITopics | layer weight

Path Regularization: A Convexity and Sparsity Inducing Regularization for Parallel ReLU Networks

Neural Information Processing Systems | Feb-16-2026, 19:40:47 GMT

Understanding the fundamental principles behind the success of deep neural networks is one of the most important open questions in the current literature.

artificial intelligence, ij 1, machine learning, (19 more...)

Neural Information Processing Systems

Country:

North America > United States > California > Santa Clara County > Palo Alto (0.04)
North America > Canada > Ontario > Toronto (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
(3 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.66)

A Recovery Guarantee for Sparse Neural Networks

Fridovich-Keil, Sara, Pilanci, Mert

arXiv.org Machine Learning | Sep-25-2025

We prove the first guarantees of sparse recovery for ReLU neural networks, where the sparse network weights constitute the signal to be recovered. Specifically, we study structural properties of the sparse network weights for two-layer, scalar-output networks under which a simple iterative hard thresholding algorithm recovers these weights exactly, using memory that grows linearly in the number of nonzero weights. We validate this theoretical result with simple experiments on recovery of sparse planted MLPs, MNIST classification, and implicit neural representations. Experimentally, we find performance that is competitive with, and often exceeds, a high-performing but memory-inefficient baseline based on iterative magnitude pruning.
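The abstract above names iterative hard thresholding (IHT) without giving its update rule. As a rough illustration only, here is a minimal sketch of classical IHT for the linear problem y ≈ A x with a k-sparse x. It is not the paper's network-weight variant; the function names, the step size, and the linear setting are all assumptions made for this example.

```python
def hard_threshold(x, k):
    """Keep the k largest-magnitude entries of x and zero out the rest."""
    order = sorted(range(len(x)), key=lambda i: abs(x[i]), reverse=True)
    keep = set(order[:k])
    return [v if i in keep else 0.0 for i, v in enumerate(x)]


def iht(A, y, k, step, iters):
    """Classical IHT for y ~= A x with at most k nonzeros in x.

    A is a list of rows, y a list of observations. This is the textbook
    linear version, used here only to illustrate the idea.
    """
    m, n = len(A), len(A[0])
    x = [0.0] * n
    for _ in range(iters):
        # Residual r = y - A x.
        r = [y[i] - sum(A[i][j] * x[j] for j in range(n)) for i in range(m)]
        # Gradient step on 0.5 * ||y - A x||^2, i.e. x + step * A^T r.
        x = [x[j] + step * sum(A[i][j] * r[i] for i in range(m)) for j in range(n)]
        # Project back onto the set of k-sparse vectors.
        x = hard_threshold(x, k)
    return x
```

Only the current iterate and its k surviving entries matter between steps, which is the intuition behind memory that scales with the number of nonzeros; a real implementation would store the iterate sparsely rather than as a dense list.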

assumption 2, mlp, probability, (17 more...)

arXiv.org Machine Learning

2509.20323

Country:

Europe > Austria > Vienna (0.14)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
(2 more...)

Genre: Research Report (1.00)

Industry: Government (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Depth-Aware Initialization for Stable and Efficient Neural Network Training

Pandey, Vijay

arXiv.org Artificial Intelligence | Sep-8-2025

In the past few years, various initialization schemes have been proposed, including Glorot initialization, He initialization, orthogonal-matrix initialization, and the random walk method. Some of these methods stress keeping unit variance of activations and gradients as they propagate through the network layers. A few are independent of depth information, while others consider the total network depth for better initialization. This paper presents a comprehensive study in which the depth of each layer, as well as the total network depth, is incorporated into the initialization scheme. The study also finds that, for deeper networks, the theoretical assumption of unit variance throughout the network does not perform well; the variance needs to increase from the first-layer activations to the last-layer activations. We propose a novel way to increase the variance of the network in a flexible manner that incorporates each layer's depth. Experiments show that the proposed method performs better than existing initialization schemes.
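To make the schemes named in this abstract concrete, the sketch below computes the standard deviations used by Glorot and He normal initialization. It also includes a purely hypothetical depth-dependent gain, which shows what "variance that grows with layer depth" could look like. The abstract does not state the paper's rule, so `depth_scaled_std` and its `alpha` parameter are assumptions, not the author's method.

```python
import math
import random


def glorot_std(fan_in, fan_out):
    """Glorot (Xavier) normal initialization: std = sqrt(2 / (fan_in + fan_out))."""
    return math.sqrt(2.0 / (fan_in + fan_out))


def he_std(fan_in):
    """He normal initialization: std = sqrt(2 / fan_in)."""
    return math.sqrt(2.0 / fan_in)


def depth_scaled_std(fan_in, layer_index, total_depth, alpha=0.5):
    """HYPOTHETICAL rule: He std times a gain that grows linearly with depth.

    layer_index runs from 0 (first layer) to total_depth (last layer).
    This is an illustration only, not the formula from the paper.
    """
    gain = 1.0 + alpha * layer_index / total_depth
    return he_std(fan_in) * gain


def init_layer(fan_in, fan_out, std, rng):
    """Sample a fan_out x fan_in weight matrix from N(0, std^2)."""
    return [[rng.gauss(0.0, std) for _ in range(fan_in)] for _ in range(fan_out)]
```

For example, `init_layer(256, 128, depth_scaled_std(256, 3, 10), random.Random(0))` would draw the weights of a layer at depth 3 in a 10-layer network under the illustrative rule.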

artificial intelligence, machine learning, variance, (15 more...)

arXiv.org Artificial Intelligence

2509.05018

Genre: Research Report (0.82)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Two-Stage Regularization-Based Structured Pruning for LLMs

Feng, Mingkuan, Wu, Jinyang, Liu, Siyuan, Zhang, Shuai, Jin, Ruihan, Che, Feihu, Shao, Pengpeng, Wen, Zhengqi, Tao, Jianhua

arXiv.org Artificial Intelligence | Jul-2-2025

The deployment of large language models (LLMs) is largely hindered by their large number of parameters. Structural pruning has emerged as a promising solution. Prior structured pruning methods directly remove unimportant parameters based on certain metrics, which often causes knowledge loss and necessitates extensive retraining. To overcome this, we introduce a novel pruning method TRSP: Two-Stage Regularization-Based Structured Pruning for LLMs. Specifically, we multiply the output of each transformer layer by an initial learnable weight and iteratively learn these weights by adding their $\ell_1$-norm as a regularization term to the loss function, serving as the first-stage regularization. Subsequently, we apply additional regularization to the difference between the output and input of layers with smaller weights, encouraging the shift of knowledge to the preserved layers. This serves as the second-stage regularization. TRSP retains more knowledge and better preserves model performance than direct parameter elimination. Through extensive experimentation we show that TRSP outperforms strong layer-wise structured pruning methods without requiring retraining. As a layer-wise pruning method, it delivers notable end-to-end acceleration, making it a promising solution for efficient LLM deployment.
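The two regularizers described in this abstract can be sketched on a toy stack of layers. In the sketch below, each layer's output is multiplied by a scalar gate; the first-stage term is the ℓ1 norm of the gates, and the second-stage term penalizes the gap between output and input for the layers with the smallest gates. The function names, the coefficients `lam` and `mu`, and the use of a squared Euclidean distance are assumptions for this example; the abstract does not specify them.

```python
def l1_penalty(gates, lam):
    """First-stage regularizer: lam * sum_l |s_l| over the per-layer gates."""
    return lam * sum(abs(s) for s in gates)


def forward(x, layers, gates):
    """Apply each layer and scale its output by that layer's gate.

    Returns the final output and the (input, output) pair of every layer.
    """
    pairs = []
    h = x
    for layer, s in zip(layers, gates):
        out = [s * v for v in layer(h)]
        pairs.append((h, out))
        h = out
    return h, pairs


def layers_to_prune(gates, n_prune):
    """Indices of the n_prune layers whose gates have the smallest magnitude."""
    order = sorted(range(len(gates)), key=lambda i: abs(gates[i]))
    return sorted(order[:n_prune])


def second_stage_penalty(pairs, prune_idx, mu):
    """Second-stage regularizer (assumed squared L2): mu * sum ||out - in||^2."""
    total = 0.0
    for i in prune_idx:
        h_in, h_out = pairs[i]
        total += sum((o - a) ** 2 for a, o in zip(h_in, h_out))
    return mu * total
```

In training, both penalties would be added to the task loss and minimized by gradient descent. A layer whose output has been driven close to its input can then be removed with little change to the rest of the network.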

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2505.18232

Country: Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.97)

Dynamical Decoupling of Generalization and Overfitting in Large Two-Layer Networks

Montanari, Andrea, Urbani, Pierfrancesco

arXiv.org Machine Learning | Feb-28-2025

The inductive bias and generalization properties of large machine learning models are -- to a substantial extent -- a byproduct of the optimization algorithm used for training. Among others, the scale of the random initialization, the learning rate, and early stopping all have crucial impact on the quality of the model learnt by stochastic gradient descent or related algorithms. In order to understand these phenomena, we study the training dynamics of large two-layer neural networks. We use a well-established technique from non-equilibrium statistical physics (dynamical mean field theory) to obtain an asymptotic high-dimensional characterization of this dynamics. This characterization applies to a Gaussian approximation of the hidden neurons' non-linearity, and empirically captures well the behavior of actual neural network models. Our analysis uncovers several interesting new phenomena in the training dynamics: $(i)$ The emergence of a slow time scale associated with the growth in Gaussian/Rademacher complexity; $(ii)$ As a consequence, algorithmic inductive bias towards small complexity, but only if the initialization has small enough complexity; $(iii)$ A separation of time scales between feature learning and overfitting; $(iv)$ A non-monotone behavior of the test error and, correspondingly, a 'feature unlearning' phase at large times.

dynamical regime, equation, initialization, (14 more...)

arXiv.org Machine Learning

2502.21269

Country:

North America (0.14)
Europe > France (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
(3 more...)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.54)

On the Interplay Between Sparsity and Training in Deep Reinforcement Learning

Davelouis, Fatima, Martin, John D., Bowling, Michael

arXiv.org Artificial Intelligence | Feb-1-2025

We study the benefits of different sparse architectures for deep reinforcement learning. In particular, we focus on image-based domains where spatially-biased and fully-connected architectures are common. Using these and several other architectures of equal capacity, we show that sparse structure has a significant effect on learning performance. We also observe that choosing the best sparse architecture for a given domain depends on whether the hidden layer weights are fixed or learned.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

arXiv.org Artificial Intelligence

2501.16729

Country:

North America > Canada > Alberta (0.14)
North America > United States > California > San Francisco County > San Francisco (0.14)
Europe > Italy > Campania > Naples (0.04)
Asia > China > Tianjin Province > Tianjin (0.04)

Genre: Research Report > New Finding (0.46)

Industry: Leisure & Entertainment > Games (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Reviews: Towards Understanding the Importance of Shortcut Connections in Residual Networks

Neural Information Processing Systems | Jan-24-2025, 21:18:00 GMT

The paper investigates the outcome of training a one-hidden-layer convolutional residual network using gradient descent when the input is sampled from a standard Gaussian distribution. As a follow-up to a similar analysis by Du et al. (2017) for CNNs, this paper shows that, for ResNets, there exist two fixed points of the teacher-student loss function (the network architecture is the same for both). One is a global minimum; the other is a spurious fixed point. The authors then derive *sufficient* conditions on the parameter initialization and learning rates such that training happens in two phases: 1. a first phase where the hidden layer weights (w) remain away from the spurious fixed point (due to a sufficiently small learning rate) while the last layer weights (a) approach the optimal value and eventually enter the region where the inner product satisfies a'a* 0; 2. a second phase in which both parameters approach the global minimum, such that the learning rate for w can be larger, allowing faster convergence. I find this paper very interesting, as it provides novel insights into the optimization process of ResNets, even though in a very restricted setting.

global minimum, residual network, shortcut connection, (15 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

RelChaNet: Neural Network Feature Selection using Relative Change Scores

Zimmer, Felix

arXiv.org Artificial Intelligence | Oct-3-2024

There is an ongoing effort to develop feature selection algorithms to improve interpretability, reduce computational resources, and minimize overfitting in predictive models. Neural networks stand out as architectures on which to build feature selection methods, and recently, neuron pruning and regrowth have emerged from the sparse neural network literature as promising new tools. We introduce RelChaNet, a novel and lightweight feature selection algorithm that uses neuron pruning and regrowth in the input layer of a dense neural network. For neuron pruning, a gradient sum metric measures the relative change induced in a network after a feature enters, while neurons are randomly regrown. We also propose an extension that adapts the size of the input layer at runtime. Extensive experiments on nine different datasets show that our approach generally outperforms the current state-of-the-art methods, and in particular improves the average accuracy by 2% on the MNIST dataset. Feature selection is an elemental task in predictive modelling. It can serve to reduce computational resources, improve interpretability by highlighting important features, or improve predictive performance by reducing overfitting (Li et al., 2018). Furthering these goals has been the driving motivation behind large recent efforts to improve existing feature selection algorithms and develop new ones.
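The prune-and-regrow loop mentioned in this abstract can be pictured with a generic sketch: drop the lowest-scoring input features, then bring back the same number chosen at random from the inactive ones. The scores are taken as given; how RelChaNet computes its gradient sum metric is not spelled out in the abstract, so `prune_and_regrow`, its arguments, and the scoring dictionary are hypothetical stand-ins rather than the author's algorithm.

```python
import random


def prune_and_regrow(active, scores, n_features, n_swap, rng):
    """One generic prune-and-regrow step over input features.

    active     -- indices of currently active input features
    scores     -- mapping from active feature index to an importance score
                  (placeholder for whatever metric the method uses)
    n_features -- total number of available input features
    n_swap     -- how many features to drop and regrow
    rng        -- a random.Random instance for reproducible regrowth
    """
    active_set = set(active)
    # Drop the n_swap active features with the lowest score.
    ranked = sorted(active, key=lambda f: scores[f])
    dropped = set(ranked[:n_swap])
    kept = [f for f in active if f not in dropped]
    # Regrow uniformly at random among features that were inactive before.
    inactive = [f for f in range(n_features) if f not in active_set]
    regrown = rng.sample(inactive, min(n_swap, len(inactive)))
    return sorted(kept + regrown)
```

Repeating this step during training keeps the number of active inputs fixed while letting low-scoring features rotate out. The features still active at the end would form the selected subset.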

dataset, neural network, neuron, (15 more...)

arXiv.org Artificial Intelligence

2410.02344

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > Virginia > Arlington County > Arlington (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
(2 more...)

Genre: Research Report > Promising Solution (0.34)

Industry:

Health & Medicine > Therapeutic Area (0.46)
Telecommunications > Networks (0.40)
Information Technology > Networks (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Rethinking the adaptive relationship between Encoder Layers and Decoder Layers

Song, Yubo

arXiv.org Artificial Intelligence | May-14-2024

In the field of machine learning, using pre-trained models to perform specific tasks is a common practice. Typically, this involves fine-tuning the pre-trained model on a specific dataset through iterative adjustments without modifying the model structure. This article focuses on the state-of-the-art (SOTA) machine translation model Helsinki-NLP/opus-mt-de-en, which translates German to English, to explore the adaptive relationship between Encoder Layers and Decoder Layers by introducing a bias-free fully connected layer. Additionally, the study investigates the effects of modifying the pre-trained model structure during fine-tuning. Four experiments were conducted by introducing a bias-free fully connected layer between the Encoder and Decoder Layers: Using original pre-trained model weights and initializing the fully connected layer weights to maintain the original connections, where each Decoder Layer's input is from the 6th Encoder Layer. Through fine-tuning, these weights adapt towards optimal configurations.
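One way to read the initialization described above is as a bias-free mixing matrix with one row per decoder layer and one column per encoder layer. At the start, each row puts weight 1 on the 6th encoder layer and 0 elsewhere, which reproduces the original wiring. The sketch below illustrates that reading with plain lists; the function names, the 6-by-6 layer count, and the vector shapes are assumptions for this example, not details taken from the article.

```python
def init_mixing_weights(n_decoder, n_encoder, source_layer):
    """Bias-free weights W[d][e]: 1.0 on the chosen encoder layer, 0.0 elsewhere.

    source_layer is zero-based, so the 6th encoder layer is index 5.
    """
    return [[1.0 if e == source_layer else 0.0 for e in range(n_encoder)]
            for _ in range(n_decoder)]


def mix_encoder_outputs(W, encoder_outputs):
    """For each decoder layer d, return sum_e W[d][e] * encoder_outputs[e].

    encoder_outputs holds one vector per encoder layer, all of equal length.
    """
    dim = len(encoder_outputs[0])
    return [[sum(w * enc[k] for w, enc in zip(row, encoder_outputs))
             for k in range(dim)]
            for row in W]
```

With this initialization every decoder layer initially receives exactly the last encoder layer's output. Fine-tuning can then move weight onto earlier encoder layers if that lowers the loss.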

adaptive relationship, decoder layer, encoder layer, (15 more...)

arXiv.org Artificial Intelligence

2405.0857

Country:

Europe > Finland > Uusimaa > Helsinki (0.26)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
