Initialisation scheme


Orthogonal Self-Attention

Zhang, Leo, Martens, James

arXiv.org Machine Learning

Skip connections [He et al., 2016] have become a ubiquitous feature of neural network architectures because they facilitate the stable training of deep models. Despite this success, prior works [Veit et al., 2016, Gromov et al., 2024, Zhang et al., 2024] have raised the concern that the ease of training skip connections provide may be masking deeper representation-learning issues that they induce. The central point of these criticisms is that skip connections appear to bias models away from properly utilising the full depth of their architectures. For instance, Ji et al. [2025a] argue that because skip connections continually reintroduce earlier features into deeper layers, they disrupt the learning of hierarchical, progressively more abstract representations, fundamentally harming representation learning. Motivated by this line of reasoning, we explore designing Transformers that can be trained stably without skip connections. Previous works [He et al., 2023, Ji et al., 2025a] have tackled this through modifications to Softmax Self-Attention (SSA) [Vaswani et al., 2017] and through weight initialisations that improve signal propagation and the conditioning of the Jacobian matrix. However, these works restrict themselves to standard Softmax-based Transformers, which, due to SSA, appear to be inherently unstable without skip connections [Dong et al., 2021, Ji et al., 2025b].
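To make the setting concrete, here is a minimal NumPy sketch of a single-head softmax self-attention block in which the residual (skip) path can be toggled off. All names and shapes are illustrative, and this is the standard SSA baseline, not the paper's proposed orthogonal variant.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def attention_block(x, Wq, Wk, Wv, use_skip=True):
    """Single-head softmax self-attention; `use_skip` toggles the residual path.

    x: (T, d) sequence of token features; Wq, Wk, Wv: (d, d) projections.
    Layer norm and the MLP sub-block are omitted for brevity.
    """
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    weights = softmax(q @ k.T / np.sqrt(k.shape[-1]))  # (T, T) attention map
    out = weights @ v
    # use_skip=False gives the skip-free regime discussed above, in which
    # standard SSA Transformers appear inherently unstable to train.
    return x + out if use_skip else out
```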


Approximate Gaussianity Beyond Initialisation in Neural Networks

Hirst, Edward, Ramgoolam, Sanjaye

arXiv.org Artificial Intelligence

Ensembles of neural network weight matrices are studied through the training process for the MNIST classification problem, testing the efficacy of matrix models for representing their distributions under assumptions of Gaussianity and permutation symmetry. The general 13-parameter permutation-invariant Gaussian matrix models are found to be effective models for the correlated Gaussianity in the weight matrices, beyond the range of applicability of the simple Gaussian with independent, identically distributed matrix variables, and notably well beyond the initialisation step. The representation-theoretic model parameters and the graph-theoretic characterisation of the permutation-invariant matrix observables give an interpretable framework for the best-fit model and for small departures from Gaussianity. Additionally, the Wasserstein distance is calculated for this class of models and used to quantify the movement of the distributions over training. Throughout the work, the effects of varied initialisation regimes, regularisation, layer depth, and layer width are tested for this formalism, identifying limits where particular departures from Gaussianity are enhanced and how more general, yet still highly interpretable, models can be developed.
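As a toy illustration of quantifying distributional drift over training, the sketch below fits only the simplest iid Gaussian model to a weight matrix's entries (not the paper's 13-parameter permutation-invariant models) and uses the closed-form 2-Wasserstein distance between one-dimensional Gaussians; the checkpoint data here is synthetic.

```python
import numpy as np

def gaussian_fit(W):
    """Fit the simplest iid Gaussian model to a weight matrix's entries."""
    w = W.ravel()
    return w.mean(), w.std()

def w2_gaussian(mu1, sigma1, mu2, sigma2):
    """Closed-form 2-Wasserstein distance between 1-D Gaussians:
    W2^2 = (mu1 - mu2)^2 + (sigma1 - sigma2)^2."""
    return np.sqrt((mu1 - mu2) ** 2 + (sigma1 - sigma2) ** 2)

# Quantify how far the entry distribution of one layer has moved
# between two (here synthetic) training checkpoints.
rng = np.random.default_rng(0)
W_init = rng.standard_normal((100, 100))
W_trained = 1.5 * rng.standard_normal((100, 100))
print(w2_gaussian(*gaussian_fit(W_init), *gaussian_fit(W_trained)))
```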



Eigenvalue initialisation and regularisation for Koopman autoencoders

Miller, Jack W., O'Neill, Charles, Constantinou, Navid C., Azencot, Omri

arXiv.org Artificial Intelligence

Regularising the parameter matrices of neural networks is ubiquitous in training deep models. Typical approaches suggest initialising weights with small random values and penalising them during training to promote sparsity. However, these widely used techniques may be less effective in certain scenarios. Here, we study the Koopman autoencoder model, which includes an encoder, a Koopman operator layer, and a decoder. These models are designed to tackle physics-related problems, offering interpretable dynamics and an ability to incorporate physics-related constraints; however, the majority of existing work employs standard regularisation practices. In our work, we take a step toward augmenting Koopman autoencoders with initialisation and penalty schemes tailored for physics-related settings. Specifically, we propose the "eigeninit" initialisation scheme, which samples initial Koopman operators from specific eigenvalue distributions. In addition, we suggest the "eigenloss" penalty scheme, which penalises the eigenvalues of the Koopman operator during training. We demonstrate the utility of these schemes on two synthetic datasets, a driven pendulum and flow past a cylinder, and two real-world problems, ocean surface temperatures and cyclone wind fields. We find on these datasets that eigenloss and eigeninit improve the convergence rate by up to a factor of 5 and reduce the cumulative long-term prediction error by up to a factor of 3. This finding points to the utility of incorporating similar schemes as an inductive bias in other physics-related deep learning approaches.
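A minimal NumPy sketch of what such schemes could look like, under stated assumptions: the `eigeninit` below samples a real operator whose eigenvalue moduli are drawn uniformly from an interval near the unit circle (the paper's actual eigenvalue distributions may differ), and the `eigenloss` is one plausible hinge-style penalty on eigenvalue moduli; in training, the penalty would be computed with an autodiff framework rather than NumPy.

```python
import numpy as np

def eigeninit(n, r_min=0.9, r_max=1.0, seed=0):
    """Sample a real n x n operator whose eigenvalue moduli are uniform
    on [r_min, r_max]: build 2x2 rotation-scaling blocks (plus one 1x1
    block if n is odd), then conjugate by a random orthogonal matrix,
    which preserves the spectrum."""
    rng = np.random.default_rng(seed)
    blocks = []
    if n % 2:
        blocks.append(np.array([[rng.uniform(r_min, r_max)]]))
    for _ in range(n // 2):
        r, th = rng.uniform(r_min, r_max), rng.uniform(0.0, np.pi)
        blocks.append(r * np.array([[np.cos(th), -np.sin(th)],
                                    [np.sin(th),  np.cos(th)]]))
    K = np.zeros((n, n))
    i = 0
    for b in blocks:
        K[i:i + len(b), i:i + len(b)] = b
        i += len(b)
    Q, _ = np.linalg.qr(rng.standard_normal((n, n)))  # random orthogonal basis
    return Q @ K @ Q.T

def eigenloss(K, target=1.0):
    """One plausible eigenvalue penalty: hinge on moduli above `target`,
    discouraging unstable (explosive) linear dynamics."""
    mods = np.abs(np.linalg.eigvals(K))
    return np.sum(np.maximum(mods - target, 0.0) ** 2)
```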


Exploring Low Rank Training of Deep Neural Networks

Kamalakara, Siddhartha Rao, Locatelli, Acyr, Venkitesh, Bharat, Ba, Jimmy, Gal, Yarin, Gomez, Aidan N.

arXiv.org Artificial Intelligence

Training deep neural networks in low rank, i.e. with factorised layers, is of particular interest to the community: it offers efficiency over unfactorised training in terms of both memory consumption and training time. Prior work has focused on low-rank approximations of pre-trained networks and on training in low-rank space with additional objectives, offering various ad hoc explanations for the chosen practices. We analyse techniques that work well in practice, and through extensive ablations on models such as GPT-2 we provide evidence falsifying common beliefs in the field, hinting in the process at exciting research questions that remain open.
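For concreteness, here is a minimal sketch of a factorised linear layer: the dense weight W is parameterised as U @ V of rank r, so the layer stores and trains r(d_in + d_out) parameters instead of d_in * d_out. The initialisation scaling shown is illustrative, not necessarily one of the schemes the paper ablates.

```python
import numpy as np

class LowRankLinear:
    """Factorised layer: y = x @ U @ V, with U: (d_in, r) and V: (r, d_out).

    Stores r * (d_in + d_out) parameters instead of d_in * d_out, and the
    full W = U @ V is never materialised.
    """
    def __init__(self, d_in, d_out, rank, seed=0):
        rng = np.random.default_rng(seed)
        self.U = rng.standard_normal((d_in, rank)) / np.sqrt(d_in)
        self.V = rng.standard_normal((rank, d_out)) / np.sqrt(rank)

    def __call__(self, x):
        return (x @ self.U) @ self.V  # two skinny matmuls instead of one dense one

# rank 64 in place of a dense 768 x 768 weight: ~6x fewer parameters
layer = LowRankLinear(d_in=768, d_out=768, rank=64)
y = layer(np.random.default_rng(1).standard_normal((16, 768)))  # (16, 768)
```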


Speech Modelling Using Subspace and EM Techniques

Smith, Gavin, Freitas, João F. G. de, Robinson, Tony, Niranjan, Mahesan

Neural Information Processing Systems

The speech waveform can be modelled as a piecewise-stationary linear stochastic state-space system, and its parameters can be estimated using an expectation-maximisation (EM) algorithm. One problem is the initialisation of the EM algorithm. Standard initialisation schemes can lead to poor formant trajectories, yet these trajectories are important for vowel intelligibility. The aim of this paper is to investigate the suitability of subspace identification methods for initialising EM. The paper compares the subspace state space system identification (4SID) method with the EM algorithm. The 4SID and EM methods are similar in that they both estimate a state sequence (using Kalman filters and Kalman smoothers, respectively) and then estimate parameters (using least-squares and maximum likelihood, respectively).
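A minimal NumPy sketch of the generative model in question, assuming a single time-invariant linear-Gaussian state-space system (the piecewise-stationary case applies this per segment); all matrices are illustrative and would, in the paper's setting, be estimated from the observations by 4SID or EM.

```python
import numpy as np

def simulate_lgssm(A, C, Q, R, x0, T, seed=0):
    """Simulate the linear stochastic state-space model:
        x[t+1] = A x[t] + w[t],  w ~ N(0, Q)   (hidden state dynamics)
        y[t]   = C x[t] + v[t],  v ~ N(0, R)   (observed waveform frames)
    4SID (least-squares on a filter-based state estimate) or EM (Kalman
    smoothing plus maximum likelihood) would recover A, C, Q, R from y."""
    rng = np.random.default_rng(seed)
    n, m = A.shape[0], C.shape[0]
    x, ys = np.asarray(x0, dtype=float), []
    for _ in range(T):
        ys.append(C @ x + rng.multivariate_normal(np.zeros(m), R))
        x = A @ x + rng.multivariate_normal(np.zeros(n), Q)
    return np.array(ys)  # (T, m) observation sequence
```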

