Linear Mode Connectivity
Going Beyond Linear Mode Connectivity: The Layerwise Linear Feature Connectivity
Recent work has revealed many intriguing empirical phenomena in neural network training, despite the poorly understood and highly complex loss landscapes and training dynamics. One of these phenomena, Linear Mode Connectivity (LMC), has gained considerable attention due to the observation that different solutions can be connected by a linear path in parameter space while maintaining near-constant training and test losses. In this work, we introduce a stronger notion of linear connectivity, Layerwise Linear Feature Connectivity (LLFC), which states that the feature maps of every layer in different trained networks are also linearly connected. We provide comprehensive empirical evidence for LLFC across a wide range of settings, demonstrating that whenever two trained networks satisfy LMC (via either spawning or permutation methods), they also satisfy LLFC in nearly all the layers. Furthermore, we delve deeper into the underlying factors contributing to LLFC, revealing new insights into the permutation approaches.
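A minimal PyTorch sketch of the kind of check the abstract describes: interpolate two networks' weights, then compare each hidden layer's features of the interpolated model against the linear interpolation of the endpoint features. The toy MLP, the layer choices, and the cosine-distance metric are illustrative assumptions, not the authors' exact protocol.

```python
# Sketch: measuring a per-layer LLFC gap between two same-architecture MLPs.
import copy
import torch
import torch.nn as nn

class MLP(nn.Module):
    """Toy MLP standing in for the trained networks in the paper."""
    def __init__(self, dims=(32, 64, 64, 10)):
        super().__init__()
        self.hidden = nn.ModuleList(
            nn.Linear(a, b) for a, b in zip(dims[:-2], dims[1:-1]))
        self.head = nn.Linear(dims[-2], dims[-1])

    def features(self, x):
        """Return the post-ReLU feature map of every hidden layer."""
        feats = []
        for lin in self.hidden:
            x = torch.relu(lin(x))
            feats.append(x)
        return feats

def interpolate(model_a, model_b, alpha):
    """Model whose weights are (1 - alpha) * A + alpha * B."""
    merged = copy.deepcopy(model_a)
    sa, sb = model_a.state_dict(), model_b.state_dict()
    merged.load_state_dict({k: (1 - alpha) * sa[k] + alpha * sb[k] for k in sa})
    return merged

@torch.no_grad()
def llfc_gap(model_a, model_b, x, alpha=0.5):
    """Cosine distance, per hidden layer, between the interpolated model's
    features and the linear interpolation of the endpoint features."""
    fm = interpolate(model_a, model_b, alpha).features(x)
    fa, fb = model_a.features(x), model_b.features(x)
    gaps = []
    for a, b, m in zip(fa, fb, fm):
        target = (1 - alpha) * a + alpha * b
        cos = nn.functional.cosine_similarity(m, target, dim=1).mean()
        gaps.append(1.0 - cos.item())  # ~0 where LLFC holds at this layer
    return gaps

# Usage (untrained toy models; the paper evaluates pairs that satisfy LMC):
torch.manual_seed(0)
print(llfc_gap(MLP(), MLP(), torch.randn(128, 32)))
```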
The Empirical Impact of Neural Parameter Symmetries, or Lack Thereof
Many algorithms and observed phenomena in deep learning appear to be affected by parameter symmetries -- transformations of neural network parameters that do not change the underlying neural network function. These include linear mode connectivity, model merging, Bayesian neural network inference, metanetworks, and several other characteristics of optimization or loss landscapes. However, theoretical analysis of the relationship between parameter space symmetries and these phenomena is difficult. In this work, we empirically investigate the impact of neural parameter symmetries by introducing new neural network architectures that have reduced parameter space symmetries. We develop two methods, with some provable guarantees, of modifying standard neural networks to reduce parameter space symmetries. With these new methods, we conduct a comprehensive experimental study consisting of multiple tasks aimed at assessing the effect of removing parameter symmetries. Our experiments reveal several interesting observations on the empirical impact of parameter symmetries; for instance, we observe linear mode connectivity between our networks without alignment of weight spaces, and we find that our networks allow for faster and more effective Bayesian neural network training.
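For reference, the LMC property mentioned above is typically quantified with a loss barrier along the straight line between two parameter vectors. A minimal sketch, with toy models and a fixed evaluation batch standing in for the paper's setup:

```python
# Sketch: the standard loss-barrier metric for linear mode connectivity.
import copy
import torch
import torch.nn as nn

@torch.no_grad()
def loss_barrier(model_a, model_b, loss_fn, n_points=11):
    """Max gap between the loss on the linear path and the linear baseline."""
    sa, sb = model_a.state_dict(), model_b.state_dict()
    probe = copy.deepcopy(model_a)
    end_a, end_b = loss_fn(model_a), loss_fn(model_b)
    barrier = 0.0
    for i in range(n_points):
        alpha = i / (n_points - 1)
        probe.load_state_dict(
            {k: (1 - alpha) * sa[k] + alpha * sb[k] for k in sa})
        baseline = (1 - alpha) * end_a + alpha * end_b
        barrier = max(barrier, loss_fn(probe) - baseline)
    return barrier  # ~0 when the pair is linearly mode connected

# Usage on one fixed batch (toy models and data, purely illustrative):
torch.manual_seed(0)
x, y = torch.randn(64, 32), torch.randint(0, 10, (64,))
make = lambda: nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10))
loss_fn = lambda m: nn.functional.cross_entropy(m(x), y).item()
print(loss_barrier(make(), make(), loss_fn))
```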
On Linear Mode Connectivity of Mixture-of-Experts Architectures
Tran, Viet-Hoang, Trinh, Van Hoan, Bui, Khanh Vinh, Nguyen, Tan M.
Linear Mode Connectivity (LMC) is a notable phenomenon in the loss landscapes of neural networks, wherein independently trained models have been observed to be connected--up to permutation symmetries--by linear paths in parameter space along which the loss remains consistently low. This observation challenges classical views of non-convex optimization and has implications for model ensembling, generalization, and our understanding of neural loss geometry. Inspired by recent studies on LMC in standard neural networks, we systematically investigate this phenomenon within Mixture-of-Experts (MoE) architectures--a class of models known for their scalability and computational efficiency, which combine traditional neural networks--referred to as experts--through a learnable gating mechanism. We begin by conducting a comprehensive analysis of both dense and sparse gating regimes, demonstrating that the symmetries inherent to MoE architectures are fully characterized by permutations acting on both the expert components and the gating function. Building on these foundational findings, we propose a matching algorithm that enables alignment between independently trained MoEs, thereby facilitating the discovery of LMC. Finally, we empirically validate the presence of LMC using our proposed algorithm across diverse MoE configurations--including dense, sparse, and shared-expert variants--under a wide range of model settings and datasets of varying scales and modalities. Our results confirm the existence of LMC in MoE architectures and offer fundamental insights into the functional landscape and optimization dynamics of deep learning models.
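A toy numeric check of the symmetry the abstract characterizes: permuting the experts together with the corresponding rows of the gating head leaves a dense-gated MoE's function unchanged. The DenseMoE module below is an illustrative stand-in, not the paper's implementation.

```python
# Sketch: expert/gate permutations are function-preserving in a dense MoE.
import torch
import torch.nn as nn

class DenseMoE(nn.Module):
    """Dense (softmax-gated) mixture of linear experts."""
    def __init__(self, d=16, n_experts=4):
        super().__init__()
        self.experts = nn.ModuleList(nn.Linear(d, d) for _ in range(n_experts))
        self.gate = nn.Linear(d, n_experts)

    def forward(self, x):
        w = torch.softmax(self.gate(x), dim=-1)               # (batch, E)
        outs = torch.stack([e(x) for e in self.experts], -1)  # (batch, d, E)
        return (outs * w.unsqueeze(1)).sum(-1)

torch.manual_seed(0)
moe, x = DenseMoE(), torch.randn(8, 16)
perm = torch.randperm(4).tolist()

# Build the permuted model: reorder experts and, consistently, the gate rows.
permuted = DenseMoE()
permuted.experts = nn.ModuleList(moe.experts[i] for i in perm)
with torch.no_grad():
    permuted.gate.weight.copy_(moe.gate.weight[perm])
    permuted.gate.bias.copy_(moe.gate.bias[perm])

print(torch.allclose(moe(x), permuted(x), atol=1e-6))  # True: same function
```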
Robustness and Regularization in Hierarchical Re-Basin
Franke, Benedikt, Heinrich, Florian, Lange, Markus, Raulf, Arne
This paper takes a closer look at Git Re-Basin, an interesting new approach to merging trained models. We propose a hierarchical model merging scheme that significantly outperforms the standard MergeMany algorithm. With our new algorithm, we find that Re-Basin induces adversarial and perturbation robustness into the merged models, with the effect becoming stronger the more models participate in the hierarchical merging scheme. However, in our experiments, Re-Basin induces a much larger performance drop than the original authors reported.
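The abstract does not spell out the hierarchical scheme, so the following is only a hedged sketch of one plausible reading: models are merged pairwise, tournament-style, until one remains, rather than averaged all at once as in MergeMany. Here align_and_average is a placeholder that omits the Re-Basin permutation search and only averages weights.

```python
# Sketch: tournament-style hierarchical merging of a list of models.
import copy
import torch.nn as nn

def align_and_average(model_a, model_b):
    # Placeholder Re-Basin step: a real implementation would first permute
    # model_b's hidden units to match model_a, then average; we only average.
    merged = copy.deepcopy(model_a)
    sa, sb = model_a.state_dict(), model_b.state_dict()
    merged.load_state_dict({k: 0.5 * (sa[k] + sb[k]) for k in sa})
    return merged

def hierarchical_merge(models):
    """Reduce a list of models to one by repeated pairwise merging."""
    while len(models) > 1:
        nxt = [align_and_average(models[i], models[i + 1])
               for i in range(0, len(models) - 1, 2)]
        if len(models) % 2:  # odd count: carry the last model forward
            nxt.append(models[-1])
        models = nxt
    return models[0]

# Usage with four toy models:
make = lambda: nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 2))
merged = hierarchical_merge([make() for _ in range(4)])
```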
Do We Really Need Permutations? Impact of Width Expansion on Linear Mode Connectivity
Ito, Akira, Yamada, Masanori, Chijiwa, Daiki, Kumagai, Atsutoshi
Recently, Ainsworth et al. empirically demonstrated that, given two independently trained models, applying a parameter permutation that preserves the input-output behavior allows the two models to be connected by a low-loss linear path. When such a path exists, the models are said to achieve linear mode connectivity (LMC). Prior studies, including Ainsworth et al., have reported that achieving LMC requires not only an appropriate permutation search but also sufficiently wide models (e.g., a 32 $\times$ width multiplier for ResNet-20). This is widely believed to be because increasing the model width ensures a large enough space of candidate permutations, increasing the chance of finding one that yields LMC. In this work, we empirically demonstrate that, even without any permutations, simply widening the models is sufficient for achieving LMC when using a suitable softmax temperature calibration. We further explain why this phenomenon arises by analyzing intermediate layer outputs. Specifically, we introduce layerwise exponentially weighted connectivity (LEWC), which states that the output of each layer of the merged model can be represented as an exponentially weighted sum of the outputs of the corresponding layers of the original models. Consequently, the merged model's output matches that of an ensemble of the original models, which facilitates LMC. To the best of our knowledge, this work is the first to show that widening the model not only facilitates nonlinear mode connectivity, as suggested in prior research, but also significantly increases the possibility of achieving linear mode connectivity.
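A minimal sketch of the comparison this result implies: the weight-space midpoint of two (sufficiently wide) models should predict like the output-space ensemble of the originals. The toy models below are untrained, and the paper's width multipliers and temperature calibration are not reproduced, so this only illustrates the measurement, not the finding.

```python
# Sketch: does the midpoint model agree with the logit ensemble?
import copy
import torch
import torch.nn as nn

def midpoint(model_a, model_b):
    """Model whose weights are the average of the two endpoints."""
    merged = copy.deepcopy(model_a)
    sa, sb = model_a.state_dict(), model_b.state_dict()
    merged.load_state_dict({k: 0.5 * (sa[k] + sb[k]) for k in sa})
    return merged

@torch.no_grad()
def ensemble_agreement(model_a, model_b, x):
    """Fraction of inputs where the merged model and the logit ensemble of
    the two originals predict the same class."""
    ens = 0.5 * (model_a(x) + model_b(x))
    pred = midpoint(model_a, model_b)(x).argmax(-1)
    return (pred == ens.argmax(-1)).float().mean().item()

torch.manual_seed(0)
wide = lambda: nn.Sequential(nn.Linear(32, 4096), nn.ReLU(), nn.Linear(4096, 10))
print(ensemble_agreement(wide(), wide(), torch.randn(256, 32)))
```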
Generalized Linear Mode Connectivity for Transformers
Theus, Alexander, Cabodi, Alessandro, Anagnostidis, Sotiris, Orvieto, Antonio, Singh, Sidak Pal, Boeva, Valentina
Understanding the geometry of neural network loss landscapes is a central question in deep learning, with implications for generalization and optimization. A striking phenomenon is linear mode connectivity (LMC), where independently trained models can be connected by low- or zero-loss paths, despite appearing to lie in separate loss basins. However, this is often obscured by symmetries in parameter space -- such as neuron permutations -- which make functionally equivalent models appear dissimilar. Prior work has predominantly focused on neuron re-ordering through permutations, but such approaches are limited in scope and fail to capture the richer symmetries exhibited by modern architectures such as Transformers. In this work, we introduce a unified framework that captures four symmetry classes: permutations, semi-permutations, orthogonal transformations, and general invertible maps -- broadening the set of valid reparameterizations and subsuming many previous approaches as special cases. Crucially, this generalization enables, for the first time, the discovery of low- and zero-barrier linear interpolation paths between independently trained Vision Transformers and GPT-2 models. These results reveal deeper structure in the loss landscape and underscore the importance of symmetry-aware analysis for understanding model space geometry.
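A quick numeric check of one non-permutation symmetry class in this family: the attention logits $QK^\top$ are invariant when $W_Q$ is right-multiplied by any invertible matrix $M$ and $W_K$ by $M^{-\top}$, since $(XW_QM)(XW_KM^{-\top})^\top = XW_QW_K^\top X^\top$. Shapes and names below are illustrative assumptions, not the paper's code.

```python
# Sketch: attention logits are invariant under an invertible reparameterization
# of the query/key projections (W_Q -> W_Q M, W_K -> W_K M^{-T}).
import torch

torch.manual_seed(0)
d_model, d_head, n_tok = 16, 8, 5
X = torch.randn(n_tok, d_model)
W_Q = torch.randn(d_model, d_head)
W_K = torch.randn(d_model, d_head)
M = torch.randn(d_head, d_head)  # generic, hence almost surely invertible

logits = (X @ W_Q) @ (X @ W_K).T
logits_reparam = (X @ W_Q @ M) @ (X @ W_K @ torch.inverse(M).T).T

print(torch.allclose(logits, logits_reparam, atol=1e-3))  # True
```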