AITopics | symmetry

Collaborating Authors

symmetry

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Symmetry-Compatible Principle for Optimizer Design: Embeddings, LM Heads, SwiGLU MLPs, and MoE Routers

Lau, Tim Tsz-Kit, Su, Weijie

arXiv.org Machine Learning, May-27-2026

A striking geometric disparity has long persisted in the practice of deep learning. While modern neural network architectures naturally exhibit rich symmetry and equivariance properties, popular optimizers such as Adam and its variants operate inherently coordinate-wise, rendering them unable to respect the equivariance structures of the parameter space. We address this disparity by introducing a symmetry-compatible principle for optimizer design: the gradient update rule should be equivariant under the symmetry group acting on the corresponding weight block. Following this principle, we first provide a unified perspective on bi-orthogonally equivariant updates for general matrix layers, as employed by stochastic spectral descent, Muon, Scion, and polar gradient methods. More importantly, by moving from orthogonal groups to permutation and shared-shift symmetries, we derive symmetry-compatible optimizers for parameter blocks whose symmetries differ from those of general matrix layers: embedding and LM head matrices, SwiGLU MLP projections, and MoE router matrices. These constructions include one-sided spectral, row-norm, hybrid row-norm/spectral, row-aware, column-aware, centered row-norm, and left-spectral updates. They yield an end-to-end layerwise optimizer stack in which each major matrix-valued parameter class is assigned an update whose equivariance matches its symmetry group. We corroborate this principle through pre-training experiments on dense and sparse MoE language models, including Qwen3-0.6B-style, Gemma 3 1B-style, OLMoE-1B-7B-style, and downsized gpt-oss architectures. Across these experiments, symmetry-compatible update rules consistently improve final validation loss, reduce load imbalance in sparse MoE models, and in several cases improve training stability over the corresponding AdamW updates.
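The principle can be checked numerically. The NumPy sketch below is not the authors' code. It tests equivariance for three simplified update maps:

- the polar factor U Vᵀ, the direction that orthogonalized methods such as Muon approximate;
- plain row normalisation, as a stand-in for the row-norm family;
- coordinate-wise sign descent, which is Adam with β1 = β2 = 0 and ε → 0.

The helper names (`polar_update`, `row_norm_update`, `is_equivariant`, and so on) and the symmetry groups tested are assumptions for illustration. The paper's hybrid, centered and one-sided variants are not reproduced.

```python
import numpy as np

def polar_update(G):
    # Orthogonal polar factor U V^T of G (the direction Muon approximates).
    U, _, Vt = np.linalg.svd(G, full_matrices=False)
    return U @ Vt

def row_norm_update(G, eps=1e-12):
    # Scale every row (e.g. one token's embedding) to unit Euclidean norm.
    return G / (np.linalg.norm(G, axis=1, keepdims=True) + eps)

def sign_update(G):
    # Coordinate-wise update: Adam with beta1 = beta2 = 0 and eps -> 0.
    return np.sign(G)

def random_orthogonal(n, rng):
    # Random orthogonal matrix via QR, with column signs fixed by diag(upper).
    Q, upper = np.linalg.qr(rng.standard_normal((n, n)))
    return Q * np.sign(np.diag(upper))

def is_equivariant(update, G, left, right):
    # Does update(left @ G @ right) equal left @ update(G) @ right?
    return np.allclose(update(left @ G @ right), left @ update(G) @ right, atol=1e-8)

rng = np.random.default_rng(0)
G = rng.standard_normal((6, 4))        # gradient of a 6 x 4 weight block
L = random_orthogonal(6, rng)          # rotation acting on rows
R = random_orthogonal(4, rng)          # rotation acting on columns
Pi = np.eye(6)[rng.permutation(6)]     # row permutation (e.g. token relabelling)

checks = {
    "polar, rotations on both sides": is_equivariant(polar_update, G, L, R),
    "row-norm, permute rows + rotate columns": is_equivariant(row_norm_update, G, Pi, R),
    "row-norm, rotate rows": is_equivariant(row_norm_update, G, L, np.eye(4)),
    "sign, rotate columns": is_equivariant(sign_update, G, np.eye(6), R),
}
for name, ok in checks.items():
    print(f"{name}: {ok}")
```

The first two checks pass and the last two fail. Each map respects only the group it was built for. The coordinate-wise sign rule does not commute with rotations of the hidden space.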

large language model, machine learning, natural language

arXiv.org Machine Learning

2605.18106

Country: North America > United States > Pennsylvania (0.27)

Genre: Research Report (0.81)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

On the Epistemic Uncertainty of Overparametrized Neural Networks

Rügamer, David

arXiv.org Machine Learning, May-26-2026

Epistemic uncertainty is often viewed as a reducible uncertainty that vanishes with increasing data. This perspective implicitly assumes parameter identifiability and equates epistemic uncertainty with predictive variability. In overparametrized neural networks, however, model parameters are typically non-identifiable due to symmetries and redundant representations. As a consequence, substantial parameter uncertainty can persist even when the underlying function is fully identified. In this work, we analyze epistemic uncertainty through the lens of non-identifiability and characterize both discrete and continuous sources of residual uncertainty. Focusing on one-hidden-layer ReLU networks, we thoroughly analyze the resulting posterior structure and validate our theoretical insights through empirical studies.
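The parameter symmetries behind this non-identifiability can be seen directly in a one-hidden-layer ReLU network f(x) = Σ_j a_j relu(w_j · x + b_j). The minimal sketch below builds a second parameter vector from the first in two steps:

- a discrete symmetry: permute the hidden units;
- a continuous symmetry: rescale unit j by c_j > 0 and divide its output weight by c_j.

The resulting function is identical. The names and sizes are illustrative only and are not taken from the paper.

```python
import numpy as np

def relu_net(x, W, b, a):
    # f(x) = sum_j a_j * relu(w_j . x + b_j) for a batch x of shape (n, d)
    return np.maximum(x @ W.T + b, 0.0) @ a

rng = np.random.default_rng(0)
d, h = 3, 5                                   # input dimension, hidden units
W = rng.standard_normal((h, d))
b = rng.standard_normal(h)
a = rng.standard_normal(h)

perm = rng.permutation(h)                     # discrete symmetry: relabel hidden units
c = rng.uniform(0.5, 2.0, size=h)             # continuous symmetry: positive rescaling
W2 = (W * c[:, None])[perm]                   # relu(c z) = c relu(z) for c > 0 ...
b2 = (b * c)[perm]
a2 = (a / c)[perm]                            # ... so dividing a_j by c_j cancels it

x = rng.standard_normal((100, d))
f1 = relu_net(x, W, b, a)
f2 = relu_net(x, W2, b2, a2)
print("max |f1 - f2| =", np.abs(f1 - f2).max())
print("parameters identical?", np.allclose(W, W2))
```

Because every point on such an orbit fits the data equally well, a posterior over (W, b, a) keeps mass spread along these directions. This happens even when the function f itself is pinned down.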

artificial intelligence, epistemic uncertainty, machine learning

arXiv.org Machine Learning

2605.25234

Country: Europe (0.28)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.67)

The Attribution Impossibility: No Feature Ranking Is Faithful, Stable, and Complete Under Collinearity

Caraker, Drake, Arnold, Bryan, Rhoads, David

arXiv.org Machine Learning, May-22-2026

No feature ranking can be simultaneously faithful, stable, and complete when features are collinear. For collinear pairs, ranking reduces to a coin flip. We prove this impossibility, quantify it for four model classes, resolve it via ensemble averaging (DASH), and machine-verify it with 305 Lean 4 theorems. We characterize the complete attribution design space: exactly two families of methods exist -- faithful-complete methods (unstable, with rankings that flip up to 50% of the time) and ensemble methods like DASH (stable, reporting ties for symmetric features) -- and no method lies outside this dichotomy. The impossibility is quantitative: the attribution ratio diverges as 1/(1-rho^2) for gradient boosting, is infinite for Lasso, and converges for random forests. DASH (Diversified Aggregation of SHAP) is provably Pareto-optimal among unbiased aggregations, achieving the Cramér-Rao variance bound with a tight ensemble size formula. In a survey of 77 public datasets, 68% exhibit attribution instability. Switching to conditional SHAP does not escape the impossibility when features have equal causal effects. The framework includes practical diagnostics -- a Z-test workflow and single-model screening tool -- and has direct consequences for fairness auditing: SHAP-based proxy discrimination audits are provably unreliable under collinearity. The design space theorem, diagnostics, and impossibility are mechanically verified in Lean 4 (305 theorems from 16 axioms, with no uses of Lean's sorry placeholder) -- to our knowledge, the first formally verified impossibility in explainable AI.
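The coin-flip behaviour and its ensemble remedy can be reproduced in a toy simulation. Each "model" below is a hypothetical greedy stand-in that gives all credit to the feature most correlated with y, loosely mimicking a single tree split. It is fit on a fresh sample of two features with correlation rho and equal true effects. This is not SHAP and not DASH. It only illustrates two points:

- individual models rank the pair inconsistently;
- averaging attributions over many models reports a near tie.

```python
import numpy as np

def greedy_attribution(X, y):
    # Toy single-split "model": all credit to the feature with largest |corr(x_j, y)|
    corr = [abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(X.shape[1])]
    attr = np.zeros(X.shape[1])
    attr[int(np.argmax(corr))] = 1.0
    return attr

def simulate(n_models=400, n=300, rho=0.95, seed=0):
    rng = np.random.default_rng(seed)
    cov = np.array([[1.0, rho], [rho, 1.0]])
    attrs = []
    for _ in range(n_models):
        X = rng.multivariate_normal(np.zeros(2), cov, size=n)
        y = X[:, 0] + X[:, 1] + 0.5 * rng.standard_normal(n)   # equal true effects
        attrs.append(greedy_attribution(X, y))
    attrs = np.array(attrs)
    first_rate = np.mean(attrs[:, 0] > attrs[:, 1])  # share of models ranking feature 0 first
    ensemble = attrs.mean(axis=0)                    # attribution averaged across models
    return first_rate, ensemble

first_rate, ensemble = simulate()
print(f"feature 0 ranked first in {first_rate:.0%} of models; ensemble attribution = {ensemble}")
```

Feature 0 comes out on top in roughly half of the models. The averaged attribution is close to an even split, which is the kind of tie an ensemble method is expected to report for symmetric features.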

artificial intelligence, instability, machine learning

arXiv.org Machine Learning

doi: 10.5281/zenodo.19468379

2605.21492

Country: North America > United States (0.46)

Genre: Research Report > New Finding (0.45)

Industry:

Banking & Finance (0.67)
Health & Medicine > Therapeutic Area (0.48)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.86)

Representation Gap: Explaining the Unreasonable Effectiveness of Neural Networks from a Geometric Perspective

Perera, David, Moura, Victor, Santos, Lais Isabelle Alves dos, Haddad, Michel F. C., Figueiredo, Flavio

arXiv.org Machine Learning, May-22-2026

Precisely characterizing the asymptotic generalization error of neural networks, using parameters that can be estimated efficiently, is a crucial problem in machine learning, a field that relies heavily on heuristics and practitioners' intuition to make key design choices. To mitigate this issue, we introduce the Representation Gap, a metric closely related to the generalization error but with better-behaved asymptotic dynamics. Focusing on equivariant diffusion models and leveraging results from optimal quantization and point-process theory, we derive a precise asymptotic equivalent of the Representation Gap and show that it is governed by a single parameter, the intrinsic dimension of the task. This parameter is easy to interpret, efficient to estimate, and can be linked to the equivariances of common neural network architectures. We show that these asymptotic dynamics also extend to a broader range of tasks and training algorithms. Finally, we demonstrate empirically that our asymptotic law and intrinsic-dimension estimates are accurate on a wide range of synthetic datasets, where these quantities are known, as well as on more realistic datasets, where we obtain results consistent with the related literature.
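The abstract does not say which intrinsic-dimension estimator is used. As a swapped-in, well-known technique, the sketch below implements the maximum-likelihood form of the TwoNN estimator of Facco et al. Locally, the ratio μ = r₂/r₁ of second- to first-nearest-neighbour distances follows a Pareto law with exponent d, which gives the estimate d̂ = N / Σ log μᵢ. The function name and test data are illustrative assumptions, not the paper's procedure.

```python
import numpy as np

def twonn_dimension(X):
    # TwoNN intrinsic-dimension estimate (maximum-likelihood form)
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T   # squared pairwise distances
    d2 = np.maximum(d2, 0.0)                         # clip round-off negatives
    np.fill_diagonal(d2, np.inf)                     # exclude self-distances
    r = np.sqrt(np.sort(d2, axis=1)[:, :2])          # 1st and 2nd neighbour distances
    mu = r[:, 1] / r[:, 0]                           # ratios ~ Pareto(d) locally
    return len(mu) / np.sum(np.log(mu))

rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((10, 2)))    # isometric embedding R^2 -> R^10
plane = rng.uniform(size=(1000, 2)) @ Q.T            # 2-D data living in 10-D space
cube5 = rng.uniform(size=(1000, 5))                  # genuinely 5-D data

d_plane = twonn_dimension(plane)
d_cube = twonn_dimension(cube5)
print(f"plane in R^10: {d_plane:.2f}, 5-D cube: {d_cube:.2f}")
```

The estimate tracks the dimension of the data manifold rather than the ambient dimension: it is about 2 for the plane even though that data lives in 10-D space. Boundary effects tend to bias it slightly downward for bounded, higher-dimensional samples.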

artificial intelligence, deep learning, machine learning

arXiv.org Machine Learning

2605.21692

Country:

North America (0.46)
Europe > United Kingdom (0.28)
South America > Brazil > Minas Gerais (0.14)

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

Finite-size scaling of hetero-associative retrieval in continuous-signal-driven Ising spin systems

Ladiana, Andrea

arXiv.org Machine Learning, May-15-2026

Content-addressable memory--the recovery of a complete stored record from a partial or degraded cue--is a cornerstone of neural computation and a paradigmatic problem in the statistical mechanics of disordered systems. The Hopfield model [1] demonstrated that binary patterns in {-1, +1} can be stored as fixed-point attractors of an energy landscape shaped by Hebbian couplings, while Little's earlier stochastic formulation [2] cast the same architecture in the language of equilibrium statistical mechanics through parallel probabilistic updates. Kosko's Bidirectional Associative Memory [17] first formalised this idea for two layers, showing that stable recall arises from the same energy-descent principle as in Hopfield networks but across two distinct pattern spaces: a cue presented to one layer drives the other toward the matching stored pattern, enabling cross-modal completion. Multi-species spin-glass analyses [18] subsequently provided a rigorous thermodynamic foundation for architectures with an arbitrary number of interacting populations, generalising the classical single-species phase …
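The two-layer recall mechanism described above can be illustrated with a zero-temperature, discrete Bidirectional Associative Memory. Couplings are Hebbian, W = Σ_μ x^μ (y^μ)ᵀ. Recall alternates y ← sgn(Wᵀx) and x ← sgn(W y) until both layers stop changing. This is a minimal sketch of Kosko's classic model, not the paper's continuous-signal-driven Ising system or its finite-size scaling analysis. Sizes, the 20% corruption level and helper names are illustrative assumptions.

```python
import numpy as np

def sgn(h):
    # Bipolar sign; ties broken towards +1
    return np.where(h >= 0, 1, -1)

def bam_couplings(X, Y):
    # Hebbian outer-product couplings W = sum_mu x^mu (y^mu)^T, shape (N, M)
    return X.T @ Y

def bam_recall(W, x_cue, max_iter=20):
    # Alternate layer updates until neither layer changes
    x = x_cue.copy()
    y = sgn(W.T @ x)
    for _ in range(max_iter):
        x_new = sgn(W @ y)
        y_new = sgn(W.T @ x_new)
        if np.array_equal(x_new, x) and np.array_equal(y_new, y):
            break
        x, y = x_new, y_new
    return x, y

rng = np.random.default_rng(0)
N, M, P = 200, 150, 3                       # layer sizes and number of stored pairs
X = rng.choice([-1, 1], size=(P, N))
Y = rng.choice([-1, 1], size=(P, M))
W = bam_couplings(X, Y)

cue = X[0].copy()
flipped = rng.choice(N, size=40, replace=False)
cue[flipped] *= -1                          # corrupt 20% of the cue
x_rec, y_rec = bam_recall(W, cue)
print("x recovered:", np.array_equal(x_rec, X[0]), "| y recalled:", np.array_equal(y_rec, Y[0]))
```

With few stored pairs relative to the layer sizes, the signal from the matching pattern dominates the crosstalk from the others. The degraded cue therefore completes to the stored pair in both layers.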

archetype, artificial intelligence, machine learning

arXiv.org Machine Learning

2605.14059

Genre: Research Report (0.64)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)

Uncovering Symmetry Transfer in Large Language Models via Layer-Peeled Optimization

Du, Zhehang, He, Hangfeng, Su, Weijie

arXiv.org Machine Learning, May-14-2026

Large language models (LLMs) are pretrained by minimizing the cross-entropy loss for next-token prediction. In this paper, we study whether this optimization strategy can induce geometric structure in the learned model weights and context embeddings. We approach this problem by analyzing a constrained layer-peeled optimization program, which serves as a mathematically tractable surrogate for LLMs by treating the output projection matrix and last-layer context embeddings as optimization variables. Our analysis of this nonconvex optimization program demonstrates that symmetries in the target next-token distributions are transferred to the global minimizers of the layer-peeled model in a precise group-theoretic sense. Specifically, we prove that when the target tokens exhibit a cyclic-shift symmetry (such as the seven days of the week or the twelve months of the year), the optimal logit matrix is exactly circulant, and the Gram matrices of both the output projections and the context embeddings form circulant geometries as well. Next, for exchangeable target distributions invariant under the symmetric group and, more generally, under two-transitive group actions, we show that the global optimal output projection matrix forms a simplex equiangular tight frame, while the optimal logit matrix and context embeddings inherit the permutation symmetries present in the input data. A key technical step is to reduce the constrained nonconvex factorized problem to an explicit logit-level convex characterization for cyclic symmetry and to a symmetry-based lower bound for permutation symmetry, together with a sharp characterization of the optimal factorization. Finally, we empirically demonstrate that open-source LLMs naturally exhibit symmetries consistent with our theoretical predictions, despite being trained without any explicit regularization promoting such geometric structure.
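Both geometric objects named in the abstract are easy to check numerically. The sketch below does two things:

- It builds a hypothetical cyclic next-token target over the seven days of the week and takes the unconstrained cross-entropy-optimal logits, which are row-centred log-probabilities. It then checks that this logit matrix and its Gram matrix are circulant, and that the discrete Fourier basis diagonalises the logit matrix.
- It constructs a simplex equiangular tight frame as √(K/(K−1))·(I − 11ᵀ/K).

This ignores the norm constraints and factorisation of the layer-peeled program, so it only illustrates the logit-level structure, not the paper's theorem.

```python
import numpy as np

def is_circulant(Z, atol=1e-10):
    # Circulant: Z[i, j] depends only on (j - i) mod n, so a joint cyclic shift leaves Z unchanged
    return np.allclose(np.roll(Z, (1, 1), axis=(0, 1)), Z, atol=atol)

def simplex_etf(K):
    # K unit-norm columns with pairwise inner product -1/(K - 1)
    return np.sqrt(K / (K - 1)) * (np.eye(K) - np.ones((K, K)) / K)

# Hypothetical cyclic target: after day i the next token is most likely day i + 1
base = np.array([0.05, 0.60, 0.15, 0.05, 0.05, 0.05, 0.05])
n = base.size
P = np.stack([np.roll(base, i) for i in range(n)])      # P[i, j] = base[(j - i) % n]
Z = np.log(P) - np.log(P).mean(axis=1, keepdims=True)   # unconstrained CE-optimal logits, row-centred

idx = np.arange(n)
F = np.exp(2j * np.pi * np.outer(idx, idx) / n) / np.sqrt(n)   # unitary DFT matrix
D = F.conj().T @ Z @ F                                  # diagonal exactly when Z is circulant
off_diag = np.abs(D - np.diag(np.diag(D))).max()

E = simplex_etf(5)
gram_etf = E.T @ E
print(is_circulant(Z), is_circulant(Z @ Z.T), f"{off_diag:.1e}")
print(np.round(gram_etf, 3))
```

The ETF Gram matrix has ones on the diagonal and −1/(K−1) = −0.25 everywhere else. This is the maximally separated configuration that the exchangeable case is predicted to produce for the output projections.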

large language model, machine learning, natural language

arXiv.org Machine Learning

2605.12756

Country:

Asia (1.00)
North America > United States > New York (0.28)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Muon is Not That Special: Random or Inverted Spectra Work Just as Well

Shumaylov, Zakhar, Da Costa, Nathaël, Zaika, Peter, Mucsányi, Bálint, Massucco, Alex, Gelberg, Yoav, Schönlieb, Carola-Bibiane, Gal, Yarin, Hennig, Philipp

arXiv.org Machine Learning, May-13-2026

The recent empirical success of the Muon optimizer has renewed interest in non-Euclidean optimization, typically justified by similarities with second-order methods, and linear minimization oracle (LMO) theory. In this paper, we challenge this geometric narrative through three contributions, demonstrating that precise geometric structure is not the key factor affecting optimization performance. First, we introduce Freon, a family of optimizers based on Schatten (quasi-)norms, powered by a novel, provably optimal QDWH-based iterative approximation. Freon naturally interpolates between SGD and Muon, while smoothly extrapolating into the quasi-norm regime. Empirically, the best-performing Schatten parameters for GPT-2 lie strictly within the quasi-norm regime, and thus cannot be represented by any unitarily invariant LMO. Second, noting that Freon performs well across a wide range of exponents, we introduce Kaon, an absurd optimizer that replaces singular values with random noise. Despite lacking any coherent geometric structure, Kaon matches Muon's performance and retains classical convergence guarantees, proving that strict adherence to a precise geometry is practically irrelevant. Third, having shown that geometry is not the primary driver of performance, we demonstrate it is instead controlled by two local quantities: alignment and descent potential. Ultimately, each optimizer must tune its step size around these two quantities. While their dynamics are difficult to predict a priori, evaluating them within a stochastic random feature model yields a precise insight: Muon succeeds not by tracking an ideal global geometry, but by guaranteeing step-size optimality.
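The spectral maps discussed above share one template. Take the gradient's SVD G = U diag(s) Vᵀ and step along U diag(f(s)) Vᵀ for some spectral map f. Different choices of f give different directions:

- f(s) = s gives the SGD direction;
- f(s) = 1 gives the orthogonalized Muon direction;
- f(s) = 1/s gives an inverted spectrum;
- a random positive f(s) is in the spirit of, but not identical to, Kaon.

For any positive f, the alignment ⟨G, U f(s) Vᵀ⟩ = Σᵢ sᵢ f(sᵢ) is positive, so every such update is a descent direction. The sketch also includes the classic cubic Newton-Schulz iteration for the polar factor. Muon's public implementation uses a tuned higher-order polynomial instead, and Freon's QDWH-based scheme is not reproduced. All helper names are hypothetical.

```python
import numpy as np

def spectral_update(G, f):
    # U diag(f(s)) V^T for the thin SVD G = U diag(s) V^T
    U, s, Vt = np.linalg.svd(G, full_matrices=False)
    return (U * f(s)) @ Vt

def newton_schulz(G, steps=40):
    # Cubic Newton-Schulz iteration converging to the polar factor U V^T
    X = G / np.linalg.norm(G)          # Frobenius scaling puts singular values in (0, 1]
    for _ in range(steps):
        X = 1.5 * X - 0.5 * X @ X.T @ X
    return X

def alignment(G, D):
    # Cosine between gradient and update; > 0 means -D is a descent direction
    return np.sum(G * D) / (np.linalg.norm(G) * np.linalg.norm(D))

rng = np.random.default_rng(0)
G = rng.standard_normal((8, 5))
maps = {
    "identity spectrum (SGD direction)": lambda s: s,
    "unit spectrum (Muon direction)": np.ones_like,
    "inverted spectrum": lambda s: 1.0 / s,
    "random positive spectrum": lambda s: rng.uniform(0.1, 2.0, size=s.shape),
}
updates = {name: spectral_update(G, f) for name, f in maps.items()}
for name, D in updates.items():
    print(f"{name}: alignment = {alignment(G, D):.3f}")
```

The alignments differ in size but are all positive. This is consistent with the abstract's point that a step-size rule tuned to alignment, rather than the exact spectral geometry, drives much of the behaviour.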

artificial intelligence, machine learning, natural language

arXiv.org Machine Learning

2605.11181

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.66)

Universal Feature Selection with Noisy Observations and Weak Symmetry Conditions

Tang, Dier, Han, Guangyue

arXiv.org Machine Learning, May-12-2026

This paper relaxes the restrictive symmetry conditions adopted in [4], [5] and extends their universal feature selection framework to accommodate noisy observations as well as attribute structures that may exhibit directional preferences. We introduce the notion of weak spherical symmetry, quantified by second-moment distances, which allows controlled deviations from rotational invariance. Under this relaxed condition, we develop a universal feature selection framework based on the singular value decomposition of the canonical dependence matrix computed from noisy data. Our main result shows that the selected features achieve asymptotically optimal error exponents up to a residual term that depends on the symmetry deviation $δ$ and the noise levels $η_1, η_2$. When $δ, η_1, η_2$ are relatively small, our result recovers that of [5], thereby demonstrating that exact spherical symmetry is unnecessary. Overall, our findings highlight the robustness of the selection framework against second-moment deviations and observation noise, thereby broadening its applicability across diverse inference tasks and providing a theoretically grounded tool for universal feature selection in practical scenarios.
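The selection step can be sketched from the SVD of the canonical dependence matrix. We assume the definition common in the universal-feature literature: B[x, y] = (P(x, y) − P(x)P(y)) / √(P(x)P(y)). Features are the top singular vectors rescaled by 1/√(marginal).

The example uses a hypothetical noise model in which Y copies a uniform X with probability 1 − η and is otherwise uniform. It is an illustration only. It does not reproduce the paper's weak-symmetry analysis or its noise levels η₁, η₂.

```python
import numpy as np

def canonical_dependence_matrix(P_xy):
    # B[x, y] = (P(x, y) - P(x) P(y)) / sqrt(P(x) P(y))
    P_x, P_y = P_xy.sum(axis=1), P_xy.sum(axis=0)
    prod = np.outer(P_x, P_y)
    return (P_xy - prod) / np.sqrt(prod)

def select_features(P_xy, k):
    # Top-k singular triples of B; features are singular vectors scaled by 1/sqrt(marginal)
    B = canonical_dependence_matrix(P_xy)
    U, s, Vt = np.linalg.svd(B)
    P_x, P_y = P_xy.sum(axis=1), P_xy.sum(axis=0)
    f = U[:, :k] / np.sqrt(P_x)[:, None]   # feature functions of x
    g = Vt[:k].T / np.sqrt(P_y)[:, None]   # feature functions of y
    return s[:k], f, g

def noisy_copy_joint(n, eta):
    # Hypothetical model: X uniform on n symbols; Y = X w.p. 1 - eta, else uniform
    return (1 - eta) * np.eye(n) / n + eta * np.ones((n, n)) / n**2

for eta in (0.0, 0.2, 0.5):
    s, f, g = select_features(noisy_copy_joint(4, eta), k=3)
    print(f"eta={eta}: top singular values {np.round(s, 3)}")
```

For this model B = (1 − η)(I − 11ᵀ/n). The informative singular values all equal 1 − η, so observation noise shrinks them uniformly while the selected feature directions stay the same.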

artificial intelligence, machine learning, symmetry

arXiv.org Machine Learning

2605.09396

Country: Asia > China (0.15)

Genre: Research Report > New Finding (0.88)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

d0da30e312b75a3fffd9e9191f8bc1b0-Supplemental-Conference.pdf

Neural Information Processing Systems, Apr-29-2026, 20:37:04 GMT

artificial intelligence, machine learning, sampler

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)
