AITopics | representation collapse

representation collapse

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

What Happens During the Loss Plateau? Understanding Abrupt Learning in Transformers

Neural Information Processing Systems, Jun-17-2026, 05:21:28 GMT

Training Transformers on algorithmic tasks frequently exhibits an intriguing abrupt-learning phenomenon: an extended performance plateau followed by a sudden, sharp improvement. This work investigates the mechanisms underlying such dynamics, primarily in shallow Transformers. We show that during the plateau, the model often develops an interpretable partial solution while simultaneously exhibiting a strong repetition bias in its outputs. This output degeneracy is accompanied by internal representation collapse, in which hidden states across different tokens become nearly parallel. We further identify the slow learning of optimal attention maps as a key bottleneck. Hidden progress in attention configuration during the plateau precedes the eventual rapid convergence, and directly intervening on attention significantly alters the plateau duration and the severity of both repetition bias and representation collapse. We validate that these phenomena (repetition bias and representation collapse) are not merely artifacts of toy setups: they also manifest in the early pre-training stage of large language models such as Pythia and OLMo.
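The two plateau symptoms described above can be quantified in several ways. The sketch below assumes two simple metrics that are not necessarily the ones the authors use: the mean pairwise cosine similarity of per-token hidden states, which approaches 1 as the states become nearly parallel, and the fraction of output positions that repeat the preceding token, as a rough proxy for repetition bias. The helper names `mean_pairwise_cosine` and `repetition_rate` and the toy vectors are hypothetical.

```python
import math
from itertools import combinations
from typing import Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two vectors; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot / (nu * nv)


def mean_pairwise_cosine(hidden: Sequence[Sequence[float]]) -> float:
    """Average cosine similarity over all pairs of per-token hidden states.

    Values close to 1.0 mean the states are nearly parallel (collapsed)."""
    pairs = list(combinations(hidden, 2))
    if not pairs:
        return 0.0
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)


def repetition_rate(tokens: Sequence[int]) -> float:
    """Fraction of output positions that repeat the immediately preceding token."""
    if len(tokens) < 2:
        return 0.0
    repeats = sum(1 for prev, cur in zip(tokens, tokens[1:]) if prev == cur)
    return repeats / (len(tokens) - 1)


# Hypothetical hidden states: one collapsed set (scalar multiples), one spread-out set.
collapsed = [[1.0, 0.1], [2.0, 0.2], [0.5, 0.05]]
spread = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
print(mean_pairwise_cosine(collapsed))  # close to 1.0
print(mean_pairwise_cosine(spread))     # about -0.333
print(repetition_rate([7, 7, 7, 7]))    # 1.0
```

Cosine similarity is used rather than raw distance because collapse in this sense concerns direction ("nearly parallel"), so vectors that differ only in scale still count as collapsed.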

artificial intelligence, machine learning, natural language, (15 more...)

Neural Information Processing Systems

Country: North America > United States (0.28)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry: Education > Educational Setting > Online (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)

454cecc4829279e64d624cd8a8c9ddf1-Paper.pdf

Neural Information Processing Systems, Apr-25-2026, 16:05:06 GMT

However, in domains where precise and succinct expert state information is available, agents trained on such expert state features usually outperform agents trained on rich observations.

artificial intelligence, machine learning, reinforcement learning, (17 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.69)

Elliptical Attention

Neural Information Processing Systems, Mar-22-2026, 10:38:12 GMT

Pairwise dot-product self-attention is key to the success of transformers that achieve state-of-the-art performance across a variety of applications in language and vision. This dot-product self-attention computes attention weights among the input tokens using Euclidean distance, which makes the model prone to representation collapse and vulnerable to contaminated samples. In this paper, we propose using a Mahalanobis distance metric for computing the attention weights to stretch the underlying feature space in directions of high contextual relevance. In particular, we define a hyper-ellipsoidal neighborhood around each query to increase the attention weights of the tokens lying in the contextually important directions.
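To illustrate how a hyper-ellipsoidal neighborhood reweights tokens, here is a minimal sketch under stated assumptions: attention scores are taken as the negative squared Mahalanobis distance between query and key under a hypothetical diagonal metric `m`, followed by a softmax. This is not the paper's exact formula, and it does not show how the paper estimates the metric. With `m` all ones it reduces to an isotropic, Euclidean-distance kernel; a smaller `m[i]` stretches the neighborhood along axis `i`, so keys offset in that direction keep more weight.

```python
import math
from typing import List, Sequence


def weighted_sq_distance(q: Sequence[float], k: Sequence[float], m: Sequence[float]) -> float:
    """Squared Mahalanobis distance (q - k)^T diag(m) (q - k) for a diagonal metric m >= 0."""
    return sum(mi * (qi - ki) ** 2 for qi, ki, mi in zip(q, k, m))


def softmax(xs: Sequence[float]) -> List[float]:
    mx = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - mx) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def elliptical_attention_weights(q, keys, m, temperature=1.0):
    """Attention weights of one query over keys; nearer keys in the m-metric get more weight.

    A small m[i] elongates the query's neighborhood along axis i, so keys offset
    in that direction are penalized less."""
    scores = [-weighted_sq_distance(q, k, m) / temperature for k in keys]
    return softmax(scores)


q = [0.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0]]
print(elliptical_attention_weights(q, keys, m=[1.0, 1.0]))  # [0.5, 0.5]: isotropic case
print(elliptical_attention_weights(q, keys, m=[0.1, 1.0]))  # first key now gets most of the weight
```

Both keys are equidistant from the query in the Euclidean sense, so only the anisotropic metric breaks the tie in favor of the token along the "relevant" first axis.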

artificial intelligence, machine learning, proceedings, (5 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.77)

Learning Successor Features the Simple Way

Neural Information Processing Systems, Mar-20-2026, 18:43:09 GMT

In Deep Reinforcement Learning (RL), it is a challenge to learn representations that do not exhibit catastrophic forgetting or interference in non-stationary environments.

artificial intelligence, proceedings, reinforcement learning, (5 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.77)

c63908a3e946af0e7978c23737229137-Paper-Conference.pdf

Neural Information Processing Systems, Feb-18-2026, 01:41:59 GMT

large language model, machine learning, natural language, (21 more...)

Neural Information Processing Systems

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > Italy > Tuscany > Florence (0.04)
Asia > Singapore > Central Region > Singapore (0.04)
(6 more...)

Genre: Research Report > Experimental Study (1.00)

Industry:

Government (0.68)
Information Technology > Security & Privacy (0.46)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(4 more...)

597254dc45be8c166d3ccf0ba2d56325-Paper-Conference.pdf

Neural Information Processing Systems, Feb-14-2026, 03:05:56 GMT

machine learning, natural language, reinforcement learning, (18 more...)

Neural Information Processing Systems

Country:

North America > Canada > Quebec > Montreal (0.14)
Asia > Middle East > Jordan (0.04)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.92)

Industry:

Education > Educational Setting (0.45)
Health & Medicine > Therapeutic Area > Neurology (0.45)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language (0.92)
(2 more...)

On the Representation Collapse of Sparse Mixture of Experts

Neural Information Processing Systems, Dec-25-2025, 12:26:54 GMT

Sparse mixture of experts provides larger model capacity while requiring only a constant computational overhead. It employs a routing mechanism to dispatch input tokens to the best-matched experts according to their hidden representations. However, learning such a routing mechanism encourages tokens to cluster around expert centroids, implying a trend toward representation collapse. In this work, we propose to estimate the routing scores between tokens and experts on a low-dimensional hypersphere. We conduct extensive experiments on cross-lingual language model pre-training and on fine-tuning for downstream tasks. Experimental results across seven multilingual benchmarks show that our method achieves consistent gains. We also present a comprehensive analysis of the representation and routing behaviors of our models. Our method alleviates the representation collapse issue and achieves more consistent routing than the baseline mixture-of-experts methods.
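The idea of scoring tokens against experts on a low-dimensional hypersphere can be sketched as follows. Assumptions: a hypothetical projection matrix `proj` maps the hidden state to a lower dimension, both the projected token and each expert embedding are L2-normalized, the cosine scores are divided by a hypothetical temperature, and the top-1 expert is chosen. This is an illustration of the general technique, not the authors' implementation.

```python
import math
from typing import List, Sequence


def matvec(w: Sequence[Sequence[float]], x: Sequence[float]) -> List[float]:
    """Multiply a (rows x cols) matrix, given as a list of rows, by a vector."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def l2_normalize(v: Sequence[float]) -> List[float]:
    """Scale v to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(a * a for a in v))
    return [a / norm for a in v] if norm > 0 else list(v)


def hypersphere_routing_scores(h, proj, experts, temperature=0.5):
    """Cosine scores between a projected token and each expert embedding, over a temperature.

    proj maps the d-dimensional hidden state h to a low-dimensional space; both sides
    are L2-normalized, so every score lies in [-1/temperature, 1/temperature]."""
    z = l2_normalize(matvec(proj, h))
    return [sum(a * b for a, b in zip(z, l2_normalize(e))) / temperature for e in experts]


def route_top1(h, proj, experts, temperature=0.5) -> int:
    """Index of the highest-scoring expert (top-1 routing)."""
    scores = hypersphere_routing_scores(h, proj, experts, temperature)
    return max(range(len(scores)), key=scores.__getitem__)


h = [3.0, 0.0, 4.0]                        # hypothetical 3-d token hidden state
proj = [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]  # hypothetical 3-d -> 2-d projection
experts = [[1.0, 0.0], [0.0, 10.0]]        # expert 1 has a large norm, which normalization ignores
print(hypersphere_routing_scores(h, proj, experts))  # approximately [1.2, 1.6]
print(route_top1(h, proj, experts))                  # 1
```

Because both sides are normalized, routing depends only on direction in the projected space, and an expert embedding's magnitude cannot by itself attract tokens.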

name change, representation collapse, sparse mixture, (3 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.41)

Combating Bilateral Edge Noise for Robust Link Prediction

Neural Information Processing Systems, Dec-24-2025, 22:06:22 GMT

Although link prediction on graphs has achieved great success with the development of graph neural networks (GNNs), its robustness under edge noise remains under-investigated. To close this gap, we first conduct an empirical study showing that edge noise bilaterally perturbs both the input topology and the target labels, causing severe performance degradation and representation collapse. To address this dilemma, we propose an information-theory-guided principle, Robust Graph Information Bottleneck (RGIB), to extract reliable supervision signals and avoid representation collapse. Unlike the basic information bottleneck, RGIB further decouples and balances the mutual dependence among graph topology, target labels, and representation, building new learning objectives for robust representation against the bilateral noise. Two instantiations, RGIB-SSL and RGIB-REP, are explored to leverage the merits of different methodologies, i.e., self-supervised learning and data reparameterization, for implicit and explicit data denoising, respectively. Extensive experiments on six datasets and three GNNs under diverse noisy scenarios verify the effectiveness of our RGIB instantiations. The code is publicly available at: https://github.com/tmlr-group/RGIB.
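"Bilateral" edge noise means that false edges contaminate both the graph the GNN sees and the links it is trained to predict. The sketch below simulates that setting under assumptions of our own: noise is injected as uniformly random node pairs that are not true edges, at a hypothetical rate `ratio` relative to each clean edge set. The function names are hypothetical, and this is not taken from the RGIB code base.

```python
import random
from typing import Iterable, Set, Tuple

Edge = Tuple[int, int]


def canon(u: int, v: int) -> Edge:
    """Store undirected edges with the smaller node id first."""
    return (u, v) if u < v else (v, u)


def inject_random_edges(edges: Set[Edge], num_nodes: int, ratio: float,
                        rng: random.Random, forbidden: Set[Edge]) -> Set[Edge]:
    """Return edges plus int(ratio * len(edges)) random pairs absent from edges and forbidden."""
    n_new = int(ratio * len(edges))
    taken = set(edges) | forbidden
    if n_new > num_nodes * (num_nodes - 1) // 2 - len(taken):
        raise ValueError("graph too small for the requested noise ratio")
    noisy = set(edges)
    while n_new > 0:
        u, v = rng.sample(range(num_nodes), 2)  # two distinct nodes
        e = canon(u, v)
        if e not in taken:
            taken.add(e)
            noisy.add(e)
            n_new -= 1
    return noisy


def add_bilateral_edge_noise(input_edges: Iterable[Edge], label_edges: Iterable[Edge],
                             num_nodes: int, ratio: float, seed: int = 0):
    """Perturb both the observed topology and the positive target links with false edges."""
    rng = random.Random(seed)
    clean_in = {canon(u, v) for u, v in input_edges}
    clean_lab = {canon(u, v) for u, v in label_edges}
    clean = clean_in | clean_lab  # never inject a true edge as "noise"
    noisy_in = inject_random_edges(clean_in, num_nodes, ratio, rng, clean)
    noisy_lab = inject_random_edges(clean_lab, num_nodes, ratio, rng, clean)
    return noisy_in, noisy_lab


noisy_in, noisy_lab = add_bilateral_edge_noise(
    [(0, 1), (1, 2), (2, 3), (3, 4)], [(0, 2), (1, 3)], num_nodes=10, ratio=0.5)
print(len(noisy_in), len(noisy_lab))  # 6 3
```

A GNN trained on `noisy_in` to predict `noisy_lab` then faces both corrupted message passing and corrupted supervision, which is the two-sided degradation the study describes.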

combating bilateral edge noise, name change, robust link prediction, (6 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.77)
Information Technology > Information Management (0.63)

TIP and Polish: Text-Image-Prototype Guided Multi-Modal Generation via Commonality-Discrepancy Modeling and Refinement

Ma, Zhiyong, Chen, Jiahao, Chuai, Qingyuan, Li, Zhengping

arXiv.org Artificial Intelligence, Dec-1-2025

Multi-modal generation struggles to ensure thematic coherence and style consistency. Semantically, existing methods suffer from cross-modal mismatch and lack explicit modeling of commonality and discrepancy. Methods that rely on fine-grained training fail to balance semantic precision with writing-style consistency. These shortcomings lead to suboptimal generation quality. To tackle these issues, we propose TIPPo, a simple yet effective framework with explicit input modeling and comprehensive optimization objectives. It encodes the input text and images via a multi-modal encoder and adapters, then measures the visual prototype. The Textual, Image, and Prototype (TIP) signals are then fed to our proposed Dual Alignment Attention and Difference Operator modules before language-model decoding. The proposed PolishPPO (Po) reinforces style consistency, while unsupervised contrastive learning during SFT mitigates inter-sample representation collapse. Experimental results demonstrate the promising performance of TIPPo in automatic evaluation and in LLM-based criteria for creativity and semantic consistency.
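The abstract does not specify which contrastive objective is used during SFT, so the sketch below substitutes a generic InfoNCE loss to show why contrastive learning counteracts inter-sample collapse. Assumptions: each anchor embedding has one positive (for example, a second view of the same sample), the other positives in the batch serve as negatives, similarity is cosine, and the temperature of 0.1 is a hypothetical choice. If every sample maps to the same embedding, the loss stays at log(batch size); only well-separated embeddings drive it toward zero.

```python
import math
from typing import Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0


def info_nce(anchors, positives, temperature=0.1) -> float:
    """Mean InfoNCE loss: anchors[i] should match positives[i]; the other
    positives in the batch act as negatives."""
    n = len(anchors)
    total = 0.0
    for i in range(n):
        logits = [cosine(anchors[i], positives[j]) / temperature for j in range(n)]
        mx = max(logits)  # log-sum-exp with max subtraction for stability
        log_z = mx + math.log(sum(math.exp(x - mx) for x in logits))
        total += log_z - logits[i]  # cross-entropy with target index i
    return total / n


collapsed = [[1.0, 0.0]] * 4                                  # every sample has the same embedding
spread = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]  # well-separated embeddings
print(info_nce(collapsed, collapsed))  # log(4), about 1.386: collapse is not rewarded
print(info_nce(spread, spread))        # close to 0
```

The loss can only fall below log(n) when each sample's embedding is more similar to its own positive than to the others, which is exactly the pressure that keeps representations from collapsing onto one point.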

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2511.21698

Country: