AITopics

Country:

North America > United States > West Virginia (0.04)
North America > United States > Virginia (0.04)
North America > United States > California > Santa Cruz County > Santa Cruz (0.04)

Genre: Research Report > Experimental Study (0.93)

Industry:

Media (0.46)
Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.70)

Neural Information Processing SystemsFeb-9-2026, 20:28:22 GMT

Large Language Models Are Latent Variable Models: Explaining and Finding Good Demonstrations for In-Context Learning

Can't wait to see the second movie! The first two lines are two demonstrations, and the third line is a test input.

demonstration, large language model, machine learning, (18 more...)

Country:

North America > United States > Washington > King County > Seattle (0.14)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > Puerto Rico (0.04)
(11 more...)

Industry:

Health & Medicine (0.93)
Media > Film (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (1.00)
(2 more...)

arXiv.org Artificial IntelligenceOct-28-2025

On the Convergence of Moral Self-Correction in Large Language Models

Liu, Guangliang, Mao, Haitao, Cao, Bochuan, Xue, Zhiyu, Zhang, Xitong, Wang, Rongrong, Johnson, Kristen Marie

Large Language Models (LLMs) are able to improve their responses when instructed to do so, a capability known as self-correction. When instructions provide only a general and abstract goal without specific details about potential issues in the response, LLMs must rely on their internal knowledge to improve response quality, a process referred to as intrinsic self-correction. The empirical success of intrinsic self-correction is evident in various applications, but how and why it is effective remains unknown. Focusing on moral self-correction in LLMs, we reveal a key characteristic of intrinsic self-correction: performance convergence through multi-round interactions; and provide a mechanistic analysis of this convergence behavior. Based on our experimental results and analysis, we uncover the underlying mechanism of convergence: consistently injected self-correction instructions activate moral concepts that reduce model uncertainty, leading to converged performance as the activated moral concepts stabilize over successive rounds. This paper demonstrates the strong potential of moral self-correction by showing that it exhibits a desirable property of converged performance.

arxiv preprint arxiv, large language model, machine learning, (19 more...)

2510.0729

Country:

North America > United States (0.46)
Europe > Switzerland (0.28)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

Neural Information Processing SystemsOct-9-2025, 22:48:02 GMT

32e07a110c6c6acf1afbf2bf82b614ad-Paper-Conference.pdf

arxiv preprint arxiv, intervention, misconception, (14 more...)

Country:

North America > United States > West Virginia (0.04)
North America > United States > Virginia (0.04)
North America > United States > California > Santa Cruz County > Santa Cruz (0.04)

Genre: Research Report > Experimental Study (0.93)

Industry:

Media (0.46)
Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.70)

Neural Information Processing SystemsOct-8-2025, 10:11:01 GMT

3255a7554605a88800f4e120b3a929e1-Paper-Conference.pdf

concept token, demonstration, in-context learning, (12 more...)

Country:

North America > United States > Washington > King County > Seattle (0.14)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > Puerto Rico (0.04)
(11 more...)

Industry:

Health & Medicine (0.93)
Media > Film (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
(3 more...)

Sharma, Arushi, Pungliya, Vedant, Quinn, Christopher J., Jannesari, Ali

Analyzing Latent Concepts in Code Language Models

arXiv.org Artificial IntelligenceOct-6-2025

Interpreting the internal behavior of large language models trained on code remains a critical challenge, particularly for applications demanding trust, transparency, and semantic robustness. We propose Code Concept Analysis (CoCoA): a global post-hoc interpretability framework that uncovers emergent lexical, syntactic, and semantic structures in a code language model's representation space by clustering contextualized token embeddings into human-interpretable concept groups. We propose a hybrid annotation pipeline that combines static analysis tool-based syntactic alignment with prompt-engineered large language models (LLMs), enabling scalable labeling of latent concepts across abstraction levels. We analyse the distribution of concepts across layers and across three finetuning tasks. Emergent concept clusters can help identify unexpected latent interactions and be used to identify trends and biases within the model's learned representations. We further integrate LCA with local attribution methods to produce concept-grounded explanations, improving the coherence and interpretability of token-level saliency. Empirical evaluations across multiple models and tasks show that LCA discovers concepts that remain stable under semantic-preserving perturbations (average Cluster Sensitivity Index, CSI = 0.288) and evolve predictably with fine-tuning. In a user study on the programming-language classification task, concept-augmented explanations disambiguated token roles and improved human-centric explainability by 37 percentage points compared with token-level attributions using Integrated Gradients.

large language model, machine learning, natural language, (18 more...)

2510.00476

Country: North America > United States > California (0.28)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.68)

arXiv.org Machine LearningOct-2-2025

Nonparametric Identification of Latent Concepts

Zheng, Yujia, Xie, Shaoan, Zhang, Kun

We are born with the ability to learn concepts by comparing diverse observations. This helps us to understand the new world in a compositional manner and facilitates extrapolation, as objects naturally consist of multiple concepts. In this work, we argue that the cognitive mechanism of comparison, fundamental to human learning, is also vital for machines to recover true concepts underlying the data. This offers correctness guarantees for the field of concept learning, which, despite its impressive empirical successes, still lacks general theoretical support. Specifically, we aim to develop a theoretical framework for the identifiability of concepts with multiple classes of observations. We show that with sufficient diversity across classes, hidden concepts can be identified without assuming specific concept types, functional relations, or parametric generative models. Interestingly, even when conditions are not globally satisfied, we can still provide alternative guarantees for as many concepts as possible based on local comparisons, thereby extending the applicability of our theory to more flexible scenarios. Moreover, the hidden structure between classes and concepts can also be identified nonparametrically. We validate our theoretical results in both synthetic and real-world settings.

assumption, identifiability, nonparametric identification, (13 more...)

arXiv.org Machine Learning

2510.00136

Country:

Asia > Japan > Honshū > Tōhoku > Iwate Prefecture > Morioka (0.04)
North America > United States > California (0.04)
North America > Canada (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (1.00)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)
(2 more...)

Hong, Guan Zhe, Vasudeva, Bhavya, Sharan, Vatsal, Rashtchian, Cyrus, Raghavan, Prabhakar, Panigrahy, Rina

Latent Concept Disentanglement in Transformer-based Language Models

arXiv.org Artificial IntelligenceSep-29-2025

When large language models (LLMs) use in-context learning (ICL) to solve a new task, they must infer latent concepts from demonstration examples. This raises the question of whether and how transformers represent latent structures as part of their computation. Our work experiments with several controlled tasks, studying this question using mechanistic interpretability. First, we show that in transitive reasoning tasks with a latent, discrete concept, the model successfully identifies the latent concept and does step-by-step concept composition. This builds upon prior work that analyzes single-step reasoning. Then, we consider tasks parameterized by a latent numerical concept. We discover low-dimensional subspaces in the model's representation space, where the geometry cleanly reflects the underlying parameterization. Overall, we show that small and large models can indeed disentangle and utilize latent concepts that they learn in-context from a handful of abbreviated demonstrations.

large language model, machine learning, natural language, (20 more...)

2506.16975

Country:

Asia (1.00)
Europe (0.68)
North America > United States > California (0.28)

Genre: Research Report > New Finding (0.92)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.82)

Park, Seongwan, Kim, Taeklim, Ko, Youngjoong

Decoding Dense Embeddings: Sparse Autoencoders for Interpreting and Discretizing Dense Retrieval

arXiv.org Artificial IntelligenceAug-28-2025

Despite their strong performance, Dense Passage Retrieval (DPR) models suffer from a lack of interpretability. In this work, we propose a novel interpretability framework that leverages Sparse Autoencoders (SAEs) to decompose previously uninterpretable dense embeddings from DPR models into distinct, interpretable latent concepts. We generate natural language descriptions for each latent concept, enabling human interpretations of both the dense embeddings and the query-document similarity scores of DPR models. We further introduce Concept-Level Sparse Retrieval (CL-SR), a retrieval framework that directly utilizes the extracted latent concepts as indexing units. CL-SR effectively combines the semantic expressiveness of dense embeddings with the transparency and efficiency of sparse representations. We show that CL-SR achieves high index-space and computational efficiency while maintaining robust performance across vocabulary and semantic mismatches.

latent concept, machine learning, natural language, (18 more...)

2506.00041

Country: North America > United States (0.46)

Genre: Research Report > New Finding (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Neural Information Processing SystemsAug-17-2025, 02:03:08 GMT

Addressing Leakage in Concept Bottleneck Models

Concept bottleneck models (CBMs) (Chen et al., 2020; Koh et al., 2020; Y eh et al., 2020; Lage & Doshi-V elez, 2020; Wang et al., 2017) propose explicitly aligning the intermediate layers of a The concept bottleneck model (CBM) makes its prediction in two stages. While soft concepts may improve predictive performance, this improvement comes at a cost.

artificial intelligence, machine learning, predictor, (17 more...)

Country: North America (0.14)

Industry: Health & Medicine > Therapeutic Area (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.48)