AITopics | inversionview

InversionView: A General-Purpose Method for Reading Information from Neural Activations

Neural Information Processing Systems, Mar-22-2026, 21:47:29 GMT

The inner workings of neural networks can be better understood if we can fully decipher the information encoded in neural activations. In this paper, we argue that this information is embodied by the subset of inputs that give rise to similar activations. We propose InversionView, which allows us to practically inspect this subset by sampling from a trained decoder model conditioned on activations. This helps uncover the information content of activation vectors, and facilitates understanding of the algorithms implemented by transformer models. We present four case studies where we investigate models ranging from small transformers to GPT-2. In these studies, we show that InversionView can reveal clear information contained in activations, including basic information about tokens appearing in the context, as well as more complex information, such as the count of certain tokens, their relative positions, and abstract knowledge about the subject. We also provide causally verified circuits to confirm the decoded information.

artificial intelligence, information, machine learning, (6 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.61)

f94cfd15db3f16ee7789b6b7e91ec476-Paper-Conference.pdf

Neural Information Processing Systems, Feb-18-2026, 18:50:41 GMT

information, large language model, machine learning, (20 more...)

Neural Information Processing Systems

Country:

Europe > Germany > Saarland (0.04)
Asia > Philippines (0.04)
North America > Canada > Ontario > Toronto (0.04)
(8 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.92)

Industry:

Leisure & Entertainment (0.45)
Information Technology (0.45)
Education (0.45)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.67)

InversionView: A General-Purpose Method for Reading Information from Neural Activations

Neural Information Processing Systems, Oct-10-2025, 21:56:51 GMT

The inner workings of neural networks can be better understood if we can fully decipher the information encoded in neural activations. In this paper, we argue that this information is embodied by the subset of inputs that give rise to similar activations.

activation, digit, information, (16 more...)

Neural Information Processing Systems

Country:

Europe > Germany > Saarland (0.04)
Asia > Philippines (0.04)
North America > Canada > Ontario > Toronto (0.04)
(8 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.92)

Industry:

Leisure & Entertainment (0.45)
Information Technology (0.45)
Education (0.45)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

InversionView: A General-Purpose Method for Reading Information from Neural Activations

Neural Information Processing Systems, May-27-2025, 21:42:23 GMT

The inner workings of neural networks can be better understood if we can fully decipher the information encoded in neural activations. In this paper, we argue that this information is embodied by the subset of inputs that give rise to similar activations. We propose InversionView, which allows us to practically inspect this subset by sampling from a trained decoder model conditioned on activations. This helps uncover the information content of activation vectors, and facilitates understanding of the algorithms implemented by transformer models. We present four case studies where we investigate models ranging from small transformers to GPT-2.

general-purpose method, information, inversionview, (2 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.65)

InversionView: A General-Purpose Method for Reading Information from Neural Activations

Huang, Xinting, Panwar, Madhur, Goyal, Navin, Hahn, Michael

arXiv.org Artificial Intelligence, Jul-15-2024

The inner workings of neural networks can be better understood if we can fully decipher the information encoded in neural activations. In this paper, we argue that this information is embodied by the subset of inputs that give rise to similar activations. Computing such subsets is nontrivial as the input space is exponentially large. We propose InversionView, which allows us to practically inspect this subset by sampling from a trained decoder model conditioned on activations. This helps uncover the information content of activation vectors, and facilitates understanding of the algorithms implemented by transformer models. We present four case studies where we investigate models ranging from small transformers to GPT-2. In these studies, we demonstrate the characteristics of our method, show the distinctive advantages it offers, and provide causally verified circuits.
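
The central idea in the abstract, that an activation's information content is captured by the set of inputs producing similar activations, can be illustrated with a toy sketch. This is not the paper's implementation: `toy_activation`, the Euclidean distance, and the threshold `eps` are hypothetical stand-ins chosen for illustration. The input space here is small enough to enumerate by brute force, whereas the abstract notes that real input spaces are exponentially large, which is why the authors sample from a trained decoder instead.

```python
from itertools import product
import math


def toy_activation(tokens):
    """Hypothetical stand-in for a model activation.

    It encodes only how often the token "1" occurs and the sequence
    length, and deliberately discards everything else about the input.
    """
    return (float(tokens.count("1")), float(len(tokens)))


def preimage(query, vocab, length, eps):
    """Enumerate every input whose activation lies within eps of the query's.

    Brute force is feasible only because the toy input space is tiny.
    """
    z_q = toy_activation(query)
    return [
        x
        for x in product(vocab, repeat=length)
        if math.dist(toy_activation(x), z_q) <= eps
    ]


query = ("1", "0", "1")
similar = preimage(query, vocab=("0", "1", "2"), length=3, eps=0.1)

# Inspecting the subset shows what the activation "knows": every member
# contains exactly two "1" tokens, while the positions of those tokens and
# the identity of the remaining token vary freely.
for x in similar:
    print(x)
```

Reading off what stays constant across the subset (the count of "1") and what varies (positions, the other token) is the kind of inspection the abstract describes, here done exhaustively rather than by sampling.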

activation, digit, information, (17 more...)

arXiv.org Artificial Intelligence

2405.17653

Country:

Asia > Philippines (0.04)
North America > Canada > Ontario > Toronto (0.04)
Europe > Germany > Saarland (0.04)
(8 more...)

Genre: Research Report > New Finding (1.00)

Industry: Leisure & Entertainment (0.45)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.66)
