Understanding Task Representations in Neural Networks via Bayesian Ablation

Andrew Nam, Declan Campbell, Thomas Griffiths, Jonathan Cohen, Sarah-Jane Leslie

May-21-2025–arXiv.org Artificial Intelligence

Neural networks are powerful tools for cognitive modeling because of their flexibility and emergent properties. However, interpreting their learned representations remains challenging because their semantics are sub-symbolic. In this work, we introduce a novel probabilistic framework for interpreting latent task representations in neural networks. Inspired by Bayesian inference, our approach defines a distribution over representational units to infer their causal contributions to task performance. Using ideas from information theory, we propose a suite of tools and metrics to illuminate key model properties, including representational distributedness, manifold complexity, and polysemanticity.
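
To make the idea of a distribution over representational units concrete, here is a minimal toy sketch. It is not the paper's method: the linear "network", the uniform prior over binary ablation masks, the likelihood proportional to exp(-beta * loss), and the names `posterior_keep_probability`, `task_loss`, and `bits_gained` are all assumptions chosen for illustration. With only six units, the posterior over masks can be computed exactly by enumeration.

```python
import itertools
import numpy as np


def posterior_keep_probability(loss_fn, n_units, beta=5.0):
    """Exact posterior probability that each unit is kept (mask bit = 1).

    Assumptions (illustrative only): uniform prior over all binary masks,
    likelihood proportional to exp(-beta * loss).
    """
    masks = np.array(list(itertools.product([0, 1], repeat=n_units)), dtype=float)
    losses = np.array([loss_fn(m) for m in masks])
    log_w = -beta * losses
    log_w -= log_w.max()          # stabilise the exponentials
    w = np.exp(log_w)
    w /= w.sum()                  # normalised posterior over masks
    return w @ masks              # marginal keep-probability per unit


def bits_gained(p, eps=1e-12):
    """KL divergence (in bits) from the Bernoulli(0.5) prior to Bernoulli(p)."""
    p = np.clip(p, eps, 1 - eps)
    entropy = -(p * np.log2(p) + (1 - p) * np.log2(1 - p))
    return 1.0 - entropy


# Toy "network": six hidden units, only the first two drive the output.
rng = np.random.default_rng(0)
n_units = 6
hidden = rng.normal(size=(200, n_units))
readout = np.array([2.0, 1.0, 0.0, 0.0, 0.0, 0.0])
target = hidden @ readout


def task_loss(mask):
    """Mean squared error of the readout when masked-out units are zeroed."""
    pred = (hidden * mask) @ readout
    return float(np.mean((pred - target) ** 2))


keep_prob = posterior_keep_probability(task_loss, n_units)
info = bits_gained(keep_prob)
print(np.round(keep_prob, 3))
print(np.round(info, 3))
```

In this toy setting, units whose ablation hurts the task end up with a posterior keep-probability near 1, while units that do not affect the loss stay at the prior value of 0.5 and gain no information. Any summary of how that information is spread across units would be one possible way to quantify distributedness, again as an assumption rather than the paper's definition.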

artificial intelligence, machine learning, representation

Country:
- Africa (0.04)
- Asia > Middle East
  - Republic of Türkiye (0.04)
- Europe > Belgium
  - Flanders > Flemish Brabant > Leuven (0.04)
- North America > United States
  - Massachusetts > Middlesex County > Cambridge (0.04)

Genre:
- Research Report (0.82)

Industry:
- Health & Medicine > Therapeutic Area > Neurology (0.50)

Technology:
- Information Technology > Artificial Intelligence
  - Cognitive Science (1.00)
  - Machine Learning
    - Learning Graphical Models > Directed Networks
      - Bayesian Learning (0.46)
    - Neural Networks (1.00)
  - Representation & Reasoning > Uncertainty
    - Bayesian Inference (0.88)