Multilevel Interpretability Of Artificial Neural Networks: Leveraging Framework And Methods From Neuroscience

He, Zhonghao, Achterberg, Jascha, Collins, Katie, Nejad, Kevin, Akarca, Danyal, Yang, Yinzhu, Gurnee, Wes, Sucholutsky, Ilia, Tang, Yuhan, Ianov, Rebeca, Ogden, George, Li, Chole, Sandbrink, Kai, Casper, Stephen, Ivanova, Anna, Lindsay, Grace W.

Aug-25-2024–arXiv.org Artificial Intelligence

Interpretability research aims to provide human-understandable explanations of model outputs and behaviors based on the input and the model's internal structure [Doshi-Velez and Kim, 2017]. The field's goal is to generate mechanistic explanations of how neural networks perform computations and produce behaviors [Nanda et al., 2023, Olsson et al., 2022], which could help predict the behavior of such networks across a wide range of scenarios and possibly solve notable problems of AI systems, such as hallucination and toxic output [Ji et al., 2023]. The ability to interpret AI systems is therefore key to understanding whether models are appropriately fair, reliable, robust, and worthy of user trust [Doshi-Velez and Kim, 2017]. However, understanding the computations of frontier AI systems with hundreds of billions of parameters presents many technical challenges, from the curse of dimensionality [Zhao et al., 2024, Altman and Krzywinski, 2018] to finding a suitable unit of analysis [Olah et al., 2020, Zou et al., 2023]. These challenges are par for the course when studying complex systems. In particular, many challenges around artificial neural network (ANN) interpretability are intimately familiar to another group of researchers: neuroscientists. Neuroscience (often in partnership with cognitive science and psychology) investigates how neurons, their connections, and their activity patterns give rise to cognition and behavior. Like deep learning researchers, neuroscientists have realized that simply examining the activity profiles of individual neurons in response to a particular input is often insufficient for understanding how the system performs computation. Instead, complex neural systems are best understood across multiple levels of analysis, considering behavior alongside the brain's connectome, population codes, and codes of single neurons, to gain a holistic understanding of the inner workings of the brain.
