Multilevel Interpretability Of Artificial Neural Networks: Leveraging Framework And Methods From Neuroscience

He, Zhonghao, Achterberg, Jascha, Collins, Katie, Nejad, Kevin, Akarca, Danyal, Yang, Yinzhu, Gurnee, Wes, Sucholutsky, Ilia, Tang, Yuhan, Ianov, Rebeca, Ogden, George, Li, Chole, Sandbrink, Kai, Casper, Stephen, Ivanova, Anna, Lindsay, Grace W.

arXiv.org Artificial Intelligence 

Interpretability research aims to provide human-understandable explanations for model outputs and behaviors based on the input and the model's internal structure [Doshi-Velez and Kim, 2017]. The field's goal is to generate mechanistic explanations of how neural networks perform computations and produce behaviors [Nanda et al., 2023, Olsson et al., 2022], which could help predict the behavior of such networks across a wide range of scenarios and possibly solve notable problems of AI systems, such as hallucination and toxic output [Ji et al., 2023]. Interpreting AI systems is therefore key to understanding whether models are appropriately fair, reliable, robust, and worthy of user trust [Doshi-Velez and Kim, 2017]. However, understanding the computations of frontier AI systems with hundreds of billions of parameters presents many technical challenges, from the curse of dimensionality [Zhao et al., 2024, Altman and Krzywinski, 2018] to finding a suitable unit of analysis [Olah et al., 2020, Zou et al., 2023]. These challenges are par for the course when studying complex systems. In particular, many challenges around artificial neural network (ANN) interpretability are intimately familiar to another group of researchers: neuroscientists. Neuroscience (often in partnership with cognitive science and psychology) investigates how neurons, their connections, and their activity patterns give rise to cognition and behavior. Like deep learning researchers, neuroscientists have realized that simply examining the activity profiles of individual neurons in response to a particular input is often insufficient for understanding how the system performs computation. Instead, complex neural systems are best understood across multiple levels of analysis, considering behavior alongside the brain's connectome, population codes, and codes of single neurons to gain a holistic understanding of the inner workings of the brain.
