Multilevel Interpretability Of Artificial Neural Networks: Leveraging Framework And Methods From Neuroscience
He, Zhonghao, Achterberg, Jascha, Collins, Katie, Nejad, Kevin, Akarca, Danyal, Yang, Yinzhu, Gurnee, Wes, Sucholutsky, Ilia, Tang, Yuhan, Ianov, Rebeca, Ogden, George, Li, Chole, Sandbrink, Kai, Casper, Stephen, Ivanova, Anna, Lindsay, Grace W.
arXiv.org Artificial Intelligence
Interpretability research aims to provide human-understandable explanations of model outputs and behaviors based on the input and the model's internal structure [Doshi-Velez and Kim, 2017]. The field's goal is to generate mechanistic explanations of how neural networks perform computations and produce behaviors [Nanda et al., 2023, Olsson et al., 2022], which could help predict the behavior of such networks across a wide range of scenarios and possibly solve notable problems of AI systems, such as hallucination and toxic output [Ji et al., 2023]. Being able to interpret AI systems is therefore key to understanding whether models are appropriately fair, reliable, robust, and worthy of user trust [Doshi-Velez and Kim, 2017]. However, understanding the computations of frontier AI systems with hundreds of billions of parameters presents many technical challenges, from the curse of dimensionality [Zhao et al., 2024, Altman and Krzywinski, 2018] to finding a suitable unit of analysis [Olah et al., 2020, Zou et al., 2023]. These challenges are par for the course when studying complex systems. In particular, many challenges around artificial neural network (ANN) interpretability are intimately familiar to another group of researchers: neuroscientists. Neuroscience (often in partnership with cognitive science and psychology) investigates how neurons, their connections, and their activity patterns give rise to cognition and behavior. Like deep learning researchers, neuroscientists have realized that simply examining the activity profiles of individual neurons in response to a particular input is often insufficient for understanding how the system performs computation. Instead, complex neural systems are best understood across multiple levels of analysis, considering behavior alongside the brain's connectome, population codes, and the codes of single neurons to gain a holistic understanding of the inner workings of the brain.
Aug-25-2024
- Country:
  - Pacific Ocean > North Pacific Ocean
    - San Francisco Bay > Golden Gate (0.04)
  - North America > United States
    - New York (0.04)
    - Washington > King County
      - Seattle (0.04)
    - Massachusetts > Middlesex County
      - Cambridge (0.04)
  - Europe
    - United Kingdom > England
      - Cambridgeshire > Cambridge (0.14)
      - Oxfordshire > Oxford (0.14)
    - Ukraine > Kyiv Oblast
      - Kyiv (0.04)
    - Switzerland > Zürich
      - Zürich (0.14)
    - Latvia > Lubāna Municipality
      - Lubāna (0.04)
    - Italy > Tuscany
      - Florence (0.04)
    - Germany > North Rhine-Westphalia
      - Cologne Region > Cologne (0.04)
  - Asia
    - Middle East > Jordan (0.04)
    - China > Hong Kong (0.04)
- Genre:
  - Research Report > New Finding (0.46)
  - Instructional Material > Course Syllabus & Notes (0.45)
- Industry:
  - Health & Medicine > Therapeutic Area > Neurology (1.00)