FIND: A Function Description Benchmark for Evaluating Interpretability Methods
Sarah Schwettmann
Neural Information Processing Systems
The central task of interpretability research is to explain the functions that AI systems learn from data. Investigating these functions requires experimentation with trained models, using tools that incorporate varying degrees of human input. Hand-tooled approaches that rely on close manual inspection [Zeiler and Fergus, 2014, Zhou et al., 2014, Mahendran and Vedaldi, 2015, Olah et al., 2017, 2020, Elhage et al., 2021] or search for predefined phenomena [Wang et al., 2022, Nanda …
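To make the experimentation paradigm concrete, below is a minimal sketch of the kind of black-box probing FIND evaluates: an interpreter receives an opaque function, queries it at chosen inputs, and must infer a description of its behavior from the observations alone. The names `mystery_fn` and `probe`, and the hidden ReLU behavior, are illustrative assumptions, not the benchmark's actual functions or API.

```python
import numpy as np

def mystery_fn(x: np.ndarray) -> np.ndarray:
    """Stand-in for a benchmark function: opaque to the interpreter,
    which may only observe input/output pairs."""
    return np.where(x < 0, 0.0, x)  # hidden behavior: a ReLU (assumed example)

def probe(fn, inputs: np.ndarray) -> list[tuple[float, float]]:
    """Query the black box at chosen inputs and record observations,
    the raw material from which a description must be inferred."""
    return [(float(x), float(fn(np.asarray(x)))) for x in inputs]

# An interpreter (a human or an automated agent) selects probe points...
observations = probe(mystery_fn, np.linspace(-2.0, 2.0, 9))
print(observations)
# ...and proposes a natural-language description, e.g. "returns zero for
# negative inputs and the input itself otherwise," which can then be
# scored against the ground-truth implementation.
```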
- Country:
  - Africa (0.04)
  - Asia > Japan
    - Honshū > Tōhoku > Fukushima Prefecture > Fukushima (0.04)
  - Europe
    - France (0.04)
    - Germany (0.04)
    - Ireland (0.04)
    - Switzerland (0.04)
  - North America
    - Mexico (0.04)
    - United States
      - California > Los Angeles County > Los Angeles (0.04)
      - Illinois > Cook County > Chicago (0.04)
      - Louisiana (0.04)
      - Massachusetts > Middlesex County > Cambridge (0.04)
      - Texas (0.04)
  - Oceania > Australia (0.04)
- Genre:
  - Research Report (1.00)
- Industry:
  - Government (0.93)
  - Leisure & Entertainment > Sports (0.92)
  - Media (1.00)
  - Transportation (0.94)
- Technology:
  - Information Technology > Artificial Intelligence
    - Machine Learning > Neural Networks > Deep Learning (1.00)
    - Natural Language
      - Chatbot (0.97)
      - Large Language Model (1.00)
    - Representation & Reasoning (1.00)
    - Vision (0.93)