FIND: A Function Description Benchmark for Evaluating Interpretability Methods

Sarah Schwettmann

Neural Information Processing Systems 

The central task of interpretability research is to explain the functions that AI systems learn from data. Investigating these functions requires experimentation with trained models, using tools that incorporate varying degrees of human input. Hand-tooled approaches that rely on close manual inspection [Zeiler and Fergus, 2014, Zhou et al., 2014, Mahendran and Vedaldi, 2015, Olah et al., 2017, 2020, Elhage et al., 2021] or search for predefined phenomena [Wang et al., 2022, Nanda