Optimal ablation for interpretability
Neural Information Processing Systems
Interpretability work in machine learning (ML) seeks to develop tools that make models more intelligible to humans in order to better monitor model behavior and predict failure modes. Early work in interpretability sought to identify relationships between model outputs and input features (Ribeiro et al., 2016; Covert et al., 2022), but with only black-box query access to observe inputs and outputs, it can be difficult to evaluate a model's internal logic.
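To make the black-box setting concrete, here is a minimal sketch of the kind of input-feature attribution the abstract alludes to: with only query access to a model, each feature can be scored by ablating it and measuring how much the output changes. The occlusion-style helper, baseline value, and toy linear model below are illustrative assumptions, not a method from the paper.

```python
import numpy as np

def occlusion_importance(predict, x, baseline=0.0):
    """Score each input feature by the output change when it is replaced with a baseline."""
    base_pred = predict(x)
    scores = np.zeros_like(x, dtype=float)
    for i in range(x.shape[0]):
        x_masked = x.copy()
        x_masked[i] = baseline          # ablate feature i
        scores[i] = base_pred - predict(x_masked)
    return scores

# Toy black-box model: a fixed linear function we can only query, never inspect.
rng = np.random.default_rng(0)
w = rng.normal(size=5)
predict = lambda x: float(w @ x)

x = rng.normal(size=5)
print(occlusion_importance(predict, x))  # larger |score| = more influential feature
```

Attribution scores like these relate outputs to inputs, but, as the abstract notes, they reveal little about the model's internal computation, which is the gap interpretability work on ablation aims to address.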