Optimal ablation for interpretability
Neural Information Processing Systems
Interpretability work in machine learning (ML) seeks to develop tools that make models more intelligible to humans, so that model behavior can be better monitored and failure modes predicted. Early interpretability work sought to identify relationships between model outputs and input features (Ribeiro et al., 2016; Covert et al., 2022). However, black-box query access reveals only inputs and outputs, which makes it difficult to evaluate a model's internal logic.
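To make the black-box setting concrete, here is a minimal sketch of one simple input-attribution scheme: single-feature occlusion against a fixed baseline. It is an illustration only, not LIME (Ribeiro et al., 2016) or the removal-based methods of Covert et al. (2022). The function `occlusion_attributions`, the zero baseline, and `toy_model` are hypothetical names and choices introduced here.

```python
from typing import Callable, Sequence


def occlusion_attributions(
    model: Callable[[Sequence[float]], float],
    x: Sequence[float],
    baseline: Sequence[float],
) -> list[float]:
    """Score each input feature by how much the black-box output drops
    when that single feature is replaced by its baseline value."""
    full_output = model(x)
    scores = []
    for i in range(len(x)):
        # Copy the input and "remove" feature i by swapping in the baseline value.
        perturbed = list(x)
        perturbed[i] = baseline[i]
        # Only model outputs are queried: no gradients or internal activations.
        scores.append(full_output - model(perturbed))
    return scores


def toy_model(v: Sequence[float]) -> float:
    # Hypothetical black box; the attribution code never inspects its internals.
    return 3.0 * v[0] - 2.0 * v[1] + 0.0 * v[2]


print(occlusion_attributions(toy_model, [1.0, 1.0, 1.0], [0.0, 0.0, 0.0]))
# → [3.0, -2.0, 0.0]
```

The scores describe how the output responds to each input, but they say nothing about how the model arrives at that output internally. This is the limitation of input-output methods that the paragraph above points to.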
Oct-10-2025, 16:05:33 GMT