Optimal ablation for interpretability

Neural Information Processing Systems 

Interpretability studies often trace the flow of information through machine learning models to identify the specific model components that perform computations relevant to a task of interest.